From: Keith.S.Thompson+u@gmail.com   
      
   Michael Sanders writes:   
   [...]   
      
   For yet another set of unreliable heuristics for guessing whether a file   
   is text or binary, you can take a look at Perl's built-in "-T" and "-B"   
   operators.   
      
    The "-T" and "-B" tests work as follows. The first block   
    or so of the file is examined to see if it is valid   
    UTF-8 that includes non-ASCII characters. If so, it's a   
    "-T" file. Otherwise, that same portion of the file is   
    examined for odd characters such as strange control codes   
    or characters with the high bit set. If more than a third   
    of the characters are strange, it's a "-B" file; otherwise   
    it's a "-T" file. Also, any file containing a zero byte   
    in the examined portion is considered a binary file. (If   
    executed within the scope of a use locale which includes   
    "LC_CTYPE", odd characters are anything that isn't a   
    printable nor space in the current locale.) If "-T" or   
    "-B" is used on a filehandle, the current IO buffer is   
    examined rather than the first block. Both "-T" and "-B"   
    return true on an empty file, or a file at EOF when testing   
    a filehandle. Because you have to read a file to do the "-T"   
    test, on most occasions you want to use a "-f" against the   
    file first, as in "next unless -f $file && -T $file".   
      
   It's not clear how big a "block" is. For an empty file, both -T   
   and -B are true. I don't know whether there are other cases where   
   both are true, or where both are false.   
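
   To make the quoted description concrete, here is a rough Python sketch
   of the heuristic as the documentation words it. It is not Perl's
   actual implementation. The function names and the set of control
   bytes treated as "not strange" are my own assumptions, the locale
   case is ignored, and the caller decides how many bytes to pass in,
   since the block size is not specified.

```python
# Control bytes treated as ordinary text here: backspace, tab, newline,
# form feed, carriage return, escape. This set is an assumption.
ORDINARY_CONTROLS = frozenset(b"\b\t\n\f\r\x1b")


def is_odd(byte: int) -> bool:
    """True for a high-bit byte or a control byte outside ORDINARY_CONTROLS."""
    if byte >= 0x80:
        return True
    return byte < 0x20 and byte not in ORDINARY_CONTROLS


def looks_like_text(block: bytes) -> bool:
    """Approximate the documented "-T" rule for one block of bytes."""
    if not block:
        return True  # empty input: documented as true for both tests
    if b"\0" in block:
        return False  # any zero byte marks the block as binary
    has_non_ascii = any(b >= 0x80 for b in block)
    if has_non_ascii:
        try:
            block.decode("utf-8")
            return True  # valid UTF-8 that includes non-ASCII characters
        except UnicodeDecodeError:
            pass  # fall through to the "strange characters" count
    odd = sum(1 for b in block if is_odd(b))
    # Binary only if MORE than a third of the bytes are odd.
    return odd * 3 <= len(block)


def looks_like_binary(block: bytes) -> bool:
    """Approximate the documented "-B" rule for one block of bytes."""
    return not block or not looks_like_text(block)
```

   In this reading, the empty block is the only input for which both
   functions return true, and no input makes both return false. That is
   a property of the sketch, not a claim about what Perl itself does.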
      
   --   
   Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com   
   void Void(void) { Void(); } /* The recursive call of the void */   
      