
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.lang.c      Meh, in C you gotta define EVERYTHING      243,242 messages   


   Message 242,358 of 243,242   
   Richard Harnden to Michael Sanders   
   Re: is_binary_file()   
   07 Dec 25 19:01:02   
   
   From: richard.nospam@gmail.invalid   
      
   On 06/12/2025 01:05, Michael Sanders wrote:   
   > Am I close? Missing anything you'd consider to be (or not) needed?   
      
   A text file is supposed to end with a '\n' (M$, of course, largely   
   ignores this convention), but a quick test could be:   
      
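            FILE *f = fopen(path, "rb");
            int c = EOF;

            /* empty, unopenable or unseekable file: c stays EOF */
            if (f != NULL && fseek(f, -1L, SEEK_END) == 0)
                c = fgetc(f);

            if (c == '\n')
                printf("Text\n");
            else
                printf("Binary\n");

            if (f != NULL)
                fclose(f);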
      
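   Be aware of false positives/negatives, because I'm sure there will be
   plenty :) A text file that lacks its final newline comes out as
   "Binary", and any binary file whose last byte happens to be 0x0A comes
   out as "Text".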
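   The quick test above can be wrapped as a function that reports the
   cases it cannot decide instead of lumping them in with "Binary". This
   is only a sketch: the name ends_with_newline() and its -1 return value
   are hypothetical, and it assumes a platform where fseek() with SEEK_END
   works on binary streams (POSIX systems do; ISO C does not require it).

```c
#include <stdio.h>

/*
 * Last-byte heuristic: a text file is expected to end with '\n'.
 * Hypothetical helper. Returns
 *    1 if the final byte is '\n',
 *    0 if the final byte is anything else,
 *   -1 if f is NULL, the file is empty, or it cannot be positioned.
 * Assumes SEEK_END is supported on binary streams (true on POSIX,
 * not guaranteed by ISO C).
 */
int ends_with_newline(FILE *f)
{
    int c;

    if (f == NULL)
        return -1;
    if (fseek(f, -1L, SEEK_END) != 0)   /* fails on an empty file */
        return -1;
    c = fgetc(f);
    if (c == EOF)                        /* read error */
        return -1;
    return c == '\n';
}
```

   Taking a FILE * rather than a path leaves the choice of fopen() mode
   and the error handling to the caller; open the file with "rb", check
   for NULL, call the function, then fclose() it.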
      
      
   >   
   >    
   >   
   > /*   
   >   * Checks if a file is likely a binary by examining its content   
   >   * for NUL bytes (0x00) or unusual control characters.
   >   * Returns 0 if text, 1 if binary or file open failure.   
   >   */   
   >   
   > int is_binary_file(const char *path) {   
   >      FILE *f = fopen(path, "rb");   
   >      if (!f) return 1; // cannot open file, treat as error/fail check   
   >   
   >      unsigned char buf[65536];   
   >      size_t n, i;   
   >   
   >      while ((n = fread(buf, 1, sizeof(buf), f)) > 0) {   
   >          for (i = 0; i < n; i++) {   
   >              unsigned char c = buf[i];   
   >   
   >              // 1. check for the NUL byte (strong indicator of binary data)
   >              if (c == 0x00) {   
   >                  fclose(f);   
   >                  return 1; // IS binary   
   >              }   
   >   
   >              // 2. check for C0 control codes (0x01-0x1F), excluding known   
   >              // text formatting characters: 0x09 (Tab), 0x0A (LF), 0x0D (CR)   
   >              if (c < 0x20) {   
   >                  if (c != 0x09 && c != 0x0A && c != 0x0D) {   
   >                      fclose(f);   
   >                      return 1; // IS binary (contains unexpected control code)
   >                  }   
   >              }   
   >          }   
   >      }   
   >   
   >      fclose(f);   
   >      return 0; // NOT binary   
   > }   
   >   
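   One way to answer the "missing anything?" question, sketched under
   stated assumptions: the rules of the quoted is_binary_file() are moved
   into a hypothetical is_binary_buffer() that works on memory (so it can
   be tested without files), and a hypothetical sniff_binary() reads only
   the first SNIFF_BYTES bytes (an arbitrary 8192 here) and returns -1,
   rather than 1, when the file cannot be opened or read. Form feed (0x0C)
   is additionally accepted as text; that is an assumption, based on it
   showing up in some source and printer-oriented text files.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical cap: only the first SNIFF_BYTES bytes are examined. */
#define SNIFF_BYTES 8192

/*
 * Same rules as the quoted is_binary_file(), applied to a buffer:
 * NUL, or any C0 control other than TAB, LF, CR and (assumed extra) FF,
 * marks the data as binary. Returns 1 for binary, 0 for text.
 */
int is_binary_buffer(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned char c = buf[i];
        if (c < 0x20 && c != '\t' && c != '\n' && c != '\r' && c != '\f')
            return 1;   /* also catches c == 0x00 */
    }
    return 0;
}

/* Returns 1 if binary, 0 if text, -1 if the file cannot be opened or read. */
int sniff_binary(const char *path)
{
    unsigned char buf[SNIFF_BYTES];
    size_t n;
    int err;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, f);   /* short read is fine at EOF */
    err = ferror(f);
    fclose(f);
    if (err)
        return -1;
    return is_binary_buffer(buf, n);
}
```

   Sampling is a trade-off: large files are classified quickly, but a NUL
   or control byte past the first block goes unnoticed. Either version
   reports UTF-16 text containing ASCII-range characters as binary (those
   characters carry 0x00 bytes), while UTF-8 passes because bytes 0x80 and
   above are never rejected.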
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca