   comp.lang.c      Meh, in C you gotta define EVERYTHING      243,242 messages   

   Message 242,350 of 243,242   
   Lew Pitcher to Lew Pitcher   
   Re: is_binary_file()   
   06 Dec 25 02:00:22   
   
   From: lew.pitcher@digitalfreehold.ca   
      
   On Sat, 06 Dec 2025 01:41:28 +0000, Lew Pitcher wrote:   
      
   > On Sat, 06 Dec 2025 01:05:44 +0000, Michael Sanders wrote:   
   >   
   >> Am I close? Missing anything you'd consider to be (or not) needed?   
   >>   
   >>    
   >>   
   >> /*   
   >>  * Checks if a file is likely a binary by examining its content   
   >>  * for NULL bytes (0x00) or unusual control characters.   
   >>  * Returns 0 if text, 1 if binary or file open failure.   
   >>  */   
   >   
   > First off, until we get computers that store file data in formats   
   > other than binary, /all/ files (text or not) are "binary" files   
   > (meaning that an is_binary_file() function should always return true).   
   > OTOH, "text files" are a distinguishable subset of binary files.   
   > I suggest that this makes an "is_text_file()" function more valuable   
   > and more fitting than an "is_binary_file()" function.   
   >   
   > Secondly, ISTM that the function should return a unique failure value   
   > rather than overload the "is binary" return value. After all, you   
   > actually have three return values: is_text, is_not_text, and   
   > is_indeterminate (because of file access failure).   
   [snip]   
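   
   To illustrate the three return values mentioned above, here is a   
   minimal sketch. The names (text_status, classify_stream) are   
   hypothetical, and the "any NUL byte means not text" test is only   
   a placeholder assumption, not a recommended definition.   

```c
#include <stdio.h>

/* Hypothetical three-valued result; the names are illustrative only. */
enum text_status {
    TEXT_YES,            /* content met the (placeholder) text criterion */
    TEXT_NO,             /* content did not meet it */
    TEXT_INDETERMINATE   /* file could not be opened or read */
};

/* Placeholder criterion (an assumption): a NUL byte means "not text". */
enum text_status classify_stream(FILE *fp)
{
    int c;

    while ((c = getc(fp)) != EOF) {
        if (c == 0)
            return TEXT_NO;
    }
    /* EOF can mean end of data or a read error; tell them apart. */
    return ferror(fp) ? TEXT_INDETERMINATE : TEXT_YES;
}

enum text_status is_text_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    enum text_status result;

    if (fp == NULL)
        return TEXT_INDETERMINATE;   /* access failure is its own answer */

    result = classify_stream(fp);
    fclose(fp);
    return result;
}
```

   A caller can then switch on all three cases instead of treating   
   "could not open" as "binary".   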
      
   I should have added that I feel that you probably haven't really   
   defined /what/ "text file" means, and that has interfered with   
   the development of this function. As Keith pointed out, the task   
   of distinguishing between a "text" file and a "binary" file is not   
   easy. I'll add that a lot of the difficulty stems from the fact   
   that there are many definitions (some conflicting) of what a "text"   
   file actually contains.   
      
   The best advice I can give here is that you should pick a definition   
   of what a text file consists of, document /that/ definition, and   
   use /that/ documentation to build your code. If you say that, for   
   instance, EBCDIC is out of scope, then your code does not have to   
   handle EBCDIC (but if you /don't/ say that, then you leave your code   
   open to the ambiguity of whether or not it will work with EBCDIC).   
   Likewise for ASCII, "Extended ASCII" (sic), Unicode, the various   
   6-bit codes, Baudot, or even Morse.   
      
   With suitable definitions beforehand, you can write an acceptable   
   "is_text_file()" function and/or a passable "is_binary_file()"   
   function.   
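   
   As an example of "definition first, code second", here is a sketch   
   built from one possible definition. The definition itself (7-bit   
   ASCII printables plus a few format effectors) and the names   
   is_text_byte and buffer_is_text are assumptions made for   
   illustration; pick whatever definition suits your own scope.   

```c
#include <stddef.h>

/*
 * Documented definition (an assumption chosen for this sketch):
 * a buffer is "text" if every byte is 7-bit ASCII and is either
 * printable (0x20..0x7E) or one of HT, LF, FF, CR.
 * Explicitly out of scope: EBCDIC, UTF-16, 8-bit "extended" sets.
 * Hex values are used so the test does not depend on the
 * execution character set of the compiler.
 */
static int is_text_byte(unsigned char b)
{
    return (b >= 0x20 && b <= 0x7E) ||
           b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D;
}

/* Returns 1 if every byte satisfies the definition above, else 0. */
int buffer_is_text(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (!is_text_byte(buf[i]))
            return 0;
    }
    return 1;   /* an empty buffer counts as text under this definition */
}
```

   Because the comment states what is in and out of scope, a byte   
   such as 0xE9 is rejected by design rather than by accident.   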
      
   HTH   
   --   
   Lew Pitcher   
   "In Skills We Trust"   
   Not LLM output - I'm just like this.   
      



(c) 1994,  bbs@darkrealms.ca