

   comp.lang.c      Meh, in C you gotta define EVERYTHING      243,242 messages   


   Message 242,411 of 243,242   
   Richard Heathfield to Scott Lurndal   
   Re: is_binary_file()   
   10 Dec 25 19:42:24   
   
   From: rjh@cpax.org.uk   
      
   On 10/12/2025 17:18, Scott Lurndal wrote:   
   > Michael S  writes:   
   >> On Wed, 10 Dec 2025 15:07:30 GMT   
   >> scott@slp53.sl.home (Scott Lurndal) wrote:   
   >>   
   >>> Michael Sanders  writes:   
   >>>> On Sat, 6 Dec 2025 02:00:22 -0000 (UTC), Lew Pitcher wrote:   
   >>>>   
   >>>>> I should have added that I feel that you probably haven't really   
   >>>>> defined /what/ "text file" means, and that has interfered with   
   >>>>> the development of this function. As Keith pointed out, the task   
   >>>>> of distinguishing between a "text" file and a "binary" file is not   
   >>>>> easy. I'll add that a lot of the difficulty stems from the fact   
   >>>>> that there are many definitions (some conflicting) of what a "text"   
   >>>>> file actually contains.   
   >>>>   
   >>>> Yes. Here's my 2nd attempt following the template (of thinking)   
   >>>> you've suggested...   
   >>>   
   >>> The problem with all of your attempts is the performance   
   >>> issue.  Success requires reading every single byte of the   
   >>> file, one byte at a time.   The word 'slow' is not sufficient   
   >>> to describe how bad the performance will be for a very large   
   >>> file.   
   >>>   
   >>> At a minimum, dump the stdio double-buffered byte-by-byte   
   >>> algorithm and use mmap().   
   >>>   
   >>   
   >> I suggest to do actual speed measurements before making bold   
   >> claims like above. Don't trust your intuition!   
   >   
   > I have, more than once, done such measurements after mmap()   
   > was introduced in SVR4 circa 1989 (ported from SunOS).   
   >   
   > On a single-user system, running a single job, the difference   
   > for smaller files is in the noise.   For larger files, or when   
   > the system is heavily loaded or multiuser, it can be significant.   
      
   1989 is 36 years ago. Technology has moved on. If reading your
   file is too slow, get yourself a real computer.
      
   On my very ordinary desktop machine, I just freq'd[1] a   
   7,032,963,565-byte file in 12.256 seconds. That's 573,838,410   
   bytes per second. It's a damn sight faster than I could do by hand.   
      
   How, exactly, are you using `slow'?   
      
      
   [1] Nothing fancy; a getc loop with ++pfm[ch].count written   
   entirely in what used to be called clc-conforming code, and I can   
   see at least one egregious inefficiency in the code that I can't   
   be bothered to fix because half a gig a second is *easily* fast   
   enough for my needs.   
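
   For readers who want to try the measurement themselves, here is a
   minimal sketch of the kind of getc loop the footnote describes. It
   is not the code that was actually timed: the struct freq_entry type
   and the count_bytes() function are names invented for illustration,
   and only the pfm[ch].count idiom comes from the footnote.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical stand-in for the post's pfm[] table: one entry per byte value. */
struct freq_entry {
    unsigned long long count;
};

/*
 * Count how often each byte value occurs in fp using a plain getc loop.
 * pfm must have room for UCHAR_MAX + 1 entries; they are zeroed first.
 * Returns the number of bytes read, or -1 if a read error occurred.
 */
long long count_bytes(FILE *fp, struct freq_entry *pfm)
{
    int ch;
    long long total = 0;

    assert(fp != NULL && pfm != NULL);

    for (size_t i = 0; i <= UCHAR_MAX; i++)
        pfm[i].count = 0;

    /* getc yields each byte as an unsigned char converted to int, or EOF. */
    while ((ch = getc(fp)) != EOF) {
        ++pfm[ch].count;
        ++total;
    }

    return ferror(fp) ? -1 : total;
}
```

   Open the file with fopen(name, "rb") so that no text-mode
   translation hides bytes, and divide the returned total by the
   elapsed wall-clock time to get a bytes-per-second figure comparable
   to the one quoted above.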
      
   --   
   Richard Heathfield   
   Email: rjh at cpax dot org dot uk   
   "Usenet is a strange place" - dmr 29 July 1999   
   Sig line 4 vacant - apply within   
      


