   comp.programming      Programming issues that transcend langua      57,431 messages   

   Message 56,979 of 57,431   
   V V V V V V V V V V V V V V V V V V to Richard Heathfield   
   Re: Scanning   
   27 Jan 23 01:46:00   
   
   From: vvvvvvvvaaaaaaaaaaaaaaa@mail.ee   
      
   You are a devil!
      
      
      
      
   On Thursday, January 19, 2023 at 2:43:51 PM UTC+2, Richard Heathfield wrote:   
   > On 19/01/2023 12:10 pm, Stefan Ram wrote:    
   > > Some idle thoughts about scanning (lexical analysis, or    
   > > rather what comes before it) ...    
   > >    
   > > Let's take a very simple task: This scanner for text files    
   > > has nothing more to do than to return every character,    
   > > except to strip the spaces at the end of a line.    
   > >    
   > > It is a function "get_next_token" that on each call will    
   > > return the next character from a file to its client (caller),    
   > > except that spaces at the end of a line will be skipped.
   > >    
   > > So we read the line and strip the spaces. (One line in    
   > > Python.)    
   > >    
   > > But how do I know in advance if the line will fit into    
   > > memory?    
   > >    
   > > Perhaps because of such fears, traditional scanners¹ do not    
   > > read lines or, Heaven forbid, files, but only characters!    
   > >    
   > > They do not use random access with respect to the text to be    
   > > scanned, but sequential access, although things would be    
   > > easier with random access.    
   > >    
   > > So how would you do it with this style of programming (never    
   > > reading the whole line into memory)?    
   > >    
   > > "I read a character. If it's a space, I peek at the next    
   > > character, if that's a space, I start adding spaces to my    
   > > look-ahead buffer. If an EOL is encountered, the look-ahead    
   > > buffer is discarded. Otherwise, I have to start feeding my    
   > > client from the look-ahead buffer until the look-ahead buffer
   > > is empty."    
   > >    
   > > If I am concerned that a line will not fit in memory, how do    
   > > I know that the sequence of spaces at the end of a line will    
   > > fit in memory (the look-ahead buffer)? The look-ahead buffer    
   > > could be replaced by a counter. If you are paranoid, you    
   > > would use a 64-bit counter and check it for overflow!    
   > >    
   > > Is it worth the effort with a look-ahead buffer and    
   > > sequential access? Should you just read a line, assuming    
   > > that a line will always fit into memory, and strip the    
   > > blanks the easy way, i.e., using random access? TIA for any    
   > > comments!    
   > >    
   > > ¹ An example of a traditional scanner:
   > >    
   > > It only ever calls "GetCh", never "GetLine". The code could    
   > > be easier to write by reading a whole line and then just    
   > > using functions that can look at that line using random    
   > > access to get the next symbol (maybe using regular    
   > > expressions). But a traditional scanner carefully only ever    
   > > reads a single character and manages a state.    
   > >    
   > > PROCEDURE GetSym;
   > >
   > >   VAR i : CARDINAL;
   > >
   > > BEGIN
   > >   WHILE ch <= ' ' DO GetCh END;
   > >   IF ch = '/' THEN
   > >     SkipLine;
   > >     WHILE ch <= ' ' DO GetCh END
   > >   END;
   > >   IF (CAP (ch) <= 'Z') AND (CAP (ch) >= 'A') THEN
   > >     i := 0;
   > >     sym := literal;
   > >     REPEAT
   > >       IF i < IdLength THEN
   > >         id [i] := ch;
   > >         INC (i)
   > >       END;
   > >       IF ch > 'Z' THEN sym := ident END;
   > >       GetCh
   > >       ...
   >    
   > man 3 realloc    
   >    
   > This was a perennial comp.lang.c topic back in the day.    
   >    
   > My interface looked (and still looks) like this:    
   >    
   > #define FGDATA_BUFSIZ BUFSIZ /* adjust to taste */    
   > #define FGDATA_WRDSIZ sizeof("floccinaucinihilipilification")    
   > #define FGDATA_REDUCE 1    
   >    
   > int fgetline(char **line, size_t *size, size_t maxrecsize,
   >              FILE *fp, unsigned int flags, size_t *plen);
   >    
   > It's easier to use than it might look:    
   >    
   > char *data = NULL; /* where will the data go? NULL is fine */    
   > size_t size = 0; /* how much space do we have right now? */    
   > size_t len = 0; /* after call, holds line length */    
   >    
   > while(fgetline(&data, &size, (size_t)-1, stdin, 0, &len) == 0)    
   > {    
   > if(len > 0)    
   >    
   > If you want fgetline.c and don't have 20 years of clc archives,    
   > just yell.    
   >    
   > --    
   > Richard Heathfield    
   > Email: rjh at cpax dot org dot uk    
   > "Usenet is a strange place" - dmr 29 July 1999    
   > Sig line 4 vacant - apply within   
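
For comparison with the realloc approach, here is a rough sketch of the counter idea from the original question: sequential access only, no line buffer, and a 64-bit count of spaces standing in for the look-ahead buffer. The Scanner struct and its field names are made up for illustration and do not come from anyone's posted code. It assumes only ' ' counts as a space, and that spaces just before end-of-file count as trailing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical scanner state; none of these names come from the thread. */
typedef struct {
    FILE    *fp;        /* input stream, read strictly sequentially */
    uint64_t pending;   /* spaces seen but not yet known to be trailing */
    int      held;      /* non-space character that ended a run of spaces */
    int      has_held;  /* nonzero while 'held' still has to be delivered */
    int      overflow;  /* set if the space counter would have wrapped */
} Scanner;

/* Returns the next character, or EOF at end of input or on overflow. */
int get_next_token(Scanner *s)
{
    assert(s != NULL && s->fp != NULL);
    for (;;) {
        if (s->has_held) {
            /* Replay the counted spaces first, then the character
               that proved they were not trailing. */
            if (s->pending > 0) { s->pending--; return ' '; }
            s->has_held = 0;
            return s->held;
        }
        int c = getc(s->fp);
        if (c == ' ') {
            /* The paranoid overflow check mentioned in the question. */
            if (s->pending == UINT64_MAX) { s->overflow = 1; return EOF; }
            s->pending++;
        } else if (c == '\n' || c == EOF) {
            s->pending = 0;   /* the spaces were trailing: drop them */
            return c;
        } else if (s->pending == 0) {
            return c;         /* ordinary character, nothing buffered */
        } else {
            s->held = c;      /* interior spaces: replay on next pass */
            s->has_held = 1;
        }
    }
}
```

A client would zero-initialise everything except fp (for example `Scanner s = { .fp = stdin };`) and call get_next_token until it returns EOF, checking the overflow flag afterwards. Memory use stays constant however long the line or the run of spaces is, which is the trade against the simplicity of reading a whole line.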
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   


