

   comp.programming      Programming issues that transcend langua      57,431 messages   


   Message 56,947 of 57,431   
   Richard Heathfield to Stefan Ram   
   Re: Scanning   
   19 Jan 23 12:43:45   
   
   From: rjh@cpax.org.uk   
      
   On 19/01/2023 12:10 pm, Stefan Ram wrote:   
   >    Some idle thoughts about scanning (lexical analysis, or   
   >    rather what comes before it) ...   
   >   
   >    Let's take a very simple task: This scanner for text files   
   >    has nothing more to do than to return every character,   
   >    except to strip the spaces at the end of a line.   
   >   
   >    It is a function "get_next_token" that on each call will   
   >    return the next character from a file to its client (caller),   
   >    except that spaces at the end of a line will be skipped.   
   >   
   >    So we read the line and strip the spaces. (One line in   
   >    Python.)   
   >   
   >    But how do I know in advance if the line will fit into   
   >    memory?   
   >   
   >    Perhaps because of such fears, traditional scanners¹ do not   
   >    read lines or, Heaven forbid, files, but only characters!   
   >   
   >    They do not use random access with respect to the text to be   
   >    scanned, but sequential access, although things would be   
   >    easier with random access.   
   >   
   >    So how would you do it with this style of programming (never   
   >    reading the whole line into memory)?   
   >   
   >    "I read a character. If it's a space, I peek at the next   
   >    character, if that's a space, I start adding spaces to my   
   >    look-ahead buffer. If an EOL is encountered, the look-ahead   
   >    buffer is discarded. Otherwise, I have to start feeding my   
   >    client from the lookahead buffer until the lookahead buffer   
   >    is empty."   
   >   
   >    If I am concerned that a line will not fit in memory, how do   
   >    I know that the sequence of spaces at the end of a line will   
   >    fit in memory (the look-ahead buffer)? The look-ahead buffer   
   >    could be replaced by a counter. If you are paranoid, you   
   >    would use a 64-bit counter and check it for overflow!   
   >   
   >    Is it worth the effort with a look-ahead buffer and   
   >    sequential access? Should you just read a line, assuming   
   >    that a line will always fit into memory, and strip the   
   >    blanks the easy way, i.e., using random access? TIA for any   
   >    comments!   
   >   
   >    ¹ an example of a traditional scanner:   
   >   
   >    It only ever calls "GetCh", never "GetLine". The code could   
   >    be easier to write by reading a whole line and then just   
   >    using functions that can look at that line using random   
   >    access to get the next symbol (maybe using regular   
   >    expressions). But a traditional scanner carefully only ever   
   >    reads a single character and manages a state.   
   >   
   > PROCEDURE GetSym;   
   >   
   > VAR     i          : CARDINAL;   
   >   
   > BEGIN   
   >    WHILE  ch <= ' '  DO  GetCh  END;   
   >    IF  ch = '/'  THEN   
   >      SkipLine;   
   >      WHILE  ch <= ' '  DO  GetCh  END   
   >    END;   
   >    IF  (CAP (ch) <= 'Z') AND (CAP (ch) >= 'A')  THEN   
   >      i := 0;   
   >      sym := literal;   
   >      REPEAT   
   >        IF  i < IdLength  THEN   
   >          id [i] := ch;   
   >          INC (i)   
   >        END;   
   >        IF  ch > 'Z' THEN  sym := ident  END;   
   >        GetCh   
   >        ...   
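
The counter idea from the quoted post (replace the look-ahead buffer with a count of pending spaces) can be sketched in C roughly as follows. This is only an illustration under assumptions: `struct scanner`, `scanner_init` and the `overflow` flag are hypothetical names, a "space" means only `' '`, and a line ends at `'\n'` or at end of file. `get_next_token` is the name the quoted post uses for the function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical scanner state for this sketch. */
struct scanner {
    FILE *fp;
    uint64_t pending;  /* spaces owed to the caller, kept as a count */
    int held;          /* non-space character that ended the run */
    int has_held;      /* nonzero while 'held' is waiting to be returned */
    int overflow;      /* set if a run of spaces exceeded the counter */
};

static void scanner_init(struct scanner *s, FILE *fp)
{
    assert(s != NULL && fp != NULL);
    s->fp = fp;
    s->pending = 0;
    s->held = 0;
    s->has_held = 0;
    s->overflow = 0;
}

/* Return the next character, dropping spaces that directly precede
   '\n' or end of file. Returns EOF at end of input or on overflow. */
static int get_next_token(struct scanner *s)
{
    uint64_t n;
    int c;

    if (s->pending > 0) {          /* still replaying a run of spaces */
        s->pending--;
        return ' ';
    }
    if (s->has_held) {             /* then the character that followed it */
        s->has_held = 0;
        return s->held;
    }

    c = getc(s->fp);
    if (c != ' ')
        return c;                  /* ordinary character, '\n' or EOF */

    n = 1;                         /* count the run instead of storing it */
    while ((c = getc(s->fp)) == ' ') {
        if (n == UINT64_MAX) {     /* the paranoid overflow check */
            s->overflow = 1;
            return EOF;
        }
        n++;
    }
    if (c == '\n' || c == EOF)
        return c;                  /* trailing spaces: discard the run */

    s->held = c;                   /* interior spaces: replay n of them */
    s->has_held = 1;
    s->pending = n - 1;            /* one space is returned right now */
    return ' ';
}
```

The memory used is constant no matter how long the line or the run of spaces is, which is the point of the sequential style; the price is the extra state that a read-the-whole-line version would not need.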
      
   man 3 realloc   
      
   This was a perennial comp.lang.c topic back in the day.   
      
   My interface looked (and still looks) like this:   
      
   #define FGDATA_BUFSIZ BUFSIZ /* adjust to taste */   
   #define FGDATA_WRDSIZ sizeof("floccinaucinihilipilification")   
   #define FGDATA_REDUCE  1   
      
   int fgetline(char **line, size_t *size, size_t maxrecsize,   
                FILE *fp, unsigned int flags, size_t *plen);   
      
   It's easier to use than it might look:   
      
      char *data = NULL; /* where will the data go? NULL is fine */   
      size_t size = 0;   /* how much space do we have right now? */   
      size_t len = 0;    /* after call, holds line length */   
      
      while(fgetline(&data, &size, (size_t)-1, stdin, 0, &len) == 0)   
      {   
        if(len > 0)   
      
   If you want fgetline.c and don't have 20 years of clc archives,   
   just yell.   
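
For readers without the archives, the realloc approach can be sketched as below. This is not the fgetline.c mentioned above: `read_line_sketch` and `grow` are hypothetical names, the `maxrecsize` and `flags` parameters are left out, and the return convention (0 for a line, 1 for end of file, -1 for allocation failure) is an assumption made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Make sure *line can hold at least 'need' bytes, doubling as required.
   Returns 0 on success, -1 on failure (the old buffer stays valid). */
static int grow(char **line, size_t *size, size_t need)
{
    size_t newsize;
    char *tmp;

    if (need <= *size)
        return 0;
    newsize = (*size == 0) ? 32 : *size;
    while (newsize < need) {
        if (newsize > (size_t)-1 / 2)
            return -1;             /* doubling would overflow size_t */
        newsize *= 2;
    }
    tmp = realloc(*line, newsize);
    if (tmp == NULL)
        return -1;
    *line = tmp;
    *size = newsize;
    return 0;
}

/* Read one line of any length into *line, without the '\n'.
   Start with *line == NULL and *size == 0; the buffer is reused across
   calls and must be released with free() by the caller. */
static int read_line_sketch(char **line, size_t *size, FILE *fp,
                            size_t *plen)
{
    size_t len = 0;
    int c;

    assert(line != NULL && size != NULL && fp != NULL && plen != NULL);

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (grow(line, size, len + 2) != 0)  /* this char plus '\0' */
            return -1;
        (*line)[len++] = (char)c;
    }
    if (c == EOF && len == 0)
        return 1;                  /* nothing left to read */
    if (grow(line, size, len + 1) != 0)      /* covers the empty line */
        return -1;
    (*line)[len] = '\0';
    *plen = len;
    return 0;
}
```

It is called the same way as the loop above: pass the address of a NULL pointer and a zero size, loop while the function returns 0, and free the buffer once at the end. With the whole line in memory, trailing blanks can be stripped by stepping the length back while the last character is a space.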
      
   --   
   Richard Heathfield   
   Email: rjh at cpax dot org dot uk   
   "Usenet is a strange place" - dmr 29 July 1999   
   Sig line 4 vacant - apply within   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca