|    comp.programming    |    Programming issues that transcend langua    |    57,431 messages    |
|    Message 56,946 of 57,431    |
|    Stefan Ram to All    |
|    Scanning    |
|    19 Jan 23 12:10:36    |
From: ram@zedat.fu-berlin.de

Some idle thoughts about scanning (lexical analysis, or
rather what comes before it) ...

Let's take a very simple task: This scanner for text files
has nothing more to do than to return every character,
except to strip the spaces at the end of a line.

It is a function "get_next_token" that on each call will
return the next character from a file to its client (caller),
except that spaces at the end of a line will be skipped.

So we read the line and strip the spaces. (One line in
Python.)

But how do I know in advance if the line will fit into
memory?

Perhaps because of such fears, traditional scanners¹ do not
read lines or, Heaven forbid, files, but only characters!

They do not use random access with respect to the text to be
scanned, but sequential access, although things would be
easier with random access.

So how would you do it with this style of programming (never
reading the whole line into memory)?

"I read a character. If it's a space, I peek at the next
character; if that's a space too, I start adding spaces to my
look-ahead buffer. If an EOL is encountered, the look-ahead
buffer is discarded. Otherwise, I have to start feeding my
client from the look-ahead buffer until the look-ahead buffer
is empty."

If I am concerned that a line will not fit in memory, how do
I know that the sequence of spaces at the end of a line will
fit in memory (the look-ahead buffer)? The look-ahead buffer
could be replaced by a counter. If you are paranoid, you
would use a 64-bit counter and check it for overflow!

Is it worth the effort with a look-ahead buffer and
sequential access? Or should you just read a line, assuming
that a line will always fit into memory, and strip the
blanks the easy way, i.e., using random access? TIA for any
comments!

¹ An example of a traditional scanner:

It only ever calls "GetCh", never "GetLine". The code could
be easier to write by reading a whole line and then just
using functions that can look at that line using random
access to get the next symbol (maybe using regular
expressions). But a traditional scanner carefully only ever
reads a single character and manages a state.

    PROCEDURE GetSym;
      VAR i : CARDINAL;
    BEGIN
      WHILE ch <= ' ' DO GetCh END;
      IF ch = '/' THEN
        SkipLine;
        WHILE ch <= ' ' DO GetCh END
      END;
      IF (CAP (ch) <= 'Z') AND (CAP (ch) >= 'A') THEN
        i := 0;
        sym := literal;
        REPEAT
          IF i < IdLength THEN
            id [i] := ch;
            INC (i)
          END;
          IF ch > 'Z' THEN sym := ident END;
          GetCh
        ...

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)
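
For comparison, here is a rough Python sketch of the two styles discussed in the post: the sequential one with the look-ahead buffer replaced by a counter, and the "just read the line" one. The function names strip_trailing_spaces and strip_trailing_spaces_by_line are made up for this illustration, and two things are assumed that the post does not settle: the end of a line is a single "\n", and spaces right before the end of the file (with no final newline) also count as trailing. Since Python integers do not overflow, the 64-bit overflow check mentioned in the post would not be needed in this version.

```python
import io


def strip_trailing_spaces(stream):
    """Sequential style: read one character at a time from a text stream.

    Pending spaces are counted instead of buffered, so memory use does
    not depend on line length or on the length of a run of spaces.
    """
    pending = 0  # spaces seen, not yet known to be trailing
    while True:
        ch = stream.read(1)
        if ch == "":
            # End of input. Assumption: pending spaces are trailing, drop them.
            break
        if ch == " ":
            pending += 1
        elif ch == "\n":
            pending = 0  # the counted spaces were trailing: discard them
            yield ch
        else:
            # The counted spaces were inside the line: replay them first.
            for _ in range(pending):
                yield " "
            pending = 0
            yield ch


def strip_trailing_spaces_by_line(stream):
    """Line style: assume every line fits into memory and use rstrip."""
    for line in stream:
        if line.endswith("\n"):
            yield from line[:-1].rstrip(" ") + "\n"
        else:
            yield from line.rstrip(" ")


if __name__ == "__main__":
    sample = "ab  \n  c d \n\n   \nend  "
    print(repr("".join(strip_trailing_spaces(io.StringIO(sample)))))
    print(repr("".join(strip_trailing_spaces_by_line(io.StringIO(sample)))))
```

Both generators hand characters to the caller one at a time, so the client sees the same interface either way. The difference is only in what must fit into memory: a single integer in the first version, a whole line in the second.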
(c) 1994, bbs@darkrealms.ca