
comp.programming

Programming issues that transcend langua

57,431 messages


Message 56,948 of 57,431

Dmitry A. Kazakov to Stefan Ram

Re: Scanning

19 Jan 23 14:50:58

   From: mailbox@dmitry-kazakov.de   

   On 2023-01-19 13:10, Stefan Ram wrote:   

   >    But how do I know in advance if the line will fit into   
   >    memory?   

   No idea. My parser reads a whole source line into its buffer.

   >    Perhaps because of such fears, traditional scanners¹ do not   
   >    read lines or, Heaven forbid, files, but only characters!   

   I think it is more of a C/UNIX tradition, which comes from having neither
   proper strings in the language nor lines/records in the filesystem.

   >    So how would you do it with this style of programming (never   
   >    reading the whole line into memory)?   

   By never following this style and never using scanners, lexers,   
   tokenizers and other primitive stuff. I do all that in a single pass   
   that produces either the code or else the AST.   
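
   A minimal sketch of what such a single pass could look like, assuming a
   toy grammar of non-negative integers joined by "+". The grammar and the
   names parse_sum, skip_blanks and number are hypothetical and are not taken
   from the parser described above. The function works on the characters
   directly, with no separate token stream, and returns the AST as nested
   tuples.

```python
from typing import Tuple, Union

# AST of the toy grammar: an integer, or ("+", left, right).
Ast = Union[int, Tuple[str, "Ast", "Ast"]]

DIGITS = "0123456789"


def parse_sum(text: str) -> Ast:
    """Parse e.g. "1 + 22 +3" straight from characters into an AST."""
    pos = 0  # index of the next unread character

    def skip_blanks() -> None:
        nonlocal pos
        while pos < len(text) and text[pos] == " ":
            pos += 1

    def number() -> int:
        nonlocal pos
        skip_blanks()
        start = pos
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1
        if start == pos:
            # The source location is known here without any token annotation.
            raise SyntaxError(f"number expected at column {start + 1}")
        return int(text[start:pos])

    tree: Ast = number()
    skip_blanks()
    while pos < len(text) and text[pos] == "+":
        pos += 1
        tree = ("+", tree, number())  # left-associative
        skip_blanks()
    if pos != len(text):
        raise SyntaxError(f"unexpected character at column {pos + 1}")
    return tree
```

   For example, parse_sum("1 + 22 +3") builds a left-nested tree whose root
   is the second "+".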

   >    "I read a character. If it's a space, I peek at the next   
   >    character, if that's a space, I start adding spaces to my   
   >    look-ahead buffer. If an EOL is encountered, the look-ahead   
   >    buffer is discarded. Otherwise, I have to start feeding my   
   >    client from the lookahead buffer until the lookahead buffer   
   >    is empty."   
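
   The quoted procedure can be sketched roughly as follows. It assumes that
   only the space character counts as a blank and that end of input is
   treated like an EOL. Since the look-ahead buffer only ever holds spaces,
   the sketch keeps a count instead of a real buffer. The name
   strip_trailing_blanks is made up for illustration.

```python
import io
from typing import Iterator, TextIO


def strip_trailing_blanks(stream: TextIO) -> Iterator[str]:
    """Yield the characters of `stream`, dropping spaces that precede an EOL.

    Only the pending run of spaces is remembered, never the whole line.
    """
    pending = 0  # the look-ahead "buffer": number of spaces not yet passed on
    while True:
        ch = stream.read(1)  # sequential access, one character at a time
        if ch == " ":
            pending += 1
        elif ch == "\n" or ch == "":
            pending = 0  # EOL or end of input: discard the pending spaces
            if ch == "":
                return
            yield ch
        else:
            # The spaces were not trailing after all: feed them to the client.
            yield from " " * pending
            pending = 0
            yield ch
```

   Usage: "".join(strip_trailing_blanks(io.StringIO("a  b   \n"))) gives
   "a  b\n". The inner blanks survive and the trailing ones are dropped.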

   Reasonable languages adopt the rule that one blank character is
   equivalent to any number of blank characters, so you could simply pass
   a single space further. Note that you have to annotate tokens with their
   source location anyway (another reason for ditching the scanner
   altogether). So you do not need to care about what this blank was built
   of. And yet another reason not to use a scanner is that the blank can be
   part of a comment or literal, possibly a malformed one.
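
   To illustrate the "one blank equals many" rule together with location
   annotation, here is a small sketch. It assumes blanks are spaces and
   tabs, and it deliberately ignores comments and literals, where a blank
   must not be collapsed. Token and split_with_locations are hypothetical
   names.

```python
from dataclasses import dataclass
from typing import Iterator

BLANKS = " \t"


@dataclass(frozen=True)
class Token:
    text: str    # a single " " stands for any run of blanks
    line: int    # 1-based line number
    column: int  # 1-based column of the first character


def split_with_locations(line_text: str, line_no: int) -> Iterator[Token]:
    """Split one line into words and collapsed blanks, each with its location."""
    i = 0
    n = len(line_text)
    while i < n:
        start = i
        if line_text[i] in BLANKS:
            while i < n and line_text[i] in BLANKS:
                i += 1
            yield Token(" ", line_no, start + 1)  # the whole run becomes one space
        else:
            while i < n and line_text[i] not in BLANKS:
                i += 1
            yield Token(line_text[start:i], line_no, start + 1)
```

   Because every token carries its own column, the consumer can still report
   exact positions even though the original width of each blank run is gone.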

   >    Is it worth the effort with a look-ahead buffer and   
   >    sequential access? Should you just read a line, assuming   
   >    that a line will always fit into memory, and strip the   
   >    blanks the easy way, i.e., using random access?   

   My parser works with an abstract source object. The implementation of
   the source object maintains an internal line buffer whose size is a
   parameter. Whether it is set to 1 TB or 1024 bytes, the parser does not care.
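
   One way such a source abstraction might look is sketched below. This is
   not the actual implementation: the names Source, StreamSource and
   LineTooLongError are invented, the buffer size is counted in characters
   rather than bytes, and raising an error on an over-long line is an
   assumption, since the post does not say what happens in that case.

```python
import io
from abc import ABC, abstractmethod
from typing import Optional, TextIO


class Source(ABC):
    """What the parser sees: lines, and nothing about how they are stored."""

    @abstractmethod
    def next_line(self) -> Optional[str]:
        """Return the next line without its terminator, or None at end of input."""


class LineTooLongError(Exception):
    """Raised when a line does not fit into the configured buffer (assumed policy)."""


class StreamSource(Source):
    def __init__(self, stream: TextIO, buffer_size: int = 1024) -> None:
        self._stream = stream
        self._buffer_size = buffer_size  # in characters, for simplicity
        self.line_no = 0

    def next_line(self) -> Optional[str]:
        # readline(size) reads at most `size` characters, so memory use is
        # bounded by the buffer size even if the line itself is huge.
        chunk = self._stream.readline(self._buffer_size + 1)
        if chunk == "":
            return None
        self.line_no += 1
        line = chunk[:-1] if chunk.endswith("\n") else chunk
        if len(line) > self._buffer_size:
            raise LineTooLongError(
                f"line {self.line_no} exceeds {self._buffer_size} characters"
            )
        return line
```

   A parser written against Source alone never sees buffer_size. Changing it
   from 1024 to something enormous only touches the line that constructs the
   StreamSource.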

   --   
   Regards,   
   Dmitry A. Kazakov   
   http://www.dmitry-kazakov.de   

