comp.programming
Stefan Ram to Stefan Ram
Re: Scanning
19 Jan 23 14:48:29
   From: ram@zedat.fu-berlin.de   
      
   ram@zedat.fu-berlin.de (Stefan Ram) writes:   
   >Let's take a very simple task: This scanner for text files   
   >has nothing more to do than to return every character,   
   >except to strip the spaces at the end of a line.   
      
     Richard said that it matters what I need this for.   
      
     I'd like to implement a tiny markup language similar   
     to languages like "Markdown" or "reStructuredText".   
     It should ignore spaces at the end of lines.   
     I'm going to implement it in Python.   
      
     Here is a first draft of a scanner that strips   
     spaces at the end of lines. It works by reading   
     single characters from the source.   
      
     For demonstration purposes, I have written spaces   
     as underscores "_".   
      
     The demo takes   
      
   Howdy___\nthere!   
      
     as input and outputs   
      
   Howdy\nthere!\n   
      
     . (It also tries to insert '\n' at the end of a   
     source when there is no '\n' at the end.)   
      
     The input text is given in the source code via   
      
   input_text = iter( 'Howdy___\nthere!' )   
      
     . What I need to do next is to write more tests   
     in order to find errors. (I avoided using classes   
     to make the code a bit easier to read for the   
     newsgroup, but the code will soon be changed to   
     use a class definition.)   
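
     One way to get expected values for such tests is a
     whole-string reference function that the character-based
     scanner can be compared against. The following is only a
     sketch: the name "strip_trailing_blanks" and the list of
     cases are made up here, and it assumes that '\n' is the
     only line terminator and that "_" counts as a blank, as
     in the demo.

```python
def strip_trailing_blanks(text, blanks=' \t_'):
    """Whole-string reference for the scanner's intended output.

    "_" is in `blanks` only because the demo writes spaces as "_".
    """
    if text == '':
        return ''                  # empty source: nothing to terminate
    lines = text.split('\n')
    if lines[-1] == '':
        lines.pop()                # source already ended with '\n'
    # strip blanks at the end of each line, then terminate the line
    return ''.join(line.rstrip(blanks) + '\n' for line in lines)


# A few sources worth feeding to both implementations.
cases = ['', '\n', 'Howdy___\nthere!', 'a_b', '___', 'a\n__', 'a__\n\n_b_']
for source in cases:
    print(repr(source), '->', repr(strip_trailing_blanks(source)))
```

     The cases cover an empty source, blanks inside a line
     (which must survive), blanks directly before the end of
     the source, and empty lines.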
      
     Python 3.9   
      
     main.py   
      
   def catcode( ch ):   
       # 5 means: "this is a line terminator"   
       # 10 means: "this is a blank space"   
       # 11 means: "this is a plain character"   
       if ch == '\n': return 5   
       if ch == ' ': return 10   
       if ch == '_': return 10 # for debugging, make "_" a space   
       if ch == '\t': return 10   
       return 11   
      
   spaces_seen = [] # a buffer for spaces collected   
   char_read = '' # a buffer allowing one-character lookahead   
   previous = '' # the previous character read by "get_next_character"   
   terminated = False # set after the last character of the source was read   
      
   def get_next_character():   
       # insert EOL at the end of the last line if missing   
       global previous   
       global terminated   
       global char_read   
       if terminated: raise StopIteration   
       if char_read:   
           ch = char_read; char_read = ''   
       else:   
           try:   
               ch = next( input_text )   
           except StopIteration:   
               if previous != '' and catcode( previous )!= 5:   
                   # if there is no EOL at EOF, insert one   
                   ch = '\n'   
                   terminated = True   
               else:   
                   raise StopIteration   
       previous = ch   
       return ch   
      
   def get_next_token():   
       # skip blanks at the end of a line   
       global char_read   
       global spaces_seen   
       while True:   
           if not spaces_seen:   
               ch = get_next_character()   
               if catcode( ch )== 10:   
                   spaces_seen =[ ch ]   
                   while True:   
                       ch = get_next_character()   
                       if catcode( ch )== 10:   
                           spaces_seen.append( ch )   
                       elif catcode( ch )== 5:   
                           spaces_seen = []   
                           return( 0, ch, 5, f'{spaces_seen=}' )   
                       else:   
                           char_read = ch   
                           break   
               else:   
                   return( 0, ch, catcode( ch ), f'{spaces_seen=}')   
           if spaces_seen:   
               ch = spaces_seen.pop( 0 )   
               return( 1, ch, catcode( ch ), f'{spaces_seen=}')   
      
   input_text = iter( 'Howdy___\nthere!' )   
      
   def main():   
       result = ''   
       while True:   
           try:   
               token = get_next_token()   
               result += token[ 1 ]   
           except StopIteration:   
               break   
       print( repr( result ))   
      
   main()   
      
       stdout   
      
   'Howdy\nthere!\n'   
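
     The move to a class mentioned above might look roughly
     like the following. This is a sketch under assumptions,
     not the planned design: the class name "Scanner" is
     hypothetical, it yields plain characters instead of the
     4-tuples returned by get_next_token, and it keeps the
     state in a generator instead of in global variables.

```python
class Scanner:
    """Iterate over the characters of a source, dropping blanks
    before each '\n' and adding a final '\n' if it is missing."""

    BLANKS = ' \t_'    # "_" counts as a blank only for the demo

    def __init__(self, source):
        self._chars = iter(source)   # single pass over the source

    def __iter__(self):
        pending = []       # blanks seen, not yet known to be trailing
        need_eol = False   # True while the current line is unterminated
        for ch in self._chars:
            if ch in self.BLANKS:
                pending.append(ch)
                need_eol = True
                continue
            if ch != '\n':
                yield from pending   # the blanks were inside the line
            pending.clear()          # at '\n' they are simply dropped
            need_eol = ch != '\n'
            yield ch
        if need_eol:
            yield '\n'               # no EOL at EOF, so insert one


print(repr(''.join(Scanner('Howdy___\nthere!'))))   # 'Howdy\nthere!\n'
```

     Blanks are held back until the next non-blank character
     shows whether they were trailing, which is the same idea
     as the spaces_seen buffer in the draft.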
      