|    comp.programming    |    Programming issues that transcend langua    |    57,431 messages    |
|    Message 56,951 of 57,431    |
|    Stefan Ram to Stefan Ram    |
|    Re: Scanning    |
|    19 Jan 23 14:48:29    |
   
   From: ram@zedat.fu-berlin.de   
      
   ram@zedat.fu-berlin.de (Stefan Ram) writes:   
   >Let's take a very simple task: This scanner for text files   
   >has nothing more to do than to return every character,   
   >except to strip the spaces at the end of a line.   
      
    Richard said that it matters what I need this for.   
      
    I'd like to implement a tiny markup language similar   
    to languages like "Markdown" or "reStructuredText".   
    It should ignore spaces at the end of lines.   
    I'm going to implement it in Python.   
      
    Here is a first draft of a scanner that strips   
    spaces at the end of lines. It works by reading   
    single characters from the source.   
      
    For demonstration purposes, I have written spaces
    as underscores "_" (the scanner treats "_" as a blank).
      
    The demo takes   
      
   Howdy___\nthere!   
      
    as input and outputs   
      
   Howdy\nthere!\n   
      
    . (It also tries to insert '\n' at the end of a   
    source when there is no '\n' at the end.)   
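
    The intended behaviour described above can be stated
    compactly as a reference function. This is only a sketch:
    the name "expected_output" is hypothetical, and treating
    "_" as a blank is the same debugging assumption as in the
    demo. It is not the scanner itself, merely something to
    compare a scanner's output against.

```python
import re

def expected_output(text):
    """Hypothetical reference: what the scanner should emit for text."""
    # Assumption: a non-empty source without a final '\n' gets one appended.
    if text and not text.endswith('\n'):
        text += '\n'
    # Drop every run of blanks (space, tab, and "_" for debugging)
    # that stands directly before a line terminator.
    return re.sub('[ \t_]+\n', '\n', text)

print(repr(expected_output('Howdy___\nthere!')))  # 'Howdy\nthere!\n'
```

    Blanks inside a line are left alone, because the pattern
    only matches blanks that are followed by '\n'.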
      
    The input text is given in the source code via   
      
   input_text = iter( 'Howdy___\nthere!' )   
      
    . What I need to do next is to write more tests
    in order to find errors. (I avoided using classes
    to make the code a bit easier to read for the
    newsgroup, but the code will soon be changed to
    use a class definition.)
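
    One possible shape for the class-based version together
    with a few tests, as a rough sketch only. The names
    "Scanner", "scan" and "cases" are hypothetical, the
    generator approach is a swapped-in technique rather than
    the code of main.py, and the expected strings are what I
    would want the scanner to produce under the rules stated
    above ("_" counts as a blank).

```python
def catcode(ch):
    # same category codes as in main.py:
    # 5 = line terminator, 10 = blank, 11 = plain character
    if ch == '\n': return 5
    if ch in ' _\t': return 10  # "_" is a blank for debugging only
    return 11

class Scanner:
    """Hypothetical class wrapper; yields characters, not token tuples."""

    def __init__(self, text):
        self.source = iter(text)

    def __iter__(self):
        pending = []   # blanks seen, not yet known to be trailing
        previous = ''  # last character read from the source
        for ch in self.source:
            code = catcode(ch)
            if code == 10:
                pending.append(ch)
            elif code == 5:
                pending.clear()     # blanks before EOL are dropped
                yield ch
            else:
                yield from pending  # blanks inside a line are kept
                pending.clear()
                yield ch
            previous = ch
        if previous and catcode(previous) != 5:
            yield '\n'              # supply the missing final EOL

def scan(text):
    return ''.join(Scanner(text))

# A few hand-written cases: source -> wanted output.
cases = {
    'Howdy___\nthere!': 'Howdy\nthere!\n',
    'a_b': 'a_b\n',
    'a__': 'a\n',
    '': '',
    '\n': '\n',
}
for source, wanted in cases.items():
    assert scan(source) == wanted, (source, scan(source))
```

    Keeping the state in local variables of a generator avoids
    the module-level globals, so each test case starts from a
    fresh scanner.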
      
    Python 3.9   
      
    main.py   
      
   def catcode( ch ):
       # 5 means: "this is a line terminator"
       # 10 means: "this is a blank space"
       # 11 means: "this is a plain character"
       if ch == '\n': return 5
       if ch == ' ': return 10
       if ch == '_': return 10 # for debugging, make "_" a space
       if ch == '\t': return 10
       return 11
      
   spaces_seen = [] # a buffer for spaces collected   
   char_read = '' # a buffer allowing one-character lookahead   
   previous = '' # the previous character read by "get_next_character"   
   terminated = False # set after the last character of the source was read   
      
   def get_next_character():
       # insert EOL at the end of the last line if missing
       global previous
       global terminated
       global char_read
       if terminated: raise StopIteration
       if char_read:
           ch = char_read; char_read = ''
       else:
           try:
               ch = next( input_text )
           except StopIteration:
               if previous != '' and catcode( previous )!= 5:
                   # if there is no EOL at EOF, insert one
                   ch = '\n'
                   terminated = True
               else:
                   raise StopIteration
       previous = ch
       return ch
      
   def get_next_token():
       # skip blanks at the end of a line
       global char_read
       global spaces_seen
       while True:
           if not spaces_seen:
               ch = get_next_character()
               if catcode( ch )== 10:
                   spaces_seen =[ ch ]
                   while True:
                       ch = get_next_character()
                       if catcode( ch )== 10:
                           spaces_seen += ch
                       elif catcode( ch )== 5:
                           spaces_seen = []
                           return( 0, ch, 5, f'{spaces_seen=}' )
                       else:
                           char_read = ch
                           break
               else:
                   return( 0, ch, catcode( ch ), f'{spaces_seen=}')
           if spaces_seen:
               ch = spaces_seen.pop( 0 )
               return( 1, ch, catcode( ch ), f'{spaces_seen=}')
      
   input_text = iter( 'Howdy___\nthere!' )   
      
   def main():
       result = ''
       while True:
           try:
               token = get_next_token()
               result += token[ 1 ]
           except StopIteration:
               break
       print( repr( result ))
      
   main()   
      
    stdout   
      
   'Howdy\nthere!\n'   
      
(c) 1994, bbs@darkrealms.ca