comp.lang.forth
Anton Ertl to Paul Rubin
Re: Idiomatic way to read a word of text
19 Nov 25 07:27:22
   From: anton@mips.complang.tuwien.ac.at   
      
   Paul Rubin writes:
   >anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:   
   >> Why not?  For something like Roff (or TeX or Markdown etc.) the whole   
   >> input file easily fits into RAM, so a line would fit, too.  The   
   >> question is if the Forth system supports long lines in REFILL.   
   >   
   >The target processor might not have that much ram.   
      
   "Might"?  If you have a concrete target with small RAM (say, one of   
   the Mecrisp targets), it will come with additional restrictions, but   
   maybe also additional capabilities (like accessing the input file   
   directly in its flash storage), and all of this might change how you   
   approach the problem.   
      
   >I was thinking of accommodating modern wysiwyg editors which
   >don't have line breaks except at the end of paragraphs.  Maybe that's   
   >not worthwhile.   
      
   If you have a machine where you run a WYSIWYG editor, you also have   
   enough RAM for keeping one line (and probably the whole text).  The   
   systems with small RAM tend to have only a line editor (with
   80-char lines), or maybe a screen editor (with 1KB screens).   
      
   >One obvious approach is to use READ-LINE, but this unfortunately seems   
   >to throw away the newline at the end of the line read, so it's hard to   
   >tell if a complete line has been read, or if the buffer has simply   
   >gotten full.   
      
   The Forth standard's specification of READ-LINE says:
   |When u1 = u2 the line terminator has yet to be reached.   
      
   >Testing with gforth, if the buffer size is exactly the   
   >line length, then FILE-POSITION points to just after the line, and the   
   >next call to READ-LINE returns 0 chars.  No idea about other Forths.   
      
   With u1=line length, the only way to satisfy the requirement above is
   to deliver the line in two parts: one with u2=u1, the other with u2=0.
   READ-LINE and the case where u1=line length have been discussed
   several times, so apparently it is not clear to everyone how a system
   should behave.  You may want to check the system you use, and report
   a bug to its implementor if it does not behave correctly.
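
   A minimal sketch of how the u2=u1 rule can be used to consume an
   arbitrarily long line through a small buffer.  CHUNK-SIZE, CHUNK,
   TYPE-LONG-LINE and TYPE-FILE are hypothetical names, and the sketch
   assumes a system with Forth-2012 locals and a READ-LINE that behaves
   as quoted above:

```forth
\ Sketch only: all names defined here are hypothetical.
16 constant chunk-size              \ deliberately small buffer
create chunk chunk-size 2 + allot   \ READ-LINE may use two extra characters

: type-long-line {: file-id | pieces -- flag :} \ flag is false at end of file
  0 to pieces
  begin
    chunk chunk-size file-id read-line throw  ( u2 flag )
    0= if                     \ end of file reached
      drop pieces 0<> exit    \ true only if an unterminated last line was output
    then
    pieces 1+ to pieces
    chunk over type           ( u2 )
    chunk-size <              \ u2 < u1: the line terminator has been reached
  until
  true ;

: type-file ( c-addr u -- )   \ copy a file to the output, line by line
  r/o open-file throw >r
  begin r@ type-long-line while cr repeat
  r> close-file throw ;
```

   A line whose length is an exact multiple of CHUNK-SIZE is delivered
   with a final piece of u2=0, which is the two-part case described
   above.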
      
   As for FILE-POSITION, what I see in Gforth is (output after "\"):   
      
   s" /tmp/long-lines.4th" r/o open-file throw constant f \  ok   
   pad 74 f read-line throw . . \ -1 74  ok   
   f file-position throw ud. \ 74  ok   
   pad 70 f read-line throw . . \ -1 0  ok   
   f file-position throw ud. \ 75  ok   
      
   That's with a file with one-byte newlines.   
      
   >> And here's the significance of REFILLing.  You could pass everything to
   >> the text interpreter, and install the following recognizer sequence:   
   >> First one that recognizes things like ".i\n", and second one that   
   >> recognizes everything and then processes the line as ordinary words.   
   >   
   >I'll see if I can figure out how to do that, though the target Forth   
   >might not have recognizers.  What I wanted is a loop like   
   >  LOOP   
   >    READ a line;   
   >    IF line begins with ".", then pass the line to the text interpreter;   
   >    ELSE loop through the words on the line, copying them to the output   
   >       buffer or maybe to the output device   
   >  END LOOP   
      
   If you rely on REFILL (but then you have to INCLUDE the file, and have   
   an executable word at its start), you could implement that as:   
      
   : process-line ( -- )   
     begin   
       source nip >in @ u> while   
         parse-name type \ or whatever you want to do with words   
     repeat ;   
      
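   : roff ( -- ) \ untested
     begin
       refill while
         source if \ non-empty line: c-addr stays on the stack
           c@ '.' = if \ request line: pass it to the text interpreter
             \ 2 CS-PICK copies the BEGIN dest from below the two IF origs
             source evaluate [ 2 cs-pick ] again then
         else
           drop then \ empty line: discard c-addr
         process-line
     repeat ;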
      
   The alternative with the recognizers avoids the need to say ROFF at   
   the start of the file.  Or you have a Forth system that implements   
   EXECUTE-PARSING-FILE.   
      
   Alternatively, you could go for using READ-LINE.  In that case I would   
   treat too-long lines as errors, and the result would look as follows:   
      
   80 constant line-length \ however long lines you have space to process   
   create line line-length 2 + allot   
      
   : process-line ( c-addr u -- )   
     ... \ without PARSE-NAME support unless you use EXECUTE-PARSING   
     ;   
      
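   : roff {: file-id -- :}
     begin
       line line-length file-id read-line throw while \ ( u2 )
         \ u2=u1 means the line terminator has not been reached
         dup line-length >= abort" line too long"
         dup if
           line c@ '.' = if
             \ 2 CS-PICK copies the BEGIN dest from below the two IF origs
             line swap evaluate [ 2 cs-pick ] again then
         then
         line swap process-line
     repeat
     drop ; \ discard the u2 left by the last READ-LINE at end of file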
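
   One possible shape for a word-by-word loop without PARSE-NAME is to
   split the buffer by hand.  The following is a sketch in standard
   Forth only; SKIP-BLANKS, SCAN-BLANK, NEXT-WORD and TYPE-WORDS are
   hypothetical names, and every character up to and including the space
   is treated as a delimiter:

```forth
\ Sketch only: all names defined here are hypothetical.
: skip-blanks ( c-addr u -- c-addr' u' )  \ drop leading delimiters
  begin dup while over c@ bl u> 0= while 1 /string repeat then ;

: scan-blank ( c-addr u -- c-addr' u' )   \ advance to the next delimiter
  begin dup while over c@ bl u> while 1 /string repeat then ;

: next-word ( c-addr u -- c-addr' u' c-addr-w u-w )
  \ c-addr-w u-w is the next word (u-w = 0 if none is left),
  \ c-addr' u' is the rest of the line after that word
  skip-blanks 2dup scan-blank 2swap 2 pick - ;

: type-words ( c-addr u -- )  \ output the words separated by one space
  begin next-word dup while type space repeat
  2drop 2drop ;
```

   With such a word, the READ-LINE loop could call TYPE-WORDS (or
   something similar) as its PROCESS-LINE.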
      
   >Getting words from the line should preferably use Forth's built-in   
   >parser.   
      
   That means going through INCLUDE, EVALUATE, EXECUTE-PARSING, or   
   EXECUTE-PARSING-FILE at some point.   
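
   As a sketch of the EXECUTE-PARSING route: assuming Gforth, where
   EXECUTE-PARSING ( ... c-addr u xt -- ... ) makes the string the input
   source while xt runs, a buffer filled by READ-LINE can be handed to
   PARSE-NAME.  TYPE-PARSED, COUNT-PARSED, PARSE-AND-TYPE and COUNT-WORDS
   are hypothetical names:

```forth
\ Sketch only, assuming Gforth's EXECUTE-PARSING; names are hypothetical.
: type-parsed ( -- )     \ output every word of the current input source
  begin parse-name dup while type space repeat 2drop ;

: count-parsed ( -- n )  \ count the words of the current input source
  0 begin parse-name nip while 1+ repeat ;

: parse-and-type ( c-addr u -- )  ['] type-parsed  execute-parsing ;
: count-words    ( c-addr u -- n ) ['] count-parsed execute-parsing ;
```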
      
   - anton   
   --   
   M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html   
   comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html   
        New standard: https://forth-standard.org/   
   EuroForth 2025 CFP: http://www.euroforth.org/ef25/cfp.html   
   EuroForth 2025 registration: https://euro.theforth.net/   
      