
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.lang.forth      Forth programmers eat a lot of Bratwurst      117,927 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 117,582 of 117,927   
   Ruvim to Anton Ertl   
   Re: 0 vs. translate-none   
   26 Sep 25 02:19:36   
   
   From: ruvim.pinka@gmail.com   
      
   On 2025-09-17 20:53, Anton Ertl wrote:   
   > This posting is a more general reflection about designing types in   
   > Forth; it just uses recognizers as example.   
   >   
   > The original proposal for recognizers had R:FAIL as the result of a   
   > recognizer that did not recognize the input.  Later that was renamed   
   > to NOTFOUND; then there was a proposal where 0 would be used instead,   
   > and Bernd Paysan changed all the uses of NOTFOUND in Gforth to 0.   
   > Finally, on last Thursday the committee decided to go with   
   > TRANSLATE-NONE for that result.   
   >   
   > Bernd Paysan thought that it would be easy to change back to a non-0   
   > value for TRANSLATE-NONE, by looking at the patch that changed   
   > NOTFOUND to 0.  However, in the meantime there has been more work   
   > done, so it's not so easy.   
   >   
   > E.g., there was a word   
   >   
   > ?FOUND ( x -- x )   
   >   
   > that would throw -13 if x=0.  This word was used both with the result   
   > of recognizers and with nt|0 or xt|0.  Fortunately, in this case the   
   > cases were easy to recognize, and they are now addressed by two words:   
   > ?REC-FOUND (for recognizer results) and ?FOUND (for x|0).   
      
   A better name than `?rec-found` is `?recognized`.   
      
   Given the pattern "rec-something ( sd -- qt|0 )", the pattern
   "?rec-something ( sd -- qt )" should denote words that accept a string
   and throw an exception if it is not recognized as "something".
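
   As a sketch of that pattern (assuming the 0-on-failure convention
   that Gforth currently uses; -13 is the standard "undefined word"
   throw code mentioned above):

   ```forth
   \ Sketch: a guard that throws -13 ("undefined word") on a
   \ zero (not-recognized) result and passes anything else through.
   : ?recognized ( qt|0 -- qt )  dup 0= -13 and throw ;

   \ Hypothetical companion, following the naming pattern above,
   \ for some recognizer rec-number ( sd -- qt|0 ):
   \ : ?rec-number ( sd -- qt )  rec-number ?recognized ;
   ```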
      
      
      
   > What do we learn from this?  Merging two previously separate types   
   > such that they are dealt with (partly) the same words (e.g., 0= in   
   > this case) is easy, as is mixing two kinds of sand.  Separating two   
   > previously (partly) merged types to use type-specific words is a lot   
   > more work.   
      
   Yes. But this work is not justified in any way.   
      
      
   I see the problem a little differently — in terms of subtyping and type   
   hierarchies.   
      
   If a type B is a subtype of a type A, then all words that accept any
   member of A also accept any member of B.
      
   So when introducing a new type C, the first challenge is to optimally   
   choose the nearest supertype (or supertypes) for it.   
      
   For example, if you make C a subtype of A, then all methods of A apply
   to C. If you make C a subtype of B, then all methods of A and B apply
   to C.
      
   When choosing a supertype, the factors to consider are:
      - consistency with existing types and methods;
      - minimizing the lexical code size of programs;
      - applicability of existing techniques and methods to the new type;
      - restrictions on implementations.
      
   We generally don't plan for future changes to subtype relationships.   
   Yes, they can be changed during the design and experimentation phase,   
   but that doesn't constitute an argument for choosing one supertype over   
   another.   
      
   Obviously, the more general a supertype is, the more implementation
   options are available and the fewer existing methods can be applied to
   members of the type.  However, this trade-off alone is also not an
   argument for choosing one supertype over another.
      
      
      
      
      
   Returning to recognizers.   
      
   There is quite a general type: ( i*x x\0 ). Let's call it "any-nz".

   The unique feature of this type is that there is a simple and general
   method to check whether a data object is a member of this type: just
   check whether the top single-cell value is non-zero.  This method
   applies to *any* subtype of this type, and it is made even more
   elegant by the fact that control-flow operators apply it automatically.
      
   Note that nt, xt, wid are subtypes of any-nz.   
      
   Another side of any-nz is that a union type ( any-nz | 0 ) is a natively   
   discriminated union. This has led to a common approach of returning   
   any-nz on success and 0 on failure.   
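
   A minimal illustration, using the standard `find-name`
   ( c-addr u -- nt|0 ): `if` consumes the top cell and thereby
   performs the any-nz membership test directly.

   ```forth
   \ find-name returns an nt (non-zero, a member of any-nz) on
   \ success and 0 on failure; IF discriminates the union for free.
   : defined? ( c-addr u -- )
     find-name if ." defined" else ." not defined" then ;

   s" dup" defined?   \ an nt for DUP exists, so this prints "defined"
   ```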
      
   The question is: should recognizers follow this approach?  I think
   so.  Effectively this means that the type of a recognizer's success
   result is a subtype of any-nz, and the type of its failure result is
   a subtype of the unit type "0".
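
   A toy recognizer following this scheme might look as below.  This is
   a sketch only: `translate-num` stands for whatever (non-zero)
   translator token a system returns for numbers, and is an assumption,
   not standard text.

   ```forth
   \ Toy recognizer sketch: recognizes a single decimal digit.
   \ On success it returns the value and a translator token (any-nz);
   \ on failure just 0, per the any-nz | 0 convention.
   : rec-digit ( c-addr u -- n translate-num | 0 )
     1 = if
       c@ dup [char] 0 [char] 9 1+ within if
         [char] 0 - translate-num exit
       then
     then
     drop 0 ;
   ```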
      
      
   The only counterargument is that 0 on failure is too restrictive for
   implementations.

   This does not seem convincing: `search-wordlist`, `find`,
   `find-name`, and `find-name-in` all return 0 on failure, and that is
   not too restrictive for implementations.
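
   The shared convention is what lets a single guard serve all such
   lookup words (a sketch; `?found` is the Gforth word mentioned in the
   quoted posting, and -13 is the "undefined word" throw code):

   ```forth
   \ One guard for every x|0-returning lookup word:
   : ?found ( x|0 -- x )  dup 0= -13 and throw ;

   \ Same failure handling for different lookups (Forth-2012 words):
   : name-of ( c-addr u -- nt )  find-name ?found ;
   : xt-of   ( c-addr u -- xt )  find-name ?found name>interpret ;
   ```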
      
   OTOH, why should we, in this case, prefer the convenience of
   implementations over the convenience of programs?
      
      
      
      
      
      
   > You can fake it by defining 0 CONSTANT TRANSLATE-NONE, but then you   
   > never know if your code ports to other systems where TRANSLATE-NONE is   
   > non-zero.  For now Gforth does it this way, but I don't expect that to   
   > be the final stage.   
   >   
   > Should we prefer to separate types or merge them?   
      
      
   In other words, should we restrict implementation options in this   
   regard?  Yes, because this is a common approach, which makes programs   
   simpler.   
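
   The difference Anton points out can be made concrete (a sketch;
   whether `TRANSLATE-NONE` may be assumed zero is exactly what is at
   issue):

   ```forth
   0 CONSTANT TRANSLATE-NONE  \ the "fake" from the quoted posting

   \ Portable test: works whatever value TRANSLATE-NONE has.
   : rec-failed?  ( token -- flag )  TRANSLATE-NONE = ;

   \ Zero test: valid only if TRANSLATE-NONE is guaranteed to be 0 --
   \ precisely the guarantee argued for above.
   : rec-failed0? ( token -- flag )  0= ;
   ```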
      
      
      
   > Both approaches have advantages:   
   >   
   > * With separate words for dealing with the types, we can easily find   
   >    all uses of that type and do something about it.  E.g., a while ago   
   >    I changed the cs-item (control-flow stack item) in Gforth from three   
   >    to four cells.  This was relatively easy because there are only a   
   >    few words in Gforth that deal with cs items.   
      
      
   The cs-item example does not demonstrate any advantage, because the
   formal type didn't change.  You only needed to find the places where
   the system-specific subtype was used by system-specific methods;
   places where the formal type was used didn't change.
      
      
      
      
   > * With a merged approach, we can use the same words for dealing with   
   >    several types, with further words building upon these words (instead   
   >    of having to define the further words n times for n types).  But   
   >    that makes the separation problem even harder.   
      
   A separation (i.e., breaking a subtyping relationship) should not be   
   planned at all.   
      
      
   > Overall, I think that the merged approach is preferable, but only if   
   > you are sure that you will never need to separate the types (whether   
   > due to a committee decision or because some new requirement means that   
   > you have to change the representation of the type).   
      
      
   If an old data type does not fit new requirements in the future, a
   new type (and new methods) should be introduced.  Changing existing
   subtypes of an old type cannot be planned for in principle.
      
      
      
      
   --   
   Ruvim   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca