From: anton@mips.complang.tuwien.ac.at   
      
   MitchAlsup writes:   
   >   
   >anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   >> Concerning pain, I found that in Gforth (which contains C code and   
   >> Forth code) we had many more portability bugs in the C code than in   
   >> the Forth code, where we had almost no portability bugs.   
   >   
   >C, itself, would be a "lot less painful" if C only had 2 integer types   
   >1-word and 2-words. But, instead, the typical 2^(n+3) machines have   
   >8 integer types {Signed, unSigned}×{Byte, Half, Word, DBLE}, and then   
   >to make it as bad as possible, there are a myriad of types {ptr_dif,   
   >size_t, off_t, ...} that change {Sign}×{Size} on an architecture basis.   
      
   Actually, ptrdiff_t might be seen as the signed word-size integer type   
   and size_t as the unsigned one. That's somewhat of a mitigation.   
      
   Concerning off_t: if C had had a single-word and a two-word type, one   
   could have used the two-word type instead of off_t from the start,   
   avoiding the pain of _FILE_OFFSET_BITS etc.   
      
   Concerning signedness: Forth also supports signed and unsigned cells   
   and double-cells. This does not cause portability problems, because   
   the signedness of a value does not change between platforms.   
   Signedness bugs are easy to miss, however.   
      
   >> That's because Forth has only two integer types: cell (a machine word)   
   >> and double cell (two machine words); and if you use one instead of the   
   >> other, the code fails, whatever the cell size is.   
   >   
   >Same as FORTRAN.   
      
   According to the information discussed here recently, FORTRAN uses the   
   same approach on byte-addressed machines as Java: 32-bit INTEGERs,   
   32-bit REALs, 64-bit DOUBLEs. No word-sized INTEGERs in FORTRAN.   
      
   BTW, in Forth the FP sizes are not related to integer sizes; this does   
   not cause portability problems in my experience, but I have   
   experienced FP-related portability problems, typically coming from the   
   assumption that an FP value consumes a power-of-two number of bytes in   
   memory (there are systems with 10-byte floats).   
      
   >> By contrast, in the C code we have to deal with a large number of   
   >> integer types (not just int, long, etc., but also, e.g., off_t), with   
   >> the relations between the types being different on different   
   >> platforms, or, in the case of off_t, also depending on #defines. On one   
   >> machine some function parameter was a long or whatever, on a different   
   >> one it was a bla_t or whatever. Of course, these days one might   
   >> target only Linux and MacOS and reach >99% of desktops and servers   
   >> (the result runs on Windows through WSL2), but that solves the problem   
   > ^only   
   >> by reducing the portability requirements.   
   >   
   >Blame goes to: ISO/IEC 9899:1999 for trying to accommodate everyone   
   >and ending up screwing everyone.   
      
   I don't think that blaming anyone is useful. One can, however, think   
   about what contributed to the portability problems and what   
   alternative approaches would have avoided them.   
      
   The machine-word-oriented B proved insufficient for the byte-addressed   
   PDP-11, so Ritchie added types and C was born. There was int (the   
   machine word) and char (the byte). Because in B p+1 means the next   
   machine word after p, and Ritchie wanted to preserve this, C also has   
   typed pointers: int * and char *. long was added because int is   
   occasionally too small on the PDP-11.   
      
   One way to avoid the portability problems would have been to define   
   int and pointers to be machine words and long to be two machine   
   words. In this scenario, as long as machine-internal data is   
   accessed, there would not be portability problems: pid_t, uid_t,   
   etc. would all be ints. There would be problems when exchanging data   
   with other machines. E.g., a file system probably wants   
   architecture-independent data, and would spend, say, 32 bits on the   
   uid. But at least these issues would be limited to the code that   
   accesses these file systems (at least if the programmer isolates these   
   accesses).   
      
   But C did not go there, and instead made long 32 bits on both   
   16-bit and 32-bit machines, with the result that lseek(), which   
   produced and consumed a long, could only deal with files up to 2GB.   
   Good enough at the start, but limiting later, so at some point off_t   
   and the whole _FILE_OFFSET_BITS mess had to be introduced.   
      
   Another way to avoid the portability problems would have been to go   
   for special-purpose types like off_t from the start and make all   
   integer types incompatible, i.e., require explicit instead of implicit   
   conversion between them. That (along with appropriate teaching   
   material) would make it clear that conversion should be avoided where   
   possible, which in turn would reduce the dependencies on relations   
   between type sizes. However, going full-bore in this direction when   
   coming from B was probably incompatible with Ritchie's apparent goal   
   of using B code with as few changes as possible.   
      
   - anton   
   --   
   'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'   
    Mitch Alsup,    
      