comp.arch
Apparently more than just beeps & boops
131,241 messages
Message 129,409 of 131,241
David Brown to Dan Cross
Re: System calls (2/3)
15 Aug 25 17:49:53
   [continued from previous message]   
      
   well established long before C was conceived.  And programmers of that   
   time were familiar with the concept of functions and operations being   
   defined for appropriate inputs, and having no defined behaviour for   
   invalid inputs.  C is full of other things where behaviour is left   
   undefined when no sensible correct answer can be specified, and that is   
   not just because the behaviour of different hardware could vary.  It   
   seems perfectly reasonable to me to suppose that signed integer   
   arithmetic overflow is just another case, no different from   
   dereferencing an invalid pointer, dividing by zero, or any one of the   
   other UB's in the standards.   
      
   >   
   > In particular, C is a programming language for actual machines,   
   > not a mathematical notation; the language is free to define the   
   > behavior of arithmetic expressions in any way it chooses, though   
   > one presumes it would do so in a way that makes sense for the   
   > machines that it targets.   
      
   Yes, that is true.  It is, however, also important to remember that it   
   was based on a general abstract machine, not any particular hardware,   
   and that the operations were intended to follow standard mathematics as   
   well as practically possible - operations and expressions in C were not   
   designed for any particular hardware.  (Though some design choices were   
   biased by particular hardware.)   
      
   > Thus, it could have formalized the   
   > result of signed integer overflow to follow 2's complement   
   > semantics had the committee so chosen, in which case the result   
   > would not be "incorrect", it would be well-defined with respect   
   > to the semantics of the language.  Java, for example, does this,   
   > as do C11 (and later) atomic integer operations.  Indeed, the
   > C99 rationale document makes frequent reference to two's
   > complement, where overflow and modular behavior are frequently   
   > equivalent, being the common case.  But aside from the more   
   > recent atomics support, C _chose_ not to do this.   
   >   
      
   It could have made signed integer overflow defined behaviour, but it did   
   not.  The C standards committee have explicitly chosen not to do that,   
   even after deciding that two's complement is the only supported   
   representation for signed integers in C23 onwards.  It is fine to have   
   two's complement representation, and fine to have modulo arithmetic in   
   some circumstances, while leaving other arithmetic overflow undefined.   
   Unsigned integer operations in C have always been defined as modulo   
   arithmetic - addition of unsigned values is a different operation from   
   addition of signed values.  Having some modulo behaviour does not in any   
   way imply that signed arithmetic should be modulo.   
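
   A minimal C sketch of that distinction.  The helper names are
   hypothetical, made up for illustration; they are not from the standard
   or from this thread.  The unsigned add relies on the defined modulo
   behaviour, while the signed version has to test its operands before
   adding, because the overflowing addition itself has no defined result.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: unsigned addition is defined to wrap
   modulo UINT_MAX + 1, so it cannot overflow. */
unsigned add_unsigned(unsigned a, unsigned b)
{
    return a + b;
}

/* Hypothetical helper: signed overflow is undefined, so the range
   check must happen before the addition, not after it.
   Returns true and stores the sum only if it is representable. */
bool add_signed_checked(int a, int b, int *sum)
{
    assert(sum != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;   /* mathematical sum does not fit in int */
    *sum = a + b;
    return true;
}
```

   Here add_unsigned(UINT_MAX, 1u) gives 0, whereas
   add_signed_checked(INT_MAX, 1, &s) reports failure rather than
   producing a value.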
      
   In Java, the language designers decided that integer arithmetic   
   operations would be modulo operations.  Wrapping therefore gives the   
   correct answer for those operations - it does not give the correct   
   answer for mathematical integer operations.  And Java loses common   
   mathematical identities which C retains - such as the identity that   
   adding a positive integer to another integer will increase its value.   
   Something always has to be lost when approximating unbounded   
   mathematical integers in a bounded implementation - I think C made the   
   right choices here about what to keep and what to lose, and Java made   
   the wrong choices.  (Others may of course have different opinions.)   
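
   To make the lost identity concrete, here is a sketch that emulates
   Java-style wrapping addition in C by going through uint32_t.  The names
   wrap_add_i32 and demo_identity are invented for this example.  With
   wrapping semantics, "x + 1 > x" fails at the maximum value; with C's
   signed arithmetic the overflow is undefined, so a compiler is allowed
   to treat "x + 1 > x" as always true.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit signed addition with wrap-around,
   done in unsigned arithmetic so every step is well defined. */
int32_t wrap_add_i32(int32_t a, int32_t b)
{
    uint32_t u = (uint32_t)a + (uint32_t)b;   /* modulo 2^32 */
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    /* Map the upper half of the unsigned range onto the negatives
       without relying on an out-of-range conversion. */
    return (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* Under wrapping, adding a positive value does not always increase. */
void demo_identity(void)
{
    assert(wrap_add_i32(41, 1) > 41);
    assert(wrap_add_i32(INT32_MAX, 1) == INT32_MIN);
    assert(!(wrap_add_i32(INT32_MAX, 1) > INT32_MAX));
}
```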
      
   In Zig, unsigned integer arithmetic overflow is also UB as these   
   operations are not defined as modulo.  I think that is a good natural   
   choice too - but it is useful for a language to have a way to do   
   wrapping arithmetic on the occasions you need it.   
      
   > Also, consider that _unsigned_ arithmetic is defined as having   
   > wrap-around semantics similar to modular arithmetic, and thus   
   > incapable of overflow.   
      
   Yes.  Unsigned arithmetic operations are different operations from   
   signed arithmetic operations in C.   
      
   > But that's simply a fiction invented for   
   > the abstract machine described informally in the standard: it   
   > requires special handling one machines like the 1100 series,   
   > because those machines might trap on overflow.  The C committee   
   > could just as well have said that the unsigned arithmetic   
   > _could_ overflow and that the result was UB.   
   >   
      
   They could have done that (as the Zig folk did).   
      
   > So why did C choose this way?  The only logical reason is that
   > there were machines at the time where a) integer overflow
   > caused machine exceptions, and b) the representation of signed   
   > integers was not well-defined, so that the actual value   
   > resulting from overflow could not be rigorously defined.  Given   
   > that C90 mandated a binary representation for integers and so   
   > the representation of unsigned integers is basically common,
   > there was no need to do that for unsigned arithmetic.   
   >   
      
   Not at all.  Usually when someone says "the only logical reason is...",   
   they really mean "the only logical reason /I/ can think of is...", or   
   "the only reason that /I/ can think of that /I/ think is logical is...".   
      
   For a language that can be used as a low-level systems language, it is   
   important to be able to do modulo arithmetic efficiently.  It is needed   
   for a number of low-level tasks, including the implementation of large   
   arithmetic operations, handling timers, counters, and other bits and   
   pieces.  So it was definitely a useful thing to have in C.   
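
   Two small sketches of the kind of low-level task meant here, assuming
   a free-running 32-bit tick counter and numbers stored as little-endian
   arrays of 32-bit limbs.  Both function names are made up for the
   example.  Each relies only on unsigned arithmetic being modulo 2^32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ticks between two readings of a free-running 32-bit counter.
   The modulo subtraction stays correct even if the counter wrapped
   once between the two readings. */
uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Add two n-limb numbers (least significant limb first) into dst.
   Returns the carry out of the top limb.  A carry is detected by
   noticing that a modulo sum came out smaller than an operand. */
uint32_t add_limbs(uint32_t *dst, const uint32_t *a,
                   const uint32_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s = a[i] + carry;
        uint32_t c1 = (s < carry);   /* wrapped when adding the carry */
        uint32_t t = s + b[i];
        uint32_t c2 = (t < s);       /* wrapped when adding b[i] */
        dst[i] = t;
        carry = c1 | c2;             /* at most one of them is set */
    }
    return carry;
}
```

   For example, ticks_elapsed(0xFFFFFFF0u, 0x10u) is 0x20, which is the
   right answer even though the counter wrapped in between.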
      
   For a language that can be used as a fast and efficient application   
   language, it must have a reasonable approximation to mathematical   
   integer arithmetic.  Implementations should not be forced to have   
   behaviours beyond the mathematically sensible answers - if a calculation   
   can't be done correctly, there's no point in doing it.  Giving nonsense
   results does not help anyone - C programmers or toolchain implementers -
   so the language should not specify any particular result.  More sensible
   defined overflow behaviour - saturation, error values, language   
   exceptions or traps, etc., would be very inefficient on most hardware.   
   So UB is the best choice - and implementations can do something   
   different if they like.   
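
   As a rough illustration of the extra work involved, here is a sketch
   of saturating signed addition in portable C, assuming a target with no
   saturating add instruction.  The names sat_add and demo_sat_add are
   hypothetical.  Every addition picks up two range checks; how much that
   costs in practice depends on the compiler and the hardware.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: clamp the result to [INT_MIN, INT_MAX]
   instead of overflowing.  The checks run before the addition so
   that no undefined overflow is ever evaluated. */
int sat_add(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;
    return a + b;
}

/* Small usage example. */
void demo_sat_add(void)
{
    assert(sat_add(2, 3) == 5);
    assert(sat_add(INT_MAX, 1) == INT_MAX);
    assert(sat_add(INT_MIN, -1) == INT_MIN);
}
```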
      
   Too many options make a language bigger - harder to implement, harder to   
   learn, harder to use.  So it makes sense to have modulo arithmetic for   
   unsigned types, and normal arithmetic for signed types.   
      
   I am not claiming to know that this is the reasoning made by the C   
   language pioneers.  But it is definitely an alternative logical reason   
   for C being the way it is.   
      
   >>> And of course, even today, C still targets oddball platforms   
   >>> like DSPs and custom chips, where assumptions about the ubiquity   
   >>> of 2's comp may not hold.   
   >>   
   >> Modern C and C++ standards have dropped support for signed integer   
   >> representation other than two's complement, because they are not in use   
   >> in any modern hardware (including any DSP's) - at least, not for   
   >> general-purpose integers.  Both committees have consistently voted to   
   >> keep overflow as UB.   
   >   
   > Yes.  As I said, performance is often the justification.   
   >   
   > I'm not convinced that there are no custom chips and/or DSPs   
   > that are not manufactured today.  They may not be common, their   
   > mere existence is certainly dumb and offensive, but that does   
   > not mean that they don't exist.  Note that the survey in, e.g.,   
   > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm   
   > only mentions _popular_ DSPs, not _all_ DSPs.   
   >   
      
   I think you might have missed a few words in that paragraph, but I   
   believe I know what you intended.  There are certainly DSPs and other   
      
   [continued in next message]   
      