From: cross@spitfire.i.gajendra.net   
      
   In article <2025Aug13.232334@mips.complang.tuwien.ac.at>,   
   Anton Ertl wrote:   
   >cross@spitfire.i.gajendra.net (Dan Cross) writes:   
   >>In article <2025Aug13.181010@mips.complang.tuwien.ac.at>,   
   >>Anton Ertl wrote:   
   >>>For lseek(2):   
   >>>   
   >>>| Upon successful completion, lseek() returns the resulting offset   
   >>>| location as measured in bytes from the beginning of the file.   
   >>>   
   >>>Given that off_t is signed, lseek(2) can only return positive values.   
   >>   
   >>This is incorrect; or rather, it's accidentally correct now, but   
   >>was not previously. The 1990 POSIX standard did not explicitly   
   >>forbid a file that was so large that the offset couldn't   
   >>overflow, hence why in 1990 POSIX you have to be careful about   
   >>error handling when using `lseek`.   
   >>   
   >>It is true that POSIX 2024 _does_ prohibit seeking so far that   
   >>the offset would become negative, however.   
   >   
   >I don't think that this is accidental. In 1990 signed overflow had   
   >reliable behaviour on common 2s-complement hardware with the C   
   >compilers of the day.   
      
   This is simply not true. If anything, there was more variety of   
   hardware supported by C90, and some of those systems were 1's   
   complement or sign/mag, not 2's complement. Consequently,   
   signed integer overflow has _always_ had undefined behavior in   
   ANSI/ISO C.   
      
   However, conversion from signed to unsigned has always been   
   well-defined, and follows effectively 2's complement semantics.   
      
   Conversion from unsigned to signed is a bit more complex: it   
   is implementation-defined, but not UB. Given that the system   
   call interface is necessarily deeply intertwined with the   
   implementation, I see no reason why the semantics of signed   
   overflow should be an issue here.   
      
   >Nowadays the exotic hardware where this would   
   >not work that way has almost completely died out (and C is not used on   
   >the remaining exotic hardware),   
      
   If by "C is not used" you mean newer editions of the C standard   
   are not used on very old computers with strange representations   
   of signed integers, then maybe.   
      
   >but now compilers sometimes do funny   
   >things on integer overflow, so better don't go there or anywhere near   
   >it.   
      
   This isn't about signed overflow. The issue here is conversion   
   of an unsigned value to signed; almost certainly, the kernel   
   performs the calculation of the actual file offset using   
   unsigned arithmetic, and relies on the (assembler, mind you)   
   system call stubs to map those to the appropriate userspace   
   type.   
      
   I think this is mostly irrelevant, as the system call stub,   
   almost by necessity, must be written in assembler in order to   
   have precise control over the use of specific registers and so   
   on. From C's perspective, a program making a system call just   
   calls some function that's defined to return a signed integer;   
   the assembler code that swizzles the register the integer is   
   returned in sets things up accordingly. In other words, the   
   conversion operation that the C standard mentions isn't in   
   play, since the code that does the "conversion" is in assembly.   
   Again, from C's perspective, the return value of the syscall   
   stub function is already signed, with no conversion needed.   
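   For illustration only -- expressed in C, though a real stub is   
   assembly, and the function name is invented -- the whole job   
   of a stub on a system using the Linux convention (errors   
   encoded as -4095..-1 in the return register) reduces to   
   something like:   

```c
#include <errno.h>

/* 'raw' stands in for the value that came back in the return
 * register (e.g. %rax on x86-64).  If it falls in the Linux error
 * window [-4095, -1], negate it into errno and return -1; otherwise
 * the bits are already the signed result the C caller expects. */
long stub_return(unsigned long raw)
{
    if (raw > -4096UL) {            /* raw is in [-4095, -1]: an error */
        errno = (int)-(long)raw;    /* recover the positive errno value */
        return -1;
    }
    /* On two's-complement ABIs this cast just relabels the bits. */
    return (long)raw;
}
```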
      
   No, for `lseek`, the POSIX rationale explains the reasoning here   
   quite clearly: the 1990 standard permitted negative offsets, and   
   programs were expected to accommodate this by special handling   
   of `errno` before and after calls to `lseek` that returned   
   negative values. This was deemed onerous and fragile, so they   
   modified the standard to prohibit calls that would result in   
   negative offsets.   
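   The 1990-era idiom looked something like this sketch (the   
   wrapper name is mine): only the combination of a -1 return   
   *and* a nonzero errno proved a real error, since a negative   
   offset was then legal.   

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Under POSIX.1-1990, (off_t)-1 alone did not prove failure,
 * because a then-legal negative offset could collide with it;
 * clearing errno first and re-checking it afterwards disambiguates. */
off_t careful_lseek(int fd, off_t offset, int whence, int *err)
{
    errno = 0;
    off_t pos = lseek(fd, offset, whence);
    /* Only "-1 AND errno set" indicates a real error. */
    *err = (pos == (off_t)-1 && errno != 0);
    return pos;
}
```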
      
   >>But, POSIX 2024   
   >>(still!!) supports multiple definitions of `off_t` for multiple   
   >>environments, in which overflow is potentially unavoidable.   
   >   
   >POSIX also has the EOVERFLOW error for exactly that case.   
   >   
   >Bottom line: The off_t returned by lseek(2) is signed and always   
   >positive.   
      
   As I said earlier, post POSIX.1-1990, this is true.   
      
   >>>For mmap(2):   
   >>>   
   >>>| On success, mmap() returns a pointer to the mapped area.   
   >>>   
   >>>So it's up to the kernel which user-level addresses it returns. E.g.,   
   >>>32-bit Linux originally only produced user-level addresses below 2GB.   
   >>>When memories grew larger, on some architectures (e.g., i386) Linux   
   >>>increased that to 3GB.   
   >>   
   >>The point is that the programmer shouldn't have to care.   
   >   
   >True, but completely misses the point.   
      
   I don't see why. You were talking about the system call stubs,   
   which run in userspace and are responsible for setting up state   
   so that the kernel can perform some requested action on entry,   
   whether by trap, call gate, or special instruction, and then for   
   tearing down that state and handling errors on return from the   
   kernel.   
      
   For mmap, there is exactly one value that may be returned from   
   its stub that indicates an error; any other value, by   
   definition, represents a valid mapping. Whether such a mapping   
   falls in the first 2G, 3G, anything except the upper 256MiB, or   
   some hole in the middle is the part that's irrelevant, and   
   focusing on that misses the main point: all the stub has to do   
   is detect the error, using whatever convention the kernel   
   specifies for communicating such things back to the program, and   
   ensure that in an error case, MAP_FAILED is returned from the   
   stub and `errno` is set appropriately. Everything else is   
   superfluous.   
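   A sketch of that one job, again in C with an invented name and   
   assuming the Linux -4095..-1 error window: any raw value   
   outside the window is, by definition, a valid mapping address.   

```c
#include <errno.h>
#include <sys/mman.h>   /* MAP_FAILED, i.e. (void *)-1 */

/* Hypothetical: 'raw' is the value the kernel returned for mmap.
 * If it sits in the Linux error window [-4095, -1], report
 * MAP_FAILED with errno set; otherwise it is the mapping address. */
void *mmap_stub_return(unsigned long raw)
{
    if (raw > -4096UL) {            /* error window */
        errno = (int)-(long)raw;
        return MAP_FAILED;
    }
    return (void *)raw;             /* wherever the kernel mapped it */
}
```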
      
   >>>Sure, but system calls are first introduced in real kernels using the   
   >>>actual system call interface, and are limited by that interface. And   
   >>>that interface is remarkably similar between the early days of Unix   
   >>>and recent Linux kernels for various architectures.   
   >>   
   >>Not precisely. On x86_64, for example, some Unixes use a flag   
   >>bit to determine whether the system call failed, and return   
   >>(positive) errno values; Linux returns negative numbers to   
   >>indicate errors, and constrains those to values between -4095   
   >>and -1.   
   >>   
   >>Presumably that specific set of values is constrained by `mmap`:   
   >>assuming a minimum 4KiB page size, the last architecturally   
   >>valid address where a page _could_ be mapped is equivalent to   
   >>-4096 and the first is 0. If they did not have that constraint,   
   >>they'd have to treat `mmap` specially in the system call path.   
   >   
   >I am pretty sure that in the old times, Linux-i386 indicated failure   
   >by returning a value with the MSB set, and the wrapper just checked   
   >whether the return value was negative. And for mmap() that worked   
   >because user-mode addresses were all below 2GB. Addresses further up   
   >were reserved for the kernel.   
      
   Define "Linux-i386" in this case. For the kernel, I'm confident   
   that was NOT the case, and it is easy enough to research, since   
   old kernel versions are online. Looking at e.g. 0.99.15, one   
   can see that they set the carry bit in the flags register to   
   indicate an error, along with returning a negative errno value:   
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   