From: kegs@provalid.com   
      
   In article <2025Oct4.121741@mips.complang.tuwien.ac.at>,   
   Anton Ertl wrote:   
   >MitchAlsup writes:   
   >>LLVM compiles C with stricter typing than GCC resulting in a lot   
   >>of smashes:: For example::   
   >>   
   >>int subroutine( int a, int b )   
   >>{   
   >> return a+b;   
   >>}   
   >>   
   >>Compiles into:   
   >>   
   >>subroutine:   
   >> ADD R1,R1,R2   
   >> SRA R1,R1,<32,0> // limit result to (int)   
   >> RET   
   >   
   >I tested this on AMD64, and did not find sign-extension in the caller,   
   >neither with gcc-14 nor with clang-19; both produce the following code   
   >for your example (with "subroutine" renamed into "subroutine1").   
   >   
   >0000000000000000 <subroutine1>:
   > 0: 8d 04 37 lea (%rdi,%rsi,1),%eax   
   > 3: c3 ret   
   >   
   >It's not about strict or lax typing, it's about what the calling   
   >convention promises about types that are smaller than a machine word.   
   >If the calling convention requires/guarantees that ints are   
   >sign-extended, the compiler must use instructions that produce a   
   >sign-extended result. If the calling convention guarantees that ints   
   >are zero-extended (sounds perverse, but RV64 has the guarantee that   
   >unsigned is passed in sign-extended form, which is equally perverse),   
   >then the compiler must use instructions that produce a zero-extended   
   >result (e.g., AMD64's addl). If the calling convention only requires   
   >and guarantees the low-order 32 bits (I call this garbage-extended),   
   >then the compiler can use instructions that perform 64-bit adds; this   
   >is what we are seeing above.   
   >   
   >The other side of the medal is what is needed at the caller: If the   
   >caller needs to convert a sign-extended int into a long, it does not
   >have to do anything. If it needs to convert a zero-extended or   
   >garbage-extended int into a long, it has to sign-extend the value.   
      
   AMD64 hardware zero-extends the result of every 32-bit operation. In your
   example "lea (%rdi,%rsi,1),%eax" (AT&T notation, so %eax is the dest),
   the 64-bit register %rax will have 0's written into bits [63:32].
   So the AMD64 convention for 32-bit values in 64-bit registers is to
   zero-extend on writes and to ignore the upper 32 bits on reads, which
   means code that handles a 32-bit value held in a 64-bit register
   should use the %exx name.
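
   To make the zero-extension concrete, here is a rough C model of what
   that lea does to %rax. The function names are mine, made up for
   illustration; this is only a sketch of the register behaviour, not
   anything a compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit write to a 64-bit AMD64 register:
   the low 32 bits are kept and bits [63:32] are cleared. */
uint64_t write32_zero_extend(uint64_t result)
{
    return (uint64_t)(uint32_t)result;
}

/* Model of "lea (%rdi,%rsi,1),%eax": the sum is formed from the full
   registers, then the 32-bit destination write clears the upper half. */
uint64_t lea_to_eax(uint64_t rdi, uint64_t rsi)
{
    return write32_zero_extend(rdi + rsi);
}

/* Small self-check; call it from main() to run the example. */
void check_lea_to_eax(void)
{
    /* Garbage in the upper halves of the inputs never reaches %rax. */
    assert(lea_to_eax(0xDEADBEEF00000001ULL, 0xCAFEF00D00000002ULL) == 3ULL);
    /* (int)-1 + 0 leaves 0x00000000FFFFFFFF, not all ones. */
    assert(lea_to_eax(0xFFFFFFFFULL, 0) == 0x00000000FFFFFFFFULL);
}
```

   The second assert is the case that matters for the caller: the int
   result is -1, but %rax does not hold the long value -1, so a later
   int-to-long conversion still needs an explicit sign-extend.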
      
   I agree with you that I32LP64 was a mistake, but it exists, and I   
   think ARM64 did a good job handling it. It has all integer operations   
   working on two sizes: 32-bit and 64-bit, and when writing a 32-bit result,   
   it 0-extends the register value.   
      
   You don't want "garbage extend" since you want a predictable answer.   
   Your choices for writing 32-bit results in a 64-bit register are thus   
   sign-extend (not a good choice) or zero-extend (what almost   
   everyone chose). RISC-V is in another land, where they effectively have   
   no 32-bit operations, but rather a convention that all 32-bit inputs   
   must be sign-extended in a 64-bit register.   
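
   For comparison, a similar C sketch of the sign-extended convention,
   modelled on RV64's ADDW. Again the helper names are invented for this
   post, and the narrowing casts assume the usual two's-complement
   wraparound (implementation-defined before C23, but what gcc and clang
   do).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of RV64 ADDW: add, keep the low 32 bits,
   then sign-extend them into the 64-bit register. */
uint64_t addw_model(uint64_t rs1, uint64_t rs2)
{
    uint32_t low = (uint32_t)(rs1 + rs2);
    return (uint64_t)(int64_t)(int32_t)low;
}

/* int -> long when the register is known to be sign-extended:
   the register already holds the long value, nothing to do. */
int64_t int_to_long_sign_extended(uint64_t reg)
{
    return (int64_t)reg;
}

/* int -> long when the register is zero-extended or garbage-extended:
   the low 32 bits must be sign-extended explicitly. */
int64_t int_to_long_other(uint64_t reg)
{
    return (int64_t)(int32_t)(uint32_t)reg;
}

/* Small self-check; call it from main() to run the example. */
void check_extension_conventions(void)
{
    /* INT_MAX + 1 wraps to INT_MIN and fills bits [63:32] with ones. */
    assert(addw_model(0x7FFFFFFFULL, 1) == 0xFFFFFFFF80000000ULL);
    assert(int_to_long_sign_extended(addw_model(0, 0xFFFFFFFFULL)) == -1);
    /* Zero-extended and garbage-extended inputs both need the fix-up. */
    assert(int_to_long_other(0x00000000FFFFFFFFULL) == -1);
    assert(int_to_long_other(0xDEADBEEFFFFFFFFEULL) == -2);
}
```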
      
   For C and C++ code, the standard dictates that all integer operations are
   done with at least "int" precision: operands narrower than int are
   promoted to int, and if some operand is larger than int, the operation
   is done in that precision. So there's no real need for 8-bit and 16-bit
   operations to be natively supported by the CPU--these operations are
   actually done as ints already. If you have a variable which is a byte,
   then after assigning to that variable and using it again you will need
   to zero-extend (or sign-extend, for a signed byte), but honestly, this
   is not usually a performance path. It's likely to be stored to memory
   instead, so no masking or sign extending should be needed.
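
   A small C illustration of the promotion rule, in case it helps. The
   function names are hypothetical; the behaviour assumes the usual
   8-bit byte and an int wider than 8 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Both operands are promoted to int before the add, so the
   sum is computed in int precision and does not wrap at 8 bits. */
int add_bytes_as_int(uint8_t a, uint8_t b)
{
    return a + b;
}

/* Assigning the int result back to a byte variable is where the
   truncation happens; reusing that variable needs the masked value. */
int add_bytes_then_reuse(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);  /* low 8 bits only */
    return sum + 1;                  /* sum is promoted to int again */
}

/* Small self-check; call it from main() to run the example. */
void check_byte_promotion(void)
{
    assert(add_bytes_as_int(200, 100) == 300);     /* no 8-bit wrap */
    assert(add_bytes_then_reuse(200, 100) == 45);  /* 300 mod 256 = 44, plus 1 */
}
```

   Only the assignment to "sum" needs a mask (or a byte store); the adds
   themselves are plain int adds.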
      
   If you pick ILP64 for your ABI, then you will get rid of almost all of   
   these zero- and sign-extensions of 32-bit C and C++ code. It will just   
   work. If you pick I32LP64, then you should have a full suite of 32-bit   
   operations and 64-bit operations, at least for all add, subtract, and   
   compare operations. And if you do I32LP64, your indexed addressing   
   modes should have 3 types of indexed registers: 64-bit, 32-bit signed,   
   and 32-bit unsigned. That worked well for ARM64.   
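
   Here are the three index cases written out in C. The names and the
   table are made up for the example, and the ARM64 instructions in the
   comments are what I would expect a compiler to pick under I32LP64,
   not something guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table used only by the self-check below. */
const int table[4] = { 10, 20, 30, 40 };

/* 64-bit index: usable as-is, e.g. ldr w0, [x0, x1, lsl #2] */
int load_idx64(const int *base, int64_t i)
{
    return base[i];
}

/* 32-bit signed index: must be sign-extended before scaling,
   e.g. ldr w0, [x0, w1, sxtw #2] */
int load_idx_s32(const int *base, int32_t i)
{
    return base[i];
}

/* 32-bit unsigned index: must be zero-extended before scaling,
   e.g. ldr w0, [x0, w1, uxtw #2] */
int load_idx_u32(const int *base, uint32_t i)
{
    return base[i];
}

/* Small self-check; call it from main() to run the example. */
void check_indexed_loads(void)
{
    assert(load_idx64(table + 3, -3) == 10);
    assert(load_idx_s32(table + 2, -1) == 20);  /* negative index needs sxtw */
    assert(load_idx_u32(table, 3u) == 40);
}
```

   Without the sxtw/uxtw forms, the second and third functions each need
   a separate extend instruction ahead of the load.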
      
   Kent   
      