comp.arch
Anton Ertl to Michael S
Re: Tonights Tradeoff
13 Nov 25 18:09:12
   From: anton@mips.complang.tuwien.ac.at   
      
   Michael S  writes:   
   >On Thu, 13 Nov 2025 09:24:20 GMT   
   >anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:   
   >> Actually, with uint128_t you get pretty far, and _BitInt(bits) has   
   >> been added in C23, which has good potential, but is not quite there.   
   >   
   >Yes, that's what I wrote above.
   >As far as BGB is concerned, the big disadvantage is the absence of
   >support by MSVC.
      
   Why would that be a disadvantage?  If MSVC does not do what he needs,   
   there are other C compilers to choose from.   
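   For instance, with gcc or clang on a 64-bit target, the uint128_t
   route can be sketched with the compilers' unsigned __int128 extension,
   which MSVC does not provide.  This is a minimal illustration, not code
   from the thread; addc_u128 is a hypothetical name, and the block
   assumes __SIZEOF_INT128__ is defined:

```c
#include <stdint.h>

#if !defined(__SIZEOF_INT128__)
#error "this sketch assumes unsigned __int128 (gcc/clang on a 64-bit target)"
#endif

/* Hypothetical helper: returns s1 + s2 + c_in (c_in is 0 or 1) and stores
   the carry-out, which is simply bit 64 of the 128-bit result. */
static uint64_t addc_u128(uint64_t s1, uint64_t s2, uint64_t c_in,
                          uint64_t *c_out)
{
    unsigned __int128 wide = (unsigned __int128)s1 + s2 + c_in;
    *c_out = (uint64_t)(wide >> 64);   /* high half: 0 or 1 */
    return (uint64_t)wide;             /* low 64 bits */
}
```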
      
   >> Builtins for add-with-carry and intrinsics are somewhat disappointing.   
   >>   
   >> - anton   
   >   
   >For me the most disappointing part is that different architectures   
   >have different spellings.   
      
   For intrinsics that's by design.  They are essentially a way to write
   assembly language instructions in Fortran or C.  And assembly language
   is architecture-specific.
      
   >Other than that, even gcc is now mostly able to generate
   >decent code for Intel's variant. MSVC and clang have been able to do it
   >for a very long time.
      
   When using the Intel intrinsic c_out = _addcarry_u64(c_in, s1, s2, &sum),
   the code from both gcc and clang uses adcq, but neither compiler can
   keep the carry in CF across loop iterations: each moves it into a
   register right after the adcq (setb) and back from the register into
   CF right before it (addb $-1):
      
   addb $-1, %r8b   
   adcq (%rdx,%rax,8), %r9   
   setb %r8b   
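   A loop of the kind that produces this code might look like the
   following sketch; add_n is a hypothetical name, limbs are stored least
   significant first, and the non-x86 fallback branch is an assumption
   added only so the sketch compiles on other targets:

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#endif

/* Hypothetical helper: r = a + b over n 64-bit limbs (least significant
   limb first); returns the final carry-out (0 or 1). */
static unsigned char add_n(unsigned long long *r, const unsigned long long *a,
                           const unsigned long long *b, size_t n)
{
    unsigned char c = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__x86_64__) || defined(_M_X64)
        /* c_out = _addcarry_u64(c_in, s1, s2, &sum); between iterations
           the carry lives in a byte register, not in CF. */
        c = _addcarry_u64(c, a[i], b[i], &r[i]);
#else
        /* Assumed portable fallback: recover carries by unsigned compare. */
        unsigned long long s = a[i] + b[i];
        unsigned char c1 = s < a[i];
        r[i] = s + c;
        c = c1 | (r[i] < s);
#endif
    }
    return c;
}
```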
      
   If you (or compiler unrolling) have several _addcarry_u64 in a row,   
   with the carry-out becoming the carry-in of the next one, at least one   
   of these compilers manages to eliminate the overhead between these   
   adcqs, but of course not at the start and end of the sequence.   
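   A fully unrolled chain of this kind, e.g. a 256-bit addition, can be
   sketched as below; u256 and add256 are hypothetical names, and whether
   the carry actually stays in CF between the adc instructions depends on
   the compiler:

```c
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#endif

/* Hypothetical 256-bit type: four 64-bit limbs, least significant first. */
typedef struct { unsigned long long w[4]; } u256;

/* r = a + b; returns the carry-out of the top limb. */
static unsigned char add256(u256 *r, const u256 *a, const u256 *b)
{
#if defined(__x86_64__) || defined(_M_X64)
    /* Each carry-out feeds the next carry-in directly, so a compiler may
       keep it in CF between the adcqs; only the final carry-out has to be
       moved into a register. */
    unsigned char c = _addcarry_u64(0, a->w[0], b->w[0], &r->w[0]);
    c = _addcarry_u64(c, a->w[1], b->w[1], &r->w[1]);
    c = _addcarry_u64(c, a->w[2], b->w[2], &r->w[2]);
    return _addcarry_u64(c, a->w[3], b->w[3], &r->w[3]);
#else
    /* Assumed portable fallback for non-x86 targets. */
    unsigned char c = 0;
    for (int i = 0; i < 4; i++) {
        unsigned long long s = a->w[i] + b->w[i];
        unsigned char c1 = s < a->w[i];
        r->w[i] = s + c;
        c = c1 | (r->w[i] < s);
    }
    return c;
#endif
}
```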
      
   >Or do you have in mind the new gcc intrinsics in the group "Arithmetic
   >with Overflow Checking"?
      
   These are gcc builtins, not intrinsics.  The difference is that they   
   work on all architectures.  However, when I looked (three months ago),   
   gcc did not have a builtin with carry-in; the builtins you mention   
   only provide carry-out (or overflow-out).   
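   With only carry-out available, a carry-in has to be added with a
   second builtin call and the two carries combined, as in this sketch
   (addc_overflow is a hypothetical name; __builtin_add_overflow is one of
   the gcc/clang builtins from that group):

```c
#include <stdbool.h>

/* Hypothetical helper: returns s1 + s2 + c_in (c_in is 0 or 1) using
   __builtin_add_overflow, which reports carry-out but takes no carry-in. */
static unsigned long long addc_overflow(unsigned long long s1,
                                        unsigned long long s2,
                                        unsigned c_in, unsigned *c_out)
{
    unsigned long long t, sum;
    bool c1 = __builtin_add_overflow(s1, s2, &t);             /* s1 + s2 */
    bool c2 = __builtin_add_overflow(t, (unsigned long long)c_in, &sum);
    *c_out = c1 | c2;   /* with c_in <= 1, at most one of these is set */
    return sum;
}
```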
      
   However, clang has a builtin with carry-in and carry-out:   
   sum = __builtin_addcll(s1, s2, c_in, &c_out)   
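   Used in a multi-limb addition, the builtin might look like this
   sketch; add_n_addcll and HAVE_ADDCLL are hypothetical names, and the
   fallback branch for compilers without __builtin_addcll is an
   assumption, not part of the post:

```c
#if defined(__has_builtin)
#  if __has_builtin(__builtin_addcll)
#    define HAVE_ADDCLL 1   /* hypothetical feature macro */
#  endif
#endif

/* Hypothetical helper: r = a + b over n limbs, least significant first;
   returns the final carry-out. */
static unsigned long long add_n_addcll(unsigned long long *r,
                                       const unsigned long long *a,
                                       const unsigned long long *b,
                                       unsigned n)
{
    unsigned long long c = 0;
    for (unsigned i = 0; i < n; i++) {
#ifdef HAVE_ADDCLL
        unsigned long long c_out;
        r[i] = __builtin_addcll(a[i], b[i], c, &c_out);  /* carry in and out */
        c = c_out;
#else
        /* Assumed fallback: recover carries by unsigned compare. */
        unsigned long long s = a[i] + b[i];
        unsigned long long c1 = s < a[i];
        r[i] = s + c;
        c = c1 | (r[i] < s);
#endif
    }
    return c;
}
```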
      
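   Unfortunately, the code produced by clang is pretty horrible for ARM
   A64 and AMD64: instead of a single adcs or adcq, it performs two
   separate additions, materializes each carry in a register (cset or
   setb), and ORs the two carries together: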
      
   ARM A64: # clang 11.0.1 -Os   
   adds x9, x9, x10   
   cset w10, hs   
   adds x9, x9, x8   
   cset w8, hs   
   orr w8, w10, w8   
      
   AMD64: # clang 14.0.6 -march=x86-64-v4 -Os   
   addq (%rdx,%r8,8), %r9   
   setb %r10b   
   addq %rax, %r9   
   setb %al   
   orb %r10b, %al   
   movzbl %al, %eax   
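   In both listings the carry-out is the OR of the carries from the two
   separate additions.  That is correct because, with c_in limited to 0
   or 1, the two carries can never both be set.  The following sketch
   (not from the post) mirrors the sequence on 8-bit words, where an
   exhaustive check is cheap; addc8_or and check_addc8 are hypothetical
   names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the clang sequence on 8-bit words: two plain adds, each carry
   materialized separately (like setb/cset), then ORed together. */
static uint8_t addc8_or(uint8_t x, uint8_t y, uint8_t c_in, uint8_t *c_out)
{
    uint8_t t = (uint8_t)(x + y);
    uint8_t c1 = t < x;               /* carry from x + y */
    uint8_t s = (uint8_t)(t + c_in);
    uint8_t c2 = s < t;               /* carry from t + c_in */
    *c_out = c1 | c2;
    return s;
}

/* True if, for all x, y and c_in in {0, 1}, the two carries are never both
   set and the result matches a wide reference addition. */
static bool check_addc8(void)
{
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++)
            for (unsigned c = 0; c < 2; c++) {
                unsigned wide = x + y + c;          /* 9-bit reference */
                uint8_t t = (uint8_t)(x + y);
                bool both = (t < x) && ((uint8_t)(t + c) < t);
                uint8_t c_out;
                uint8_t s = addc8_or((uint8_t)x, (uint8_t)y, (uint8_t)c, &c_out);
                if (both || s != (uint8_t)wide || c_out != (wide >> 8))
                    return false;
            }
    return true;
}
```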
      
   For RISC-V, the code is a five-instruction sequence, the minimum
   possible on that architecture.
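   Because RISC-V has no carry flag, carries have to be recovered with
   unsigned compares.  One plausible five-instruction sequence is sketched
   below in C, one statement per instruction; the post does not show the
   actual RISC-V listing, so the instructions and register names in the
   comments are assumptions, and addc_rv is a hypothetical name:

```c
#include <stdint.h>

/* Returns s1 + s2 + c_in (c_in is 0 or 1) and stores the carry-out. */
static uint64_t addc_rv(uint64_t s1, uint64_t s2, uint64_t c_in,
                        uint64_t *c_out)
{
    uint64_t t  = s1 + s2;    /* add  t, s1, s2     */
    uint64_t c1 = t < s1;     /* sltu c1, t, s1     */
    uint64_t s  = t + c_in;   /* add  s, t, c_in    */
    uint64_t c2 = s < t;      /* sltu c2, s, t      */
    *c_out = c1 | c2;         /* or   c_out, c1, c2 */
    return s;
}
```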
      
   - anton   
   --   
   'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'   
     Mitch Alsup,    
      