From: user5857@newsgrouper.org.invalid   
      
   anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
      
   > Michael S writes:   
   > >On Thu, 13 Nov 2025 09:24:20 GMT   
   > >anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:   
   > >> Actually, with uint128_t you get pretty far, and _BitInt(bits) has   
   > >> been added in C23, which has good potential, but is not quite there.   
   > >   
   > >Yes, that's what I wrote above.
   > >As far as BGB is concerned, the big disadvantage is the absence of
   > >support by MSVC.
   >   
   > Why would that be a disadvantage? If MSVC does not do what he needs,   
   > there are other C compilers to choose from.   
   >   
   > >> Builtins for add-with-carry and intrinsics are somewhat disappointing.   
   > >>   
   > >> - anton   
   > >   
   > >For me the most disappointing part is that different architectures   
   > >have different spellings.   
   >   
   > For intrinsics that's by design. They are essentially a way to write   
   > assembly language instructions in Fortran or C. And assembly language   
   > is compiler-specific.   
      
   {Pedantic mode=ON}   
   Assembly language is ASSEMBLER specific.   
   Compilers have to spit out what the assembler wants or go directly to   
   linker representation.   
   {Pedantic mode=OFF}   
      
   > >Other than that, even gcc is now mostly able to generate
   > >decent code for Intel's variant. MSVC and clang have been able to do
   > >it for a very long time.
   >   
   > When using the Intel intrinsic c_out = _addcarry_u64(c_in, s1, s2, &sum),
   > the code from both gcc and clang uses adcq, but cannot preserve the   
   > carry in CF in a loop, and moves it into a register right after the   
   > adcq, and back from the register to CF right before:   
   >   
   > addb $-1, %r8b   
   > adcq (%rdx,%rax,8), %r9   
   > setb %r8b   
      
    CALK R9,what,ever   
    CARRY R9,{{IO}}   
    ADD R8,Rs1,Rs2   
   performs   
    {R9, R8} = R9 + Rs1 + Rs2;   
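A rough C model of the expression above, assuming the unsigned __int128 extension (GCC and Clang provide it on 64-bit targets). The function name my66k_carry_add is hypothetical; it only restates {R9, R8} = R9 + Rs1 + Rs2 as a double-width sum and says nothing about how My 66000 actually encodes or executes CARRY:

```c
#include <stdint.h>

/* Hypothetical model of "{R9, R8} = R9 + Rs1 + Rs2" (not an ISA reference).
   Assumes the unsigned __int128 extension (GCC/Clang, 64-bit targets). */
typedef unsigned __int128 u128;

static void my66k_carry_add(uint64_t *r9, uint64_t *r8,
                            uint64_t rs1, uint64_t rs2)
{
    /* Widen all three inputs so the sum cannot overflow 128 bits. */
    u128 sum = (u128)*r9 + rs1 + rs2;
    *r8 = (uint64_t)sum;          /* low 64 bits: the sum */
    *r9 = (uint64_t)(sum >> 64);  /* high bits: the carry-out (0, 1, or 2) */
}
```

Because all three inputs are 64 bits wide, the high half can reach 2, not just 1.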
      
   > If you (or compiler unrolling) have several _addcarry_u64 in a row,   
   > with the carry-out becoming the carry-in of the next one, at least one   
   > of these compilers manages to eliminate the overhead between these   
   > adcqs, but of course not at the start and end of the sequence.   
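The case described above, where each carry-out becomes the next carry-in, is the inner loop of multi-precision addition. A minimal sketch, assuming 64-bit limbs stored least-significant first and a hypothetical helper name add_n; on x86-64 it uses the Intel intrinsic from <immintrin.h>, elsewhere it assumes the unsigned __int128 extension:

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#endif

/* r = a + b for n-limb numbers stored least-significant limb first.
   Returns the carry out of the most significant limb. */
static unsigned char add_n(uint64_t *r, const uint64_t *a, const uint64_t *b,
                           size_t n)
{
    unsigned char c = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__x86_64__) || defined(_M_X64)
        /* Intel intrinsic: the carry-out of this limb is the carry-in
           of the next one. */
        unsigned long long s;
        c = _addcarry_u64(c, a[i], b[i], &s);
        r[i] = s;
#else
        /* Other targets: assumes the unsigned __int128 extension. */
        unsigned __int128 s = (unsigned __int128)a[i] + b[i] + c;
        r[i] = (uint64_t)s;
        c = (unsigned char)(s >> 64);
#endif
    }
    return c;
}
```

Whether the compiler keeps the carry in CF across iterations, or moves it through a register with setb/addb as in the listing above, depends on the compiler version and on unrolling; the generated assembly is the way to find out.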
   >   
   > >Or do you have in mind the new gcc intrinsics in the group
   > >"Arithmetic with Overflow Checking"?
   >   
   > These are gcc builtins, not intrinsics. The difference is that they   
   > work on all architectures. However, when I looked (three months ago),   
   > gcc did not have a builtin with carry-in; the builtins you mention   
   > only provide carry-out (or overflow-out).   
   >   
   > However, clang has a builtin with carry-in and carry-out:   
   > sum = __builtin_addcll(s1, s2, c_in, &c_out)   
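A sketch combining the two kinds of builtins discussed above: it calls clang's __builtin_addcll when the compiler reports it via __has_builtin, and otherwise builds the same operation from the carry-out-only __builtin_add_overflow by adding twice and ORing the two carry-outs, which is essentially the shape of the clang code quoted below. The name addc64 is hypothetical, and c_in is assumed to be 0 or 1:

```c
/* Use the carry-in/carry-out builtin only where the compiler reports it. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_addcll)
#    define HAVE_BUILTIN_ADDCLL 1
#  endif
#endif

/* Returns s1 + s2 + c_in; *c_out receives the carry-out.
   Assumes c_in is 0 or 1. */
static unsigned long long addc64(unsigned long long s1, unsigned long long s2,
                                 unsigned long long c_in,
                                 unsigned long long *c_out)
{
#ifdef HAVE_BUILTIN_ADDCLL
    return __builtin_addcll(s1, s2, c_in, c_out);
#else
    /* Fallback: two carry-out-only builtins, carries ORed together.
       With c_in <= 1, at most one of the two carries can be set. */
    unsigned long long t, sum;
    int c1 = __builtin_add_overflow(s1, s2, &t);
    int c2 = __builtin_add_overflow(t, c_in, &sum);
    *c_out = (unsigned long long)(c1 | c2);
    return sum;
#endif
}
```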
   >   
   > Unfortunately, the code produced by clang is pretty horrible for ARM   
   > A64 and AMD64:   
   >   
   > ARM A64: # clang 11.0.1 -Os   
   > adds x9, x9, x10   
   > cset w10, hs   
   > adds x9, x9, x8   
   > cset w8, hs   
   > orr w8, w10, w8   
   >   
   > AMD64: # clang 14.0.6 -march=x86-64-v4 -Os   
   > addq (%rdx,%r8,8), %r9   
   > setb %r10b   
   > addq %rax, %r9   
   > setb %al   
   > orb %r10b, %al   
   > movzbl %al, %eax   
   >   
   > For RISC-V the code is a five-instruction sequence, which is the   
   > minimum that's possible on RISC-V.   
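Without a carry flag, the carry has to be materialized with unsigned compares. A sketch of the usual C idiom, with the RV64 instruction each line is expected to become noted in the comments (an assumed mapping, not guaranteed compiler output); the name addc_noflags is hypothetical and c_in is assumed to be 0 or 1:

```c
#include <stdint.h>

/* Add with carry-in and carry-out using only unsigned compares.
   Expected RV64 shape: add, sltu, add, sltu, or. Assumes c_in is 0 or 1. */
static uint64_t addc_noflags(uint64_t a, uint64_t b, uint64_t c_in,
                             uint64_t *c_out)
{
    uint64_t t   = a + b;       /* add  */
    uint64_t c1  = t < a;       /* sltu: carry out of a + b */
    uint64_t sum = t + c_in;    /* add  */
    uint64_t c2  = sum < t;     /* sltu: carry out of t + c_in */
    *c_out = c1 | c2;           /* or   */
    return sum;
}
```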
      
   Two in My 66000, or one if you don't count CARRY, since it is an
   instruction modifier rather than an instruction: only one instruction
   actually "gets executed".
      
   >   
   > - anton   
      