From: kegs@provalid.com   
      
   In article <2025Oct7.210925@mips.complang.tuwien.ac.at>,   
   Anton Ertl wrote:   
   >MitchAlsup writes:   
   >>   
   >>anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   >...   
   >>> If n is unsigned, you can also choose unsigned,   
   >>> but then this code will be slow on RV64 (and MIPS64 and SPARC V9 and   
   >>> PowerPC64 and Alpha).   
   >>   
   >>Example please !?!   
   >   
   >With a slightly different loop:   
   >   
   >long foo(long a[], unsigned l, unsigned h)   
   >{   
   > unsigned i;   
   > long r=0;   
   > for (i=l; i!=h; i++)   
   > r+=a[i];   
   > return r;   
   >}   
   >   
   >gcc-10 -O3 produces on RV64G:   
   >   
   >0000000000000000 <foo>:
   > 0: 872a mv a4,a0   
   > 2: 4501 li a0,0   
   > 4: 00c58c63 beq a1,a2,1c <.L4>   
   >   
   >0000000000000008 <.L3>:   
   > 8: 02059793 slli a5,a1,0x20   
   > c: 83f5 srli a5,a5,0x1d   
   > e: 97ba add a5,a5,a4   
   > 10: 639c ld a5,0(a5)   
   > 12: 2585 addiw a1,a1,1   
   > 14: 953e add a0,a0,a5   
   > 16: feb619e3 bne a2,a1,8 <.L3>   
   > 1a: 8082 ret   
   >   
   >000000000000001c <.L4>:   
   > 1c: 8082 ret   
      
   Unsigned 32-bit stuff on RISC-V has a habit of blowing up with lots of
   overhead instructions. Change the loop condition to "i < h", and
   godbolt.org with -O2 -march=rv64g gives:
      
   foo(long*, unsigned int, unsigned int):   
    mv a5,a0   
    bgeu a1,a2,.L4   
    addiw a4,a2,-1   
    subw a4,a4,a1   
    slli a4,a4,32   
    slli a1,a1,32   
    srli a1,a1,32   
    srli a4,a4,32   
    add a4,a4,a1   
    addi a3,a0,8   
    slli a4,a4,3   
    slli a1,a1,3   
    li a0,0   
    add a5,a5,a1   
    add a4,a4,a3   
   .L3:   
    ld a3,0(a5)   
    addi a5,a5,8   
    add a0,a0,a3   
    bne a5,a4,.L3   
    ret   
   .L4:   
    li a0,0   
    ret   
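
   For reference, the source behind that listing is presumably the quoted
   foo() with only the loop test changed. A sketch (the exact file fed to
   godbolt isn't shown in this post, so treat this as an assumption):

```c
/* Assumed source for the listing above: the quoted foo() with the
 * loop condition changed from i != h to i < h. Nothing else differs. */
long foo(long a[], unsigned l, unsigned h)
{
    unsigned i;
    long r = 0;
    for (i = l; i < h; i++)   /* was: i != h */
        r += a[i];
    return r;
}
```

   With "i < h" the loop runs zero times whenever l >= h, which is what
   the bgeu at the top of the listing checks for.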
      
   This does get better with "-march=rv64g_zba", but Zba isn't part of RV64G.
      
   GCC has actually optimized the loop itself better, but it has lots of   
   fixup code to create 64-bit register versions of the unsigned inputs   
   (because the RISC-V ABI specifies all 32-bit quantities must be   
   sign-extended at the function call boundaries, even if they are   
   unsigned).   
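
   To make the fixup concrete, here is a rough C model of what the
   register holds and what the shift pairs in the listings compute. The
   helper names (abi_reg, zext_w, index_to_byte_offset) are made up for
   illustration and are not part of any ABI or toolchain:

```c
#include <stdint.h>

/* Model of a 64-bit register holding a 32-bit unsigned argument under
 * the sign-extension rule described above: bits 63..32 copy bit 31. */
static uint64_t abi_reg(uint32_t v)
{
    uint64_t r = v;
    if (v & 0x80000000u)
        r |= 0xFFFFFFFF00000000ull;
    return r;
}

/* slli rd,rs,32 ; srli rd,rd,32
 * Zero-extends the low 32 bits, i.e. what a single zext.w would do. */
static uint64_t zext_w(uint64_t reg)
{
    return (reg << 32) >> 32;
}

/* slli a5,a1,0x20 ; srli a5,a5,0x1d  (from the quoted gcc-10 loop)
 * Zero-extends and multiplies by 8 in the same two shifts, giving the
 * byte offset of a[i] for an array of 8-byte longs. */
static uint64_t index_to_byte_offset(uint64_t reg)
{
    return (reg << 32) >> 29;
}
```

   For an index below 2^31 the register already equals the value and
   the shifts change nothing; the fixup only matters once bit 31 is set,
   but the compiler has to emit it either way.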
      
   In many cases the sign-extension works well: BGEU on 64-bit registers
   holding sign-extended 32-bit values gives the same result as it would
   if the values were zero-extended. But mixing true 64-bit unsigned
   values with 32-bit unsigned ones requires fixup instructions, and the
   lack of a ZEXT.W in the base 64-bit instruction set was a mistake.
   RISC-V gives us a modern example of how an architecture copes without
   a full suite of 32-bit instructions, and what the resulting code looks
   like.
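
   A small C model of both cases, in case the BGEU claim looks
   suspicious. The function names are hypothetical, and the 64-bit
   compares stand in for what BGEU does on full registers:

```c
#include <stdint.h>

/* 32-bit unsigned value as it sits in a 64-bit register when
 * sign-extended from bit 31. */
static uint64_t sext32(uint32_t v)
{
    uint64_t r = v;
    if (v & 0x80000000u)
        r |= 0xFFFFFFFF00000000ull;
    return r;
}

/* BGEU on two sign-extended registers. Sign-extension keeps unsigned
 * order: values below 2^31 are unchanged, values at or above 2^31 all
 * move to the top of the 64-bit range and keep their relative order. */
static int bgeu_sext(uint32_t a, uint32_t b)
{
    return sext32(a) >= sext32(b);
}

/* Mixing with a true 64-bit unsigned value, without a fixup: wrong
 * whenever bit 31 of a is set, since a then looks huge. */
static int geu_mixed_naive(uint32_t a, uint64_t b)
{
    return sext32(a) >= b;
}

/* Same compare with the slli/srli zero-extension applied first. */
static int geu_mixed_fixed(uint32_t a, uint64_t b)
{
    return ((sext32(a) << 32) >> 32) >= b;
}
```

   With a = 0x80000000 and b = 0x100000000, the naive compare says
   a >= b, which is false for the real values; the fixed one gets it
   right at the cost of two extra shifts.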
      
   Kent   
      