

   comp.arch      Apparently more than just beeps & boops      131,241 messages   


   Message 129,852 of 131,241   
   Kent Dickey to Anton Ertl   
   Re: sign/zero/garbage extension (was: Ti   
   08 Oct 25 20:41:21   
   
   From: kegs@provalid.com   
      
   In article <2025Oct7.210925@mips.complang.tuwien.ac.at>,   
   Anton Ertl  wrote:   
   >MitchAlsup  writes:   
   >>   
   >>anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   >...   
   >>>                      If n is unsigned, you can also choose unsigned,   
   >>> but then this code will be slow on RV64 (and MIPS64 and SPARC V9 and   
   >>> PowerPC64 and Alpha).   
   >>   
   >>Example please !?!   
   >   
   >With a slightly different loop:   
   >   
   >long foo(long a[], unsigned l, unsigned h)   
   >{   
   >  unsigned i;   
   >  long r=0;   
   >  for (i=l; i!=h; i++)   
   >    r+=a[i];   
   >  return r;   
   >}   
   >   
   >gcc-10 -O3 produces on RV64G:   
   >   
   >0000000000000000 <foo>:   
   >   0:   872a                    mv      a4,a0   
   >   2:   4501                    li      a0,0   
   >   4:   00c58c63                beq     a1,a2,1c <.L4>   
   >   
   >0000000000000008 <.L3>:   
   >   8:   02059793                slli    a5,a1,0x20   
   >   c:   83f5                    srli    a5,a5,0x1d   
   >   e:   97ba                    add     a5,a5,a4   
   >  10:   639c                    ld      a5,0(a5)   
   >  12:   2585                    addiw   a1,a1,1   
   >  14:   953e                    add     a0,a0,a5   
   >  16:   feb619e3                bne     a2,a1,8 <.L3>   
   >  1a:   8082                    ret   
   >   
   >000000000000001c <.L4>:   
   >  1c:   8082                    ret   
      
   Unsigned 32-bit stuff on RISC-V has a habit of blowing up with lots of   
   overhead instructions.  Change the loop condition to "i < h", and   
   godbolt.org with -O2 -march=rv64g gives you:   
      
   foo(long*, unsigned int, unsigned int):   
           mv      a5,a0   
           bgeu    a1,a2,.L4   
           addiw   a4,a2,-1   
           subw    a4,a4,a1   
           slli    a4,a4,32   
           slli    a1,a1,32   
           srli    a1,a1,32   
           srli    a4,a4,32   
           add     a4,a4,a1   
           addi    a3,a0,8   
           slli    a4,a4,3   
           slli    a1,a1,3   
           li      a0,0   
           add     a5,a5,a1   
           add     a4,a4,a3   
   .L3:   
           ld      a3,0(a5)   
           addi    a5,a5,8   
           add     a0,a0,a3   
           bne     a5,a4,.L3   
           ret   
   .L4:   
           li      a0,0   
           ret   
      
   This does get better with "-march=rv64g_zba", but Zba isn't part of RV64G.   
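As a rough illustration of why the address-generation extension helps, here is a C model of three of its instructions next to the base-ISA shift pair from the gcc-10 listing quoted earlier. This is only a sketch of the instruction semantics, not a claim about what GCC emits with the extension enabled; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative C models of RV64 instructions; names are hypothetical. */

/* zext.w rd, rs  (pseudo-instruction for add.uw rd, rs, zero) */
static uint64_t zext_w(uint64_t rs)
{
    return (uint32_t)rs;
}

/* add.uw rd, rs1, rs2: rd = rs2 + zero-extended low 32 bits of rs1 */
static uint64_t add_uw(uint64_t rs1, uint64_t rs2)
{
    return rs2 + (uint32_t)rs1;
}

/* sh3add.uw rd, rs1, rs2: rd = rs2 + (zero-extended low 32 bits of rs1 << 3) */
static uint64_t sh3add_uw(uint64_t rs1, uint64_t rs2)
{
    return rs2 + ((uint64_t)(uint32_t)rs1 << 3);
}

/* Base RV64G equivalent from the quoted listing:
   slli a5,a1,0x20 ; srli a5,a5,0x1d ; add a5,a5,a4 */
static uint64_t index_rv64g(uint64_t idx, uint64_t base)
{
    uint64_t t = idx << 32;   /* drop the upper 32 bits            */
    t >>= 29;                 /* bring back, leaving a net "<< 3"  */
    return t + base;
}

/* Self-check: one Zba instruction matches the three-instruction sequence. */
void check_zba_model(void)
{
    uint64_t idx  = 0xFFFFFFFF80000000ull; /* 0x80000000 as the ABI passes it */
    uint64_t base = 0x1000;
    assert(sh3add_uw(idx, base) == index_rv64g(idx, base));
    assert(add_uw(idx, 0) == zext_w(idx));
}
```

So indexing an array of longs with an unsigned 32-bit index takes three base instructions, or one with sh3add.uw.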
      
   GCC has actually optimized the loop itself better, but it has lots of   
   fixup code to create 64-bit register versions of the unsigned inputs   
   (because the RISC-V ABI specifies all 32-bit quantities must be   
   sign-extended at the function call boundaries, even if they are   
   unsigned).   
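To make the fixup concrete, here is the "i < h" variant in C, plus a hand-written approximation of what the -O2 listing above does: zero-extend the inputs up front, compute start and end pointers, then run a pointer-walking loop. The names foo_lt, foo_lt_ptr and check_foo_lt are mine, and the second function is my reading of the assembly rather than anything the compiler guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* The variant discussed above: same as foo(), but with "i < h". */
long foo_lt(long a[], unsigned l, unsigned h)
{
    unsigned i;
    long r = 0;
    for (i = l; i < h; i++)
        r += a[i];
    return r;
}

/* Approximation of the -O2 output, with the matching instructions noted. */
long foo_lt_ptr(long a[], unsigned l, unsigned h)
{
    if (l >= h)                               /* bgeu a1,a2,.L4            */
        return 0;

    uint64_t last  = (uint32_t)(h - 1 - l);   /* addiw, subw, slli/srli 32 */
    uint64_t first = (uint32_t)l;             /* slli/srli 32              */
    long *p   = a + first;                    /* slli 3 ; add a5,a5,a1     */
    long *end = a + 1 + (last + first);       /* addi a3,a0,8 ; slli 3 ; add */

    long r = 0;
    do {                                      /* .L3: four instructions    */
        r += *p++;
    } while (p != end);
    return r;
}

/* Small self-check that both versions agree, including empty ranges. */
void check_foo_lt(void)
{
    long a[] = {1, 2, 3, 4, 5, 6, 7, 8};
    assert(foo_lt(a, 2, 6) == foo_lt_ptr(a, 2, 6));
    assert(foo_lt(a, 5, 5) == foo_lt_ptr(a, 5, 5));
    assert(foo_lt(a, 6, 2) == foo_lt_ptr(a, 6, 2));
}
```

The loop body is only the load, pointer bump, add and branch; everything else in the listing is the one-time setup of p and end.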
      
   In many cases the sign-extension works well: BGEU on 64-bit registers   
   holding sign-extended 32-bit values gives the same result as it would   
   if the values were zero-extended.  But mixing true 64-bit unsigned with   
   32-bit unsigned requires fixup instructions, and the lack of a ZEXT.W   
   in the basic 64-bit instruction set was a mistake.  RISC-V gives us a   
   modern example of what not having a full suite of 32-bit instructions   
   looks like, and of how to handle it.   
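A small C model of that point, as a sketch: abi_reg, bgeu and zext32 are hypothetical helpers standing in for the register contents, the branch condition, and the slli/srli pair. It assumes the usual wrapping conversion from uint32_t to int32_t, which is implementation-defined before C23 but is what GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>

/* Register contents for a 32-bit unsigned argument under the RV64 ABI:
   sign-extended from bit 31, even though the C type is unsigned. */
static uint64_t abi_reg(uint32_t v)
{
    return (uint64_t)(int64_t)(int32_t)v;
}

/* bgeu compares the full 64-bit registers as unsigned. */
static int bgeu(uint64_t rs1, uint64_t rs2)
{
    return rs1 >= rs2;
}

/* The base-ISA fixup: slli rd,rs,32 ; srli rd,rd,32 */
static uint64_t zext32(uint64_t r)
{
    return (r << 32) >> 32;
}

void check_sign_extended_compare(void)
{
    uint32_t small = 5u, big = 0x80000000u;

    /* 32-bit vs 32-bit: sign-extension preserves unsigned order,
       so no fixup is needed before the branch. */
    assert(bgeu(abi_reg(big), abi_reg(small)) == (big >= small));
    assert(bgeu(abi_reg(small), abi_reg(big)) == (small >= big));

    /* 32-bit vs true 64-bit: the sign-extended register compares wrong. */
    uint64_t wide = 0x100000000ull;            /* 2^32 */
    assert(bgeu(abi_reg(big), wide) == 1);     /* wrong: big < wide        */
    assert(bgeu(zext32(abi_reg(big)), wide) == 0); /* right after the fixup */
}
```

The first pair of checks works because sign-extension maps values below 2^31 to themselves and values at or above 2^31 to the top of the 64-bit range, so relative order among 32-bit values is unchanged.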
      
   Kent   
      


