
   comp.arch      Apparently more than just beeps & boops      131,241 messages   


   Message 129,853 of 131,241   
   BGB to Kent Dickey   
   Re: sign/zero/garbage extension   
   08 Oct 25 22:58:53   
   
   From: cr88192@gmail.com   
      
   On 10/8/2025 3:41 PM, Kent Dickey wrote:   
   > In article <2025Oct7.210925@mips.complang.tuwien.ac.at>,   
   > Anton Ertl  wrote:   
   >> MitchAlsup  writes:   
   >>>   
   >>> anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   >> ...   
   >>>>                       If n is unsigned, you can also choose unsigned,   
   >>>> but then this code will be slow on RV64 (and MIPS64 and SPARC V9 and   
   >>>> PowerPC64 and Alpha).   
   >>>   
   >>> Example please !?!   
   >>   
   >> With a slightly different loop:   
   >>   
   >> long foo(long a[], unsigned l, unsigned h)   
   >> {   
   >>   unsigned i;   
   >>   long r=0;   
   >>   for (i=l; i!=h; i++)   
   >>     r+=a[i];   
   >>   return r;   
   >> }   
   >>   
   >> gcc-10 -O3 produces on RV64G:   
   >>   
   >> 0000000000000000 <foo>:
   >>    0:   872a                    mv      a4,a0   
   >>    2:   4501                    li      a0,0   
   >>    4:   00c58c63                beq     a1,a2,1c <.L4>   
   >>   
   >> 0000000000000008 <.L3>:   
   >>    8:   02059793                slli    a5,a1,0x20   
   >>    c:   83f5                    srli    a5,a5,0x1d   
   >>    e:   97ba                    add     a5,a5,a4   
   >>   10:   639c                    ld      a5,0(a5)   
   >>   12:   2585                    addiw   a1,a1,1   
   >>   14:   953e                    add     a0,a0,a5   
   >>   16:   feb619e3                bne     a2,a1,8 <.L3>   
   >>   1a:   8082                    ret   
   >>   
   >> 000000000000001c <.L4>:   
   >>   1c:   8082                    ret   
   >   
   > Unsigned 32-bit stuff on RISC-V has a habit of blowing up with lots of   
   > overhead instructions.  Change the loop condition to "i < h", and you get   
   > on godbolt.org with -O2 -march=rv64g   
   >   
   > foo(long*, unsigned int, unsigned int):   
   >          mv      a5,a0   
   >          bgeu    a1,a2,.L4   
   >          addiw   a4,a2,-1   
   >          subw    a4,a4,a1   
   >          slli    a4,a4,32   
   >          slli    a1,a1,32   
   >          srli    a1,a1,32   
   >          srli    a4,a4,32   
   >          add     a4,a4,a1   
   >          addi    a3,a0,8   
   >          slli    a4,a4,3   
   >          slli    a1,a1,3   
   >          li      a0,0   
   >          add     a5,a5,a1   
   >          add     a4,a4,a3   
   > .L3:   
   >          ld      a3,0(a5)   
   >          addi    a5,a5,8   
   >          add     a0,a0,a3   
   >          bne     a5,a4,.L3   
   >          ret   
   > .L4:   
   >          li      a0,0   
   >          ret   
   >   
   > This does get better with "-march=rv64g_zba", but Zba isn't part of RV64G.
   >   
   > GCC has actually optimized the loop itself better, but it has lots of   
   > fixup code to create 64-bit register versions of the unsigned inputs   
   > (because the RISC-V ABI specifies all 32-bit quantities must be   
   > sign-extended at the function call boundaries, even if they are   
   > unsigned).   
   >   
   > In many cases, the sign-extension works well (BGEU on 64-bit registers   
   > that are 32-bit sign-extended, works as it would if the values were   
   > 0-extended).  But mixing true 64-bit unsigned with 32-bit unsigned   
   > requires fixup instructions.  And the lack of a ZEXT.W in the basic   
   > 64-bit instruction set was a mistake.  RISC-V gives us a modern example   
   > of how to handle not having a full suite of 32-bit instructions, and   
   > what that would look like.   
   >   
      
   Had they not dropped ADDWU and SUBWU from BitManip, and had they done the
   sensible thing of zero-extending "unsigned int", much of this mess would
   go away...
      
      
   Sign-extending "unsigned int" is almost the worst possible option (even   
   within the limits of plain RV64G). Sign extension makes "a+b" slightly   
   cheaper, but everything else gets worse. It is, ironically, better to   
   just pay the up-front cost of zero extension for add/subtract (and maybe   
   throw up a middle finger to the ABI spec on this one).   
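
   To make the trade-off concrete, here is a rough C model of a 64-bit
   register holding a 32-bit "unsigned int" under each convention. The
   function names are made up for illustration, and the instruction
   sequences named in the comments are the usual plain-RV64G lowerings
   as I understand them, not output from any particular compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Convention A (sign-extended, as the ABI asks): bits 63..32 copy bit 31. */
uint64_t reg_sext(uint32_t v)
{
    return (v & 0x80000000u) ? (0xFFFFFFFF00000000ull | v) : (uint64_t)v;
}

/* Convention B (zero-extended, the alternative argued for): bits 63..32 are 0. */
uint64_t reg_zext(uint32_t v)
{
    return (uint64_t)v;
}

/* a+b under A: a single ADDW, which re-sign-extends its 32-bit result. */
uint64_t add_sext(uint64_t a, uint64_t b)
{
    return reg_sext((uint32_t)(a + b));
}

/* a+b under B: ADD, then clear the high half again (SLLI 32 + SRLI 32). */
uint64_t add_zext(uint64_t a, uint64_t b)
{
    return ((a + b) << 32) >> 32;
}

/* Byte offset of a[i] (8-byte elements) under A: SLLI 32 + SRLI 29,
 * the same pair seen in the loop body quoted earlier. */
uint64_t offset_sext(uint64_t i)
{
    return (i << 32) >> 29;
}

/* Same offset under B: a single SLLI by 3. */
uint64_t offset_zext(uint64_t i)
{
    return i << 3;
}

/* Unsigned compare of two convention-A registers: a plain 64-bit BLTU
 * still gives the right answer, since sign extension preserves the
 * unsigned ordering of the 32-bit values. */
int ltu_sext(uint64_t a, uint64_t b)
{
    return a < b;
}
```

   So A wins on add (1 op vs 3) and ties on compares, while B wins
   anywhere the value is used as a 64-bit index or mixed with true
   64-bit unsigned values.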
      
      
   Well, then again, it seems there are multiple versions of the ABI spec
   floating around on the internet, seemingly with differences as to the
   exact handling of passing/returning structures, etc. So, I don't
   personally put too much weight on there being a minor mismatch here.
      
   Where:
      Some versions appear to use SysV-AMD64 style struct rules,
        with structs returned by on-stack copy;
      Some versions use register, register-pair, or by-reference rules,
        with structs returned in X10, X11:X10,
          or by passing a return pointer as a hidden argument
        (this also being what BGBCC uses);
      ...
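
   As a sketch of the second scheme: the three cases can be keyed off
   struct size. The classify_return() helper and the enum are
   hypothetical names, and the 8/16-byte thresholds assume XLEN=64
   (one register, two registers, otherwise memory); the exact rules
   vary between the spec versions mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Example aggregates of 8, 16 and 32 bytes. */
typedef struct { int64_t a; } One;
typedef struct { int64_t a, b; } Two;
typedef struct { int64_t a, b, c, d; } Four;

/* Hypothetical labels for the three return paths described above. */
enum ret_class {
    RET_X10,        /* fits one register: returned in X10 */
    RET_X11_X10,    /* fits a register pair: returned in X11:X10 */
    RET_HIDDEN_PTR  /* too large: caller passes a hidden return pointer */
};

/* Assumed size-only classification for XLEN=64; ignores the
 * floating-point and alignment special cases a real ABI would have. */
enum ret_class classify_return(size_t size)
{
    if (size <= 8)
        return RET_X10;
    if (size <= 16)
        return RET_X11_X10;
    return RET_HIDDEN_PTR;
}
```

   With that, classify_return(sizeof(Two)) lands on the register-pair
   case, and anything like Four goes through the hidden pointer.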
      
   Then, differences between LP64 and LP64D:   
      LP64: All F registers are Scratch;   
      LP64D: Some of the F registers are Preserved.   
      
      
   Well, and there are bigger concerns on the ABI front (the ABI used by   
   BGBCC not being strictly 1:1 with the standard ABI, but close enough   
   that most cases will work):   
      Basic case is LP64 argument passing with LP64D's register rules.   
      
   Then there is an XG3 ABI variant (which can also be used for RV64G) that
   defines 16 argument registers and reassigns 4 of the F registers from
   scratch to preserved (to bring the balance slightly closer to an even
   split).
      
   So:   
      X: 4 SPR, 16 Scratch, 12 Preserved   
      F: 16 Scratch, 16 Preserved (Vs 20+12)   
   So: 32 Scratch + 28 Preserved   
   Vs: 36 Scratch + 24 Preserved   
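
   The tally can be checked mechanically. This is just the arithmetic
   from the lines above written out in C; the struct and constant names
   are made up for the example.

```c
/* Register counts per file: special-purpose, scratch, preserved. */
struct reg_split {
    int special;
    int scratch;
    int preserved;
};

/* X registers: same split in both cases. */
const struct reg_split X_REGS = { 4, 16, 12 };
/* F registers under the XG3 ABI variant. */
const struct reg_split F_XG3  = { 0, 16, 16 };
/* F registers under the standard split (the "Vs 20+12" case). */
const struct reg_split F_STD  = { 0, 20, 12 };

/* Every register in a 32-entry file must be accounted for. */
int file_size(struct reg_split r)
{
    return r.special + r.scratch + r.preserved;
}

int total_scratch(struct reg_split x, struct reg_split f)
{
    return x.scratch + f.scratch;
}

int total_preserved(struct reg_split x, struct reg_split f)
{
    return x.preserved + f.preserved;
}
```

   Moving 4 F registers from scratch to preserved shifts the totals
   from 36+24 to 32+28, with the 4 special X registers left out of
   both counts.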
      
      
      
   > Kent   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca