   comp.arch      Apparently more than just beeps & boops      131,241 messages   

   Message 129,497 of 131,241   
   MitchAlsup to All   
   Re: VAX (1/2)   
   25 Aug 25 00:56:26   
   
   From: user5857@newsgrouper.org.invalid   
      
   anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
      
   > BGB writes:
   > >But, it seems to have a few obvious weak points for RISC-V:   
   > >   Crappy with arrays;   
   > >   Crappy with code with lots of large immediate values;   
   > >   Crappy with code which mostly works using lots of global variables;   
   > >     Say, for example, a lot of Apogee / 3D Realms code;   
   > >     They sure do like using lots of global variables.   
   > >     id Software also likes globals, but not as much.   
   > >   ...   
   >   
   > Let's see:   
   >   
   > #include <stddef.h>
   >   
   > long arrays(long *v, size_t n)   
   > {   
   >   long i, r;   
   >   for (i=0, r=0; i<n; i++)
   >     r+=v[i];
   >   return r;   
   > }   
   arrays:   
       MOV  Ri,#0   
       MOV  Rr,#0   
       VEC  Rt,{}   
       LDD  Rl,[Rv,Ri<<3]   
       ADD  Rr,Rr,Rl   
       LOOP LT,Ri,Rn,#1   
       MOV  R1,Rr   
       RET   
      
   7 instructions, 1 instruction-modifier; 8 words.   
   >   
   > long a, b, c, d;   
   >   
   > void globals(void)   
   > {   
   >   a = 0x1234567890abcdefL;   
   >   b = 0xcdef1234567890abL;   
   >   c = 0x567890abcdef1234L;   
   >   d = 0x5678901234abcdefL;   
   > }   
      
   globals:   
       STD 0x1234567890abcdef,[IP,a]   
       STD 0xcdef1234567890ab,[IP,b]   
       STD 0x567890abcdef1234,[IP,c]   
       STD 0x5678901234abcdef,[IP,d]   
       RET   
      
   5 instructions, 13 words, 0 .data, 0 .bss   
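
   The 13-word figure is simple arithmetic. A minimal sketch, assuming 32-bit
   instruction words and that each STD carrying a 64-bit immediate takes one
   instruction word plus two immediate words (inferred from the count above,
   not stated in the post); the names are illustrative only.

```c
/* Assumption: 32-bit instruction words, and a 64-bit immediate
   carried inline occupies two additional words. */
enum { WORD_BITS = 32, IMM_BITS = 64 };

/* Words used by `stores` store-immediate instructions plus
   `other_insns` one-word instructions (here: the final RET). */
static int globals_words(int stores, int other_insns)
{
    int words_per_store = 1 + IMM_BITS / WORD_BITS;  /* 1 + 2 = 3 */
    return stores * words_per_store + other_insns;
}

/* Same count expressed in bytes. */
static int globals_bytes(int stores, int other_insns)
{
    return globals_words(stores, other_insns) * (WORD_BITS / 8);
}
```

   With four stores and one RET, globals_words(4, 1) gives 13, and under the
   same assumption globals_bytes(4, 1) gives 52 bytes.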
      
   > gcc-10.3 -Wall -O2 compiles this to the following RV64GC code:   
   >   
   > 0000000000010434 <arrays>:
   >    10434:       cd81        beqz    a1,1044c    
   >    10436:       058e        slli    a1,a1,0x3   
   >    10438:       87aa        mv      a5,a0   
   >    1043a:       00b506b3    add     a3,a0,a1   
   >    1043e:       4501        li      a0,0   
   >    10440:       6398        ld      a4,0(a5)   
   >    10442:       07a1        addi    a5,a5,8   
   >    10444:       953a        add     a0,a0,a4   
   >    10446:       fed79de3    bne     a5,a3,10440    
   >    1044a:       8082        ret   
   >    1044c:       4501        li      a0,0   
   >    1044e:       8082        ret   
   >   
   > 0000000000010450 <globals>:
   >    10450:       8201b583    ld      a1,-2016(gp) # 12020 <__SDATA_BEGIN__>   
   >    10454:       8281b603    ld      a2,-2008(gp) # 12028 <__SDATA_BEGIN__+0x8>
   >    10458:       8301b683    ld      a3,-2000(gp) # 12030 <__SDATA_BEGIN__+0x10>
   >    1045c:       8381b703    ld      a4,-1992(gp) # 12038 <__SDATA_BEGIN__+0x18>
   >    10460:       86b1b423    sd      a1,-1944(gp) # 12068    
   >    10464:       86c1b023    sd      a2,-1952(gp) # 12060    
   >    10468:       84d1bc23    sd      a3,-1960(gp) # 12058    
   >    1046c:       84e1b823    sd      a4,-1968(gp) # 12050    
   >    10470:       8082        ret   
   >   
   > When using -Os, arrays becomes 2 bytes shorter, but the inner loop   
   > becomes longer.   
   >   
   > gcc-12.2 -Wall -O2 -falign-labels=1 -falign-loops=1 -falign-jumps=1
   > -falign-functions=1 compiles this to the following AMD64 code:
   >   
   > 000000001139 <arrays>:
   > 1139:       48 85 f6                test   %rsi,%rsi   
   > 113c:       74 13                   je     1151    
   > 113e:       48 8d 14 f7             lea    (%rdi,%rsi,8),%rdx   
   > 1142:       31 c0                   xor    %eax,%eax   
   > 1144:       48 03 07                add    (%rdi),%rax   
   > 1147:       48 83 c7 08             add    $0x8,%rdi   
   > 114b:       48 39 d7                cmp    %rdx,%rdi   
   > 114e:       75 f4                   jne    1144    
   > 1150:       c3                      ret   
   > 1151:       31 c0                   xor    %eax,%eax   
   > 1153:       c3                      ret   
   >   
   > 000000001154 <globals>:
   > 1154:       48 b8 ef cd ab 90 78    movabs $0x1234567890abcdef,%rax
   > 115b:       56 34 12
   > 115e:       48 89 05 cb 2e 00 00    mov    %rax,0x2ecb(%rip)        # 4030 <a>
   > 1165:       48 b8 ab 90 78 56 34    movabs $0xcdef1234567890ab,%rax
   > 116c:       12 ef cd
   > 116f:       48 89 05 b2 2e 00 00    mov    %rax,0x2eb2(%rip)        # 4028 <b>
   > 1176:       48 b8 34 12 ef cd ab    movabs $0x567890abcdef1234,%rax
   > 117d:       90 78 56
   > 1180:       48 89 05 99 2e 00 00    mov    %rax,0x2e99(%rip)        # 4020 <c>
   > 1187:       48 b8 ef cd ab 34 12    movabs $0x5678901234abcdef,%rax
   > 118e:       90 78 56
   > 1191:       48 89 05 80 2e 00 00    mov    %rax,0x2e80(%rip)        # 4018 <d>
   > 1198:       c3                      ret
   >   
   > gcc-10.2 -Wall -O2 -falign-labels=1 -falign-loops=1 -falign-jumps=1
   > -falign-functions=1 compiles this to the following ARM A64 code:
   >   
   > 0000000000000734 <arrays>:
   >  734:   b4000121        cbz     x1, 758    
   >  738:   aa0003e2        mov     x2, x0   
   >  73c:   d2800000        mov     x0, #0x0                        // #0   
   >  740:   8b010c43        add     x3, x2, x1, lsl #3   
   >  744:   f8408441        ldr     x1, [x2], #8   
   >  748:   8b010000        add     x0, x0, x1   
   >  74c:   eb03005f        cmp     x2, x3   
   >  750:   54ffffa1        b.ne    744   // b.any   
   >  754:   d65f03c0        ret   
   >  758:   d2800000        mov     x0, #0x0                        // #0   
   >  75c:   d65f03c0        ret   
   >   
   > 0000000000000760 <globals>:
   >  760:   d299bde2        mov     x2, #0xcdef                     // #52719   
   >  764:   b0000081        adrp    x1, 11000 <__cxa_finalize@GLIBC_2.17>   
   >  768:   f2b21562        movk    x2, #0x90ab, lsl #16   
   >  76c:   9100e020        add     x0, x1, #0x38   
   >  770:   f2cacf02        movk    x2, #0x5678, lsl #32   
   >  774:   d2921563        mov     x3, #0x90ab                     // #37035   
   >  778:   f2e24682        movk    x2, #0x1234, lsl #48   
   >  77c:   f9001c22        str     x2, [x1, #56]   
   >  780:   d2824682        mov     x2, #0x1234                     // #4660   
   >  784:   d299bde1        mov     x1, #0xcdef                     // #52719   
   >  788:   f2aacf03        movk    x3, #0x5678, lsl #16   
   >  78c:   f2b9bde2        movk    x2, #0xcdef, lsl #16   
   >  790:   f2a69561        movk    x1, #0x34ab, lsl #16   
   >  794:   f2c24683        movk    x3, #0x1234, lsl #32   
   >  798:   f2d21562        movk    x2, #0x90ab, lsl #32   
   >  79c:   f2d20241        movk    x1, #0x9012, lsl #32   
   >  7a0:   f2f9bde3        movk    x3, #0xcdef, lsl #48   
   >  7a4:   f2eacf02        movk    x2, #0x5678, lsl #48   
   >  7a8:   f2eacf01        movk    x1, #0x5678, lsl #48   
   >  7ac:   a9008803        stp     x3, x2, [x0, #8]   
   >  7b0:   f9000c01        str     x1, [x0, #24]   
   >  7b4:   d65f03c0        ret   
   >   
   > So, the overall sizes (including data size for globals() on RV64GC) are:   
   >   
   > arrays globals    Architecture   
   > 28     66 (34+32) RV64GC   
   > 27     69         AMD64   
   > 44     84         ARM A64   
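
   The byte counts in this table can be re-derived from the address ranges in
   the listings. A small sketch, assuming the listings are complete as quoted;
   the struct and function names are made up for illustration, and each end
   address is the last instruction's address plus its length.

```c
#include <stddef.h>

struct func_span {
    const char *arch;
    const char *name;
    unsigned long start;  /* address of the first instruction */
    unsigned long end;    /* address just past the last instruction */
    size_t data_bytes;    /* constant data loaded by the code, if any */
};

/* Addresses copied from the quoted objdump listings. */
static const struct func_span spans[] = {
    { "RV64GC",  "arrays",  0x10434, 0x10450, 0  },
    { "RV64GC",  "globals", 0x10450, 0x10472, 32 },  /* 4 x 8-byte constants */
    { "AMD64",   "arrays",  0x1139,  0x1154,  0  },
    { "AMD64",   "globals", 0x1154,  0x1199,  0  },
    { "ARM A64", "arrays",  0x734,   0x760,   0  },
    { "ARM A64", "globals", 0x760,   0x7b8,   0  },
};

/* Code bytes plus any constant data the function depends on. */
static size_t total_bytes(const struct func_span *s)
{
    return (size_t)(s->end - s->start) + s->data_bytes;
}
```

   This reproduces 28 and 66 for RV64GC, 27 and 69 for AMD64, and 44 for the
   A64 arrays(). For the A64 globals() listing as quoted (0x760 through the
   ret at 0x7b4, 22 instructions) the same subtraction gives 88 rather than
   the 84 shown in the table.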
   >   
   > So RV64GC is smallest for the globals/large-immediate test here, and   
   > only beaten by one byte by AMD64 for the array test.  Looking at the   
   > code generated for the inner loop of arrays(), all the inner loops   
   > contain four instructions, so certainly in this case RV64GC is not   
   > crappier than the others.  Interestingly, the reasons for using four   
   > instructions (rather than five) are different on these architectures:   
   >   
   > * RV64GC uses a compare-and-branch instruction.   
   > * AMD64 uses a load-and-add instruction.   
   > * ARM A64 uses an auto-increment instruction.   
   >   
      
   [continued in next message]   
      

