From: user5857@newsgrouper.org.invalid   
      
   anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
      
   > MitchAlsup writes:   
   > >   
   > >anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   > ..   
   > >My 66000 CMP is signless--it compares two integer registers and delivers   
   > >a bit vector of all possible comparisons {2 equality, 4 signed, 4 unsigned,   
   > >4 range checks, [and in FP land 10-bits are the class of the RS1 operand]}   
   >   
   > With an 88000-style compare and a result register of 64 bits, you can   
   > spend 14 bits on 64-bit comparison, 14 bits on 32-bit comparison, 14   
   > bits on 16-bit comparison, and 14 bits on 8-bit comparison, and still   
   > have 8 bits left. What is a "range check" and why does it take 4   
   > bits?   
      
   CIN 0 <= Reg < Max   
   FIN 0 < Reg <= Max   
   RIN 0 < Reg < Max   
   SIN 0 <= Reg <= Max   
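   As a sketch only, the four range checks above can be written out as C
   predicates. The mnemonics and inequalities are taken from the list; the
   function names, the signed 64-bit operand type, and the bit positions in
   range_bits() are assumptions made for illustration, not the My 66000
   encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: one predicate per range check listed above.
   Signed 64-bit operands are an assumption. */
static bool range_cin(int64_t reg, int64_t max) { return 0 <= reg && reg <  max; }
static bool range_fin(int64_t reg, int64_t max) { return 0 <  reg && reg <= max; }
static bool range_rin(int64_t reg, int64_t max) { return 0 <  reg && reg <  max; }
static bool range_sin(int64_t reg, int64_t max) { return 0 <= reg && reg <= max; }

/* Hypothetical bit assignment, chosen only for this example. */
enum { CIN_BIT = 1u << 0, FIN_BIT = 1u << 1, RIN_BIT = 1u << 2, SIN_BIT = 1u << 3 };

/* Deliver all four results at once, the way a compare that produces a
   bit vector would. */
static unsigned range_bits(int64_t reg, int64_t max)
{
    unsigned bits = 0;
    if (range_cin(reg, max)) bits |= CIN_BIT;
    if (range_fin(reg, max)) bits |= FIN_BIT;
    if (range_rin(reg, max)) bits |= RIN_BIT;
    if (range_sin(reg, max)) bits |= SIN_BIT;
    return bits;
}

/* Small self-check: the four checks differ only at the two end points. */
static void range_bits_selfcheck(void)
{
    assert(range_bits(0, 10)  == (CIN_BIT | SIN_BIT));  /* lower end point */
    assert(range_bits(10, 10) == (FIN_BIT | SIN_BIT));  /* upper end point */
}
```

   CIN is the usual array-bounds form (0 <= index < length); the other
   three differ only in which end points are included.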
      
   >   
   > >> It is certainly part of the way towards my idea of having sign- and   
   > >> zero-extended 32-bit operands for every operand of every instruction.   
   > >   
   > >Unnecessary if the integer calculations deliver properly range-limited   
   > >64-bit results.   
   >   
   > Sign- or zero extension will still be necessary for things like   
   >   
   > long a=...   
   > int b=a;   
   > .. c[b];   
      
   The movement of long to int will 'smash' out extraneous significance.   
   As written: b has range [-2G..+2G] and the register holding b's value   
   will too.   
      
   The important property is that registers contain 64-bits and the value   
   in the register is range-limited to the calculated (or LDed) result.   
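   A minimal sketch of that range-limiting in portable C, assuming a
   32-bit int held in a 64-bit register. The function name is
   hypothetical, and unsigned arithmetic is used so that the result does
   not depend on implementation-defined narrowing conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep the low 32 bits of a 64-bit value and
   sign-extend them, so the result always lies in [-2^31, 2^31 - 1]. */
static int64_t smash_to_int32(int64_t a)
{
    uint32_t low = (uint32_t)a;  /* modular conversion: bits 31..0 only */

    /* Flip the sign bit, widen, then subtract 2^31: a standard
       sign-extension identity that avoids signed overflow. */
    return (int64_t)(low ^ UINT32_C(0x80000000)) - INT64_C(0x80000000);
}

/* Small self-check: in-range values are unchanged, extra significance
   above bit 31 is discarded. */
static void smash_selfcheck(void)
{
    assert(smash_to_int32(5) == 5);
    assert(smash_to_int32(INT64_C(0x100000005)) == 5);
}
```

   Whatever the long held, the value that later indexes c[] is a properly
   range-limited 64-bit quantity, so no further extension is needed at the
   point of use.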
      
   > With the extension in the operands, you do not need any extension   
   > instructions, not even for division, right-shift etc.   
   >   
   > The question, however, is if the extensions occur often enough to   
   > merit such features. I lean towards the SPARC/PowerPC/My 66000-v1   
   > approach here.   
      
   I did too, until conversations with an LLVM compiler writer.   
   GNUPLOT seems to be a banner application wrt range-limited   
   calculations.   
      
   > >> It would be interesting to see how many sign-extensions and   
   > >> zero-extensions (whether explicit or implicitly part of the   
   > >> instruction) are executed in code that is generated from various C   
   > >> sources (with and without -fwrapv).   
   > >   
   > >In GNUPLOT it is just over 4% of instruction count for 64-bit-only   
   > >integer calculations.   
   >   
   > Now what if you had a calling convention with garbage-extension? A   
   > number of extensions in your examples would go away.   
      
   Not many; few are at the ABI boundary, and most of those that are get   
   dealt with when moving arguments to preserved registers. So you could   
   send HoBs (high-order bits) that are never observed, since the   
   MOV Rpreserved,Rargument gets changed into a   
   SR[AL] Rpreserved,Rargument<32:0> at no space or time cost.   
      
   > >Counted for() loops are somewhat special in that it is quite easy to   
   > >determine that the loop index never exceeds the range-limit of the   
   > >container.   
   >   
   > There have been enough cases where such reasoning led to "optimizing"   
   > code into an infinite loop and other fallout of adversarial compilers.   
   >   
   > >> If n is unsigned, you can also choose unsigned,   
   > >> but then this code will be slow on RV64 (and MIPS64 and SPARC V9 and   
   > >> PowerPC64 and Alpha).   
   > >   
   > >Example please !?!   
   >   
   > With a slightly different loop:   
   >   
   > long foo(long a[], unsigned l, unsigned h)   
   > {   
   > unsigned i; // <---this variable should be uint64_t   
   > long r=0;   
   > for (i=l; i!=h; i++)   
   > r+=a[i];   
   > return r;   
   > }   
   >   
   > gcc-10 -O3 produces on RV64G:   
   >   
   > 0000000000000000 <foo>:   
   > 0: 872a mv a4,a0   
   > 2: 4501 li a0,0   
   > 4: 00c58c63 beq a1,a2,1c <.L4>   
   >   
   > 0000000000000008 <.L3>:   
   > 8: 02059793 slli a5,a1,0x20 // eliminate HoBs   
   > c: 83f5 srli a5,a5,0x1d // does not have scaled indexing   
   > e: 97ba add a5,a5,a4 // does not have indexing   
   > 10: 639c ld a5,0(a5) // all that work   
   > 12: 2585 addiw a1,a1,1   
   > 14: 953e add a0,a0,a5 // loop induction   
   > 16: feb619e3 bne a2,a1,8 <.L3>   
   > 1a: 8082 ret   
   >   
   > 000000000000001c <.L4>:   
   > 1c: 8082 ret   
   >   
   foo:   
    MOV R4,#0   
    MOV R5,#1   
    VEC R7,{}   
    LDD R6,[R1,R5<<3]   
    ADD R4,R4,R6   
    LOOP2 NE,R5,#1,R3   
    MOV R1,R4   
    RET   
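   For reference, the slli/srli pair in the quoted RV64G loop can be
   modelled in C as follows. This is a sketch with hypothetical function
   names; it assumes 8-byte array elements, as in the long a[] of the
   example, and takes the register as a raw 64-bit value whose upper half
   may hold anything (addiw leaves a sign-extended 32-bit result there).

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors `slli a5,a1,0x20` followed by `srli a5,a5,0x1d`. */
static uint64_t scaled_index_shift_pair(uint64_t a1)
{
    uint64_t a5 = a1 << 32;  /* slli by 32: discard bits 63..32 */
    return a5 >> 29;         /* srli by 29: net effect is a left shift by 3 */
}

/* What the pair computes: zero-extend the 32-bit index, then scale by 8. */
static uint64_t scaled_index_reference(uint64_t a1)
{
    return (uint64_t)(uint32_t)a1 * 8u;
}

/* Small self-check: garbage in the upper half does not affect the result. */
static void scaled_index_selfcheck(void)
{
    assert(scaled_index_shift_pair(UINT64_C(0xDEADBEEF00000003)) == 24);
    assert(scaled_index_shift_pair(UINT64_C(0xFFFFFFFFFFFFFFFF)) ==
           scaled_index_reference(UINT64_C(0xFFFFFFFFFFFFFFFF)));
}
```

   The two shifts therefore do the zero-extension and the scaling in one
   go; the separate add then supplies the base address.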
   >   
   >   
   > >   
   > >> If n is int, you can also choose int, and there is actually enough   
   > >> information here to make the code efficient (even with -fwrapv),   
   > >> because in this code int overflow really cannot happen,   
   > >   
   > >Consider the case where n is int64_t or uint64_t !?!   
   >   
   > Then the first condition does not hold on I32LP64.   
   >   
   > >Consider the C-preprocessor with::   
   > ># define int (short int) // !!   
   > >in scope.   
   >   
   > Then the compiler will see short int, and generate code accordingly.   
   > What's your point?   
   >   
   > - anton   
      