From: anton@mips.complang.tuwien.ac.at   
      
   MitchAlsup writes:   
   >   
   >MitchAlsup posted:   
   >   
   >>   
   >> anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   >> > #include <stddef.h>
   >> >   
   >> > long arrays(long *v, size_t n)   
   >> > {   
   >> > long i, r;   
   >> > for (i=0, r=0; i<n; i++)
   >> >   r+=v[i];
   >> > return r;   
   >> > }   
   >> >   
   >> > long a, b, c, d;   
   >> >   
   >> > void globals(void)   
   >> > {   
   >> > a = 0x1234567890abcdefL;   
   >> > b = 0xcdef1234567890abL;   
   >> > c = 0x567890abcdef1234L;   
   >> > d = 0x5678901234abcdefL;   
   >> > }   
   >>   
   >> > So, the overall sizes (including data size for globals() on RV64GC) are:   
   >> >     Bytes                         Instructions
   >> > arrays  globals     Architecture  arrays  globals
   >> >     28  66 (34+32)  RV64GC            12        9
   >> >     27  69          AMD64             11        9
   >> >     44  84          ARM A64           11       22
   >>       32  68          My 66000           8        5
   >   
   >In light of the above, what do people think is more important, small   
   >code size or fewer instructions ??   
      
   Performance from a given chip area.   
      
   The RISC-V people argue that they can combine instructions with a few   
   transistors. But, OTOH, they have 16-bit and 32-bit wide   
   instructions, which means that part of the decoder's results will be
   thrown away, increasing the decode cost for a given average number of
   decoded instructions per cycle. Plus, they need more decoded
   instructions per cycle for a given amount of performance.
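
To make the "thrown away" part concrete, here is a minimal sketch, assuming a front end that starts a speculative decode at every 16-bit parcel. It relies only on the RISC-V rule that an instruction whose two lowest bits are 11 is 32 bits wide and anything else is 16 bits wide; encodings longer than 32 bits are ignored. The names rv_insn_len, decode_stats and count_decodes are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of the instruction that starts with this 16-bit parcel.
   Assumption: only 16-bit and 32-bit encodings occur. */
static size_t rv_insn_len(uint16_t parcel)
{
    return (parcel & 0x3) == 0x3 ? 4 : 2;
}

/* useful: parcels that really start an instruction.
   wasted: parcels in the middle of a 32-bit instruction, where a
   decode-at-every-parcel front end decodes and then discards the result. */
struct decode_stats { size_t useful; size_t wasted; };

static struct decode_stats count_decodes(const uint16_t *parcels, size_t n)
{
    struct decode_stats s = {0, 0};
    size_t i = 0;
    while (i < n) {
        size_t len = rv_insn_len(parcels[i]) / 2;  /* length in parcels */
        s.useful++;
        if (len == 2 && i + 1 < n)
            s.wasted++;            /* second half of a 32-bit instruction */
        i += len;
    }
    return s;
}
```

For a stream consisting only of 32-bit instructions, half of the decode slots in this model are wasted; for a stream of only 16-bit instructions, none are.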
      
   Intel and AMD demonstrate that you can get high performance even with   
   an instruction set that is even worse for decoding, but that's not cheap.   
      
   ARM A64 goes the other way: Fixed-width instructions ensure that all   
   decoding on correctly predicted paths is actually useful.   
      
   However, it pays for this in other ways: Instructions like load pair   
   with auto-increment need to write 3 registers, and the write port   
   arbitration certainly has a hardware cost. Then again, such an
   instruction would need two loads and an add if expressed in RISC-V; if
   RISC-V combines these instructions, it has the same write-port   
   arbitration problem. If it does not combine at least the loads, it   
   will tend to perform worse with the same number of load/store units.   
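
As a rough C model of the three register writes, assuming 8-byte long (the struct and function names are hypothetical, chosen only for this sketch):

```c
/* Results of one load pair with post-increment: two loaded values plus
   the updated base address, i.e. three register writes. */
struct pair_result { long first; long second; const long *next; };

/* Roughly what A64 "ldp x1, x2, [x0], #16" does in one instruction.
   On RV64 the same effect takes three instructions with one register
   write each:  ld a1,0(a0) ; ld a2,8(a0) ; addi a0,a0,16 */
static struct pair_result load_pair_postinc(const long *base)
{
    struct pair_result r;
    r.first  = base[0];
    r.second = base[1];
    r.next   = base + 2;   /* base advances by 16 bytes if long is 8 bytes */
    return r;
}
```

Either way the same three values have to reach the register file; the question is only whether they arrive from one instruction or from three.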
      
   So it's a balancing game: If you lose some weight here, do you need to
   add the same, more, or less weight elsewhere to compensate?
      
   >At some scale, smaller code size is beneficial, but once the implementation   
   >has a GBOoO µarchitecture, I would think that fewer instructions is better   
   >than smaller code--so long as the code size is less than 150% of the smaller   
   >AND so long as the ISA does not resort to sequential decode (i.e., VAX).   
      
   I don't think that even VAX encoding would be the major problem of the   
   VAX these days. There are microop caches and speculative decoders for   
   that (although, as EricP points out, the VAX is an especially   
   expensive nut to crack for a speculative decoder).   
      
   In any case, if smaller code size were the deciding factor, RV64GC
   would win according to my results. However, compilers often generate
   code that is bigger rather than smaller (loop unrolling, inlining),
   so code size is not that important in the eyes of the maintainers of
   these compilers.
      
   I also often see code produced with more (dynamic) instructions than   
   necessary. So the number of instructions is apparently not that   
   important, either.   
      
   - anton   
   --   
   'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'   
    Mitch Alsup,    
      