From: user5857@newsgrouper.org.invalid   
      
   anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
      
   > MitchAlsup writes:   
   > >   
   > >anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   > >> Intel designed SSE with scalar instructions that use only 32 bits out   
   > >> of the 128 bits available; SSE2 with 64-bit scalar instructions, AVX   
   > >> (and AVX2) with 32-bit and 64-bit scalar operations in a 256-bit   
   > >> register, and various AVX-512 variants with 32-bit and 64-bit scalars,   
   > >> and 128-bit and 256-bit operations in addition to the 512-bit ones.   
   > >> They are obviously not worried about waste.   
   > >   
   > >Which only goes to prove that x86 is not RISC.   
   >   
   > I don't see that following at all, but it inspired a closer look at   
   > the usage/waste of register bits in RISCs:   
   >   
   > Every 64-bit RISC starting with MIPS-IV and Alpha, wastes a lot of   
   > precious register bits by keeping 8-bit, 16-bit, and 32-bit values in   
   > 64-bit registers rather than following the idea of Intel and Robert   
   > Finch of splitting the 64-bit register in the double number of 32-bit   
   > registers; this idea can be extended to eliminate waste by having the   
   > quadruple number of 16-bit registers that can be joined into 32-bit   
   > and 64-bit registers when needed, or even better, the octuple number
   > of 8-bit registers that can be joined to 16-bit, 32-bit, and 64-bit   
   > registers. We can even resurrect the character-oriented or
   > digit-oriented architectures of the 1950s.   
      
   Consider that being able to address every 2^(3+n)-bit field of a register
   is far from free. Take a simple add of 2 bytes::   
      
    ADDB R8[7], R6[3], R19[4]   
      
   One has to individually align each of the bytes, which is going to blow   
   out your timing for forwarding by at least 3 gates of delay (operands)   
   and 4 gates for the result (register). The only way it makes "timing"   
   sense is if you restrict the patterns to::
      
    ADDB R8[7], R6[7], R19[7]   
      
   Where there is no "vertical" routing in obtaining operands and delivering
   results. {{OR you could always just eat a latency cycle when all fields   
   are not the same.}}   
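
   A minimal software sketch of where the alignment steps sit, assuming
   lane 0 is the least significant byte of a 64-bit register. The helper
   names are hypothetical, and Python shifts only stand in for the byte
   multiplexers; nothing here models gate delay. The any-lane form needs
   a shift on each operand and another on the result, while the same-lane
   form only masks in place.

```python
MASK64 = (1 << 64) - 1  # model registers as 64-bit unsigned values


def get_byte(reg, lane):
    """Shift byte `lane` (0 = least significant, an assumption) down to bits 0..7."""
    return (reg >> (8 * lane)) & 0xFF


def put_byte(reg, lane, value):
    """Return `reg` with byte `lane` replaced by the low 8 bits of `value`."""
    shift = 8 * lane
    return ((reg & ~(0xFF << shift)) | ((value & 0xFF) << shift)) & MASK64


def addb_any_lane(rd, d_lane, ra, a_lane, rb, b_lane):
    """ADDB Rd[d], Ra[a], Rb[b]: align both operands, add, align the result."""
    total = get_byte(ra, a_lane) + get_byte(rb, b_lane)  # two operand alignments
    return put_byte(rd, d_lane, total)                   # one result alignment


def addb_same_lane(rd, ra, rb, lane):
    """ADDB Rd[k], Ra[k], Rb[k]: the bytes already line up, so no shifting."""
    mask = 0xFF << (8 * lane)
    total = ((ra & mask) + (rb & mask)) & mask  # carry out of the lane is dropped
    return (rd & ~mask & MASK64) | total


r6 = 0x1122334455667788
r19 = 0x0102030405060708
print(hex(addb_any_lane(0, 7, r6, 3, r19, 4)))  # ADDB R8[7], R6[3], R19[4]
print(hex(addb_same_lane(0, r6, r19, 7)))       # ADDB R8[7], R6[7], R19[7]
```

   Both forms compute the same thing when all three lanes agree; the
   difference is only in how much data movement surrounds the adder.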
      
   I also suspect that you would find few compiler writers willing to
   support random fields in registers.
      
   > Intel split AX into AL and AH, similar for BX, CX, and DX, but not for   
   > SI, DI, BP, and SP.   
      
   {ABCD}X registers were data.   
   {SI,DI,BP,SP} registers were pointer registers.
      
   There are vanishingly few useful manipulations on part of pointers.   
      
   Oh and BTW:: using x86-history as justification for an architectural   
   feature is "bad style".   
      
   > In the 32-bit extension, they did not add ways to   
   > access the third and fourth byte, or the second wyde (16-bit value).   
   > In the 64-bit extension, AMD added ways to access the low byte of   
   > every register (in addition to AH-DH), but no way to access the second   
   > byte of other registers than RAX-RDX, nor ways to access higher wydes,   
   > or 32-bit units. Apparently they were not concerned about this kind   
   > of waste. For the 8086 the explanation is not trying to avoid waste,   
   > but an easy automatic mapping from 8080 code to 8086 code.   
   >   
   > Writing to AL-DL or AX-DX,SI,DI,BP,SP leaves the other bits of the   
   > 32-bit register alone, which one can consider to be useful for storing   
   > data in those bits (and in case of AL, AH actually provides a   
   > convenient way to access some of the bits, and vice versa), but leads
   > to partial-register stalls. The hardware contains fast paths for some   
   > common cases of partial-register writes, but AFAIK AH-DH do not get   
   > fast paths in most CPUs.   
   >   
   > By contrast, RISCs waste the other 24 or 56 bits on a byte load by
   > zero-extending or sign-extending the byte.   
      
   But gains the property that the whole register contains 1 proper value   
   {range-limited to the container size whence it came}. This in turn makes
   tracking values easy--in fact placing several different sized values   
   in a single register makes it essentially impossible to perform value   
   analysis in the compiler.   
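
   A rough illustration of the difference, assuming 64-bit registers
   modeled as unsigned integers; the function names are hypothetical.
   After an extending load the register holds one value whose range is
   known from the load alone. After a merging partial write the register
   contents also depend on whatever was there before.

```python
MASK64 = (1 << 64) - 1  # model registers as 64-bit unsigned values


def load_byte_zero_extend(byte):
    """Unsigned byte load: bits 8..63 are cleared, result is in 0..255."""
    return byte & 0xFF


def load_byte_sign_extend(byte):
    """Signed byte load: bit 7 is replicated through bits 8..63."""
    b = byte & 0xFF
    return (b | ~0xFF) & MASK64 if b & 0x80 else b


def write_low_byte_merge(old_reg, byte):
    """Partial write in the style of a write to AL: bits 8..63 of the old value survive."""
    return (old_reg & ~0xFF & MASK64) | (byte & 0xFF)


print(hex(load_byte_zero_extend(0x80)))
print(hex(load_byte_sign_extend(0x80)))
print(hex(write_low_byte_merge(0x1122334455667788, 0x80)))
```

   The first two results are fully determined by the loaded byte. The
   third is not, so an analysis would have to track the low byte and the
   remaining 56 bits as separate values living in one register.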
      
   > Alpha avoids wasting register bits for some idioms by keeping up to 8   
   > bytes in a register in SIMD style (a few years before the wave of SIMD   
   > extensions across the industry), but still provides no direct name for   
   > the individual bytes of a register.   
      
   If your ISA has excellent support for statically positioned bit-fields   
   (or even better with dynamically positioned bit fields) fetching the   
   fields and depositing them back into containers does not add significant
   latency {volatile notwithstanding}, whereas poor ISA support does add
   significant latency.
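
   For concreteness, a sketch of what bit-field extract and deposit
   compute, assuming a 64-bit container and a field given by offset and
   width. The names are hypothetical and do not correspond to any
   particular ISA's mnemonics. With direct ISA support each of these is
   one instruction; without it, each becomes the shift/mask/merge
   sequence written out here.

```python
MASK64 = (1 << 64) - 1  # model containers as 64-bit unsigned values


def extract_unsigned(container, offset, width):
    """Return the `width`-bit field starting at bit `offset`, zero-extended."""
    return (container >> offset) & ((1 << width) - 1)


def extract_signed(container, offset, width):
    """Return the same field interpreted as a two's-complement value."""
    field = extract_unsigned(container, offset, width)
    sign = 1 << (width - 1)
    return (field ^ sign) - sign  # standard sign-extension identity


def deposit(container, offset, width, value):
    """Return `container` with the field replaced by the low `width` bits of `value`."""
    mask = ((1 << width) - 1) << offset
    return ((container & ~mask) | ((value << offset) & mask)) & MASK64


word = 0xABCD
print(hex(extract_unsigned(word, 4, 8)))
print(extract_signed(word, 4, 8))
print(hex(deposit(word, 4, 8, 0x12)))
```

   The offset and width are plain arguments here, which corresponds to
   the dynamically positioned case; the statically positioned case is
   the same computation with both fixed at compile time.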
      
   > IIRC the original HPPA has 32 or so 64-bit FP registers, which they   
   > then split into 58? 32-bit FP registers. I don't know how they   
   > further evolved that feature.   
   >   
   > - anton   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   