From: user5857@newsgrouper.org.invalid   
      
   Stefan Monnier posted:   
      
   > >> I have come to realize that 32/64 is probably better than 16/32 here,   
   > >> primarily in terms of performance, but also helps with code-density (a   
   > >> pure 32/64 encoding scheme can beat 16/32 in terms of code-density   
   > >> despite only having larger instructions available).   
   > >   
   > > My 66000 does not even bother with 16-bit instructions--and still ends   
   > > up with a lower instruction count than RISC-V. {32, 64, 96, 128, 160}
   > > are the instruction sizes; with no instructions ever requiring constants   
   > > to be assembled.   
   >   
   > Indeed, My 66000 aims for "fat" instructions so as to try and reduce   
   > instruction counts. That should hopefully result in an efficient ISA:   
   > fewer instructions should cost fewer runtime resources (as long as they
   > don't get split into more μops).   
   >   
   > > Most of the MOV instructions in My 66000 are found:
   > > a) before a call--moving values to argument positions,   
   > > b) after a call--moving results to post-call positions,   
   > > c) around loops --moving values for next loop iteration.   
   > [...]   
   > > I suspect that argument setup before and result take-down after call   
   > > would have quite a bit of parallelism. I suspect that moving fields   
   > > around for the next loop iteration would have significant parallelism.   
   >   
   > Are you saying that you expect the efficiency of My 66000 could be   
   > improved by adding some way to express those moves in a better way?   
      
   Probably, yes; I just never found a way to do it (yet).   
      
   > A key element of the Mill is/was its ability to "permute" its belt   
   > elements in a single cycle. I still don't fully understand how this is   
   > encoded in the ISA and implemented in hardware, but it sounds like   
   > you're hinting in the same direction: some kind of "parallel move"   
   > instruction with many inputs and many outputs.   
      
   For argument setup (calling side) one needs MOV {R1..R5},{Rm,Rn,Rj,Rk,Rl}
   For returning values (calling side) one needs MOV {Rm,Rn,Rj},{R1..R3}

   For loop iterations one needs MOV {Rm,Rn,Rj},{Ra,Rb,Rc}
      
   I just can't see how to make these run reasonably fast within the   
   constraints of the GBOoO Data Path.   
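
   To make the semantics of such a multi-register MOV concrete, here is a
   minimal Python sketch of what a compiler has to do today without one:
   turn a "parallel move" (all sources read before any destination is
   written) into a sequence of ordinary MOVs. It assumes the
   destination-first operand order used in the examples above. The names
   sequentialize, run_parallel, run_sequential and the scratch register
   Rtmp are hypothetical and exist only for illustration; this is not
   taken from the My 66000 toolchain or from the Mill.

```python
def sequentialize(moves, tmp="Rtmp"):
    """Turn a parallel move {dst: src} into an ordered list of (dst, src) MOVs.

    `tmp` is an assumed free scratch register used only to break cycles.
    """
    # Drop no-op moves such as R1 <- R1.
    pending = {d: s for d, s in moves.items() if d != s}
    out = []
    while pending:
        sources = set(pending.values())
        # A destination may be overwritten once no pending move still reads it.
        ready = [d for d in pending if d not in sources]
        if ready:
            for d in ready:
                out.append((d, pending.pop(d)))
        else:
            # Only cycles remain: save one register, redirect its readers.
            d = next(iter(pending))
            out.append((tmp, d))
            for k, s in pending.items():
                if s == d:
                    pending[k] = tmp
    return out


def run_parallel(regs, moves):
    """Reference semantics: every source is read before any write happens."""
    new = dict(regs)
    for d, s in moves.items():
        new[d] = regs[s]
    return new


def run_sequential(regs, seq):
    """Execute single MOVs one after another."""
    new = dict(regs)
    for d, s in seq:
        new[d] = new[s]
    return new


# Loop-carried rotation: MOV {Rm,Rn,Rj},{Rn,Rj,Rm}
rotate = {"Rm": "Rn", "Rn": "Rj", "Rj": "Rm"}
print(sequentialize(rotate))
# [('Rtmp', 'Rm'), ('Rm', 'Rn'), ('Rn', 'Rj'), ('Rj', 'Rtmp')]
```

   In this sketch the three-register rotation costs four dependent MOVs
   plus a scratch register, while moves with no overlap between sources
   and destinations (typical argument setup) come out as independent MOVs
   that could all issue together.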
   >   
   >   
   > Stefan   
      