From: robfi680@gmail.com   
      
   On 2025-12-20 6:14 p.m., MitchAlsup wrote:   
   >   
   > Stefan Monnier posted:   
   >   
   >>> For argument setup (calling side) one needs MOV {R1..R5},{Rm,Rn,Rj,Rk,Rl}   
   >>> For returning values (calling side) needs MOV {Rm,Rn,Rj},{R1..R3}   
   >>   
   >> In terms of encoding, these are fairly easy and could each fit within   
   >> a 32bit instruction.   
   >   
   > You are going to put 6×5-bit fields in a single 32-bit instruction with   
   > a 6-bit Major OpCode ?!?! I would like to see it done. Remember: all   
   > specifiers are in the first 32-bits of the "instruction" only constants   
   > are used as Variable Length.   
   >   
   >>> For loop iterations needs MOV {Rm,Rn,Rj},{Ra,Rb,Rc}   
   >>   
   >> IIUC these could have any number of registers and the destination and   
   >> source regs can be "anything", so the encoding would take up more space.   
   >> Arguably it might be possible in many/most cases to arrange for   
   >> {Rm,Rn,Rj} to be {R1..Rn}, so it might be able to use the same   
   >> instruction as the call-setup.   
   >   
   > In principle I buy this argument:: in practice I can't see it happening.   
   >   
   > I can see an encoding that would provide a "bunch of MOVs/Renames"   
   > but only if I disobey a principal tenet of ISA encoding {One that RISC-V
   > threw away on day 1} and that is; the register specification fields are   
   > at fixed locations. It is this tenet that removed some    
   > logic before multiplexing the specifiers into the RF decoder. The fixed   
   > position argument has neither the logic nor the multiplexer, RF specifiers   
   > are wired directly to the RF/Renamer decoder ports directly.   
   >   
   >>> I just can't see how to make these run reasonably fast within the   
   >>> constraints of the GBOoO Data Path.   
   >>   
   >> Hmm... One would hope this can be handled entirely in the renamer   
   >> without touching the actual data path, but ... sorry: if you don't know   
   >> how to do it, I sure don't either.   
   >   
   > Once one goes beyond the 3-operand 1-result property, all sorts of little   
   > things start to break--like multiplexing the RF specifiers. The Data-Path   
   > and the Register/Renamer ports are all designed to this FMAC requirement,   
   > giving us CMOV and INSert instructions with reasonable encodings.   
   >   
   > Right now, there are no register specifiers in the variable length part   
   > of ISA--just constants.   
   >   
   > It is also not exactly clear how one "makes" an instruction with {2,3,4,5,6,7}
   > writes traverse the pipeline smoothly. I took serious consideration to find   
   > a smooth solution to even {2} results, and for this I built an accumulator
   > attached to the 3-operand+1-result function units where the added operand is   
   > read once (if needed) and written once (if needed) often not requiring ANY   
   > RF activity in support of the CARRY variable itself.   
   >   
   >>   
   >> Stefan   
      
   I tentatively added such an instruction, MOVE {Ra,Rb,Rc},{Rx,Ry,Rz},
   using the micro-op translator. (Qupls4 has 48 bits to work with.) But it
   may be too slow; I have to see what shows up on the timing path. It is
   broken into zero to three micro-ops, so it is not any faster, but it is
   more code dense.
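
   As a rough illustration of the idea (not the actual Qupls4 translator),
   the expansion can be sketched in Python. Everything here is an
   assumption made for the example: the names MicroOp and
   expand_multi_move, the destination-list-first operand order (borrowed
   from the MOV {R1..R5},{Rm,...} notation quoted above), and the guess
   that a pair whose destination equals its source is simply dropped,
   which is one way "zero" micro-ops could come about.

```python
from typing import List, NamedTuple


class MicroOp(NamedTuple):
    """Hypothetical single-register move micro-op (one source, one result)."""
    op: str
    dst: int
    src: int


def expand_multi_move(dsts: List[int], srcs: List[int]) -> List[MicroOp]:
    """Split MOVE {dsts},{srcs} into zero to three single-register moves.

    Assumption: a pair with dst == src does nothing and is dropped.
    Assumption: the micro-ops execute in list order, so this matches a
    parallel move only when no destination is also a later source.
    """
    if len(dsts) != len(srcs):
        raise ValueError("destination and source lists must be the same length")
    if len(dsts) > 3:
        raise ValueError("at most three register pairs per instruction")
    return [MicroOp("MOV", d, s) for d, s in zip(dsts, srcs) if d != s]


if __name__ == "__main__":
    for uop in expand_multi_move([1, 2, 3], [10, 11, 12]):
        print(uop)
```

   Each micro-op produced this way has one source and one result, so it
   stays inside the operand and result counts the renamer already handles.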
      
   I am relying on the micro-ops to have a consistent format fed to the   
   renamer. The ISA instructions may differ slightly.   
      
   Having fun with the dispatcher. I had it as an out-of-order unit, when   
   it should really be part of the in-order pipeline to reduce the size.   
   Handling the dispatch OoO was easier as there may not be enough units   
   available to dispatch to. OoO dispatch did not need to worry about   
   stalling the pipeline. Switching it to in-order cut the size down to ¼
   of what it was, though.
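
   The behavioural difference between the two dispatch styles can be
   sketched with a toy Python model. This is not the Qupls4 dispatcher:
   the function names, the unit kinds ("alu", "fpu", "mem") and the
   free-unit counts are all made up for the example. The in-order version
   stops at the first micro-op that has no free unit (a stall), while the
   out-of-order version skips it and keeps looking.

```python
from typing import Dict, List


def dispatch_in_order(queue: List[str], free_units: Dict[str, int]) -> List[int]:
    """Return indices dispatched this cycle; stop at the first blocked micro-op."""
    free = dict(free_units)  # work on a copy so the caller's counts are untouched
    dispatched = []
    for index, kind in enumerate(queue):
        if free.get(kind, 0) == 0:
            break  # no unit of this kind: everything behind it waits
        free[kind] -= 1
        dispatched.append(index)
    return dispatched


def dispatch_out_of_order(queue: List[str], free_units: Dict[str, int]) -> List[int]:
    """Return indices dispatched this cycle; blocked micro-ops are skipped."""
    free = dict(free_units)
    dispatched = []
    for index, kind in enumerate(queue):
        if free.get(kind, 0) > 0:
            free[kind] -= 1
            dispatched.append(index)
    return dispatched


if __name__ == "__main__":
    queue = ["alu", "fpu", "alu", "mem"]
    free = {"alu": 2, "fpu": 0, "mem": 1}
    print(dispatch_in_order(queue, free))      # [0]
    print(dispatch_out_of_order(queue, free))  # [0, 2, 3]
```

   The in-order loop needs only a "stop here" decision per slot, whereas
   the out-of-order loop must be able to pick any later entry, which is
   consistent with the size difference noted above.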
      