From: user5857@newsgrouper.org.invalid   
      
   EricP posted:   
      
   > BGB wrote:   
   > > On 9/3/2025 9:42 PM, EricP wrote:   
   > >> MitchAlsup wrote:   
   > >>>   
   > >>> However, I also found that STs need an immediate and a displacement, so,   
   > >>> Major == 0b'001001 and minor == 0b'011xxx has 4 ST instructions with   
   > >>> potential displacement (from D12ds above) and the immediate has the   
   > >>> size of the ST. This provides for::   
   > >>> std #4607182418800017408,[r3,r2<<3,96]   
   > >>   
   > >> Compare and Branch can also use two immediates as it   
   > >> has reg-reg or reg-imm compares plus displacement.   
   > >> And has high enough frequency to be worth considering.   
   > >>   
   > >   
   > > Can be done, yes.   
   > > High enough frequency/etc, is where the possible debate lies.   
   > >   
   > >   
   > > Checking stats, it can affect roughly 1.9% of the instructions.   
   > > Or, around 11% of branches; most of the rest being unconditional or   
   > > comparing against 0 (which can use the Zero Register). Only a relative   
   > > minority being compares against non-zero constants.   
   >   
   > The only instruction usage stats I have are from those VAX papers:   
   > A Case Study of VAX-11 Instruction Set Usage For Compiler Execution, 1982   
   >   
   > That shows about 12% of instructions are conditional branch and 9% CMP.   
   > That says to me that almost all Bcc are paired with a CMP,   
   > and very few use the flags set as a side effect of ALU ops.   
      
   About 25% = (12%-9%)/12% use ALU CCs.   
      
   > I would expect those two numbers to be closer as even today compilers don't   
   > know about those side effect flags and will always emit a CMP or TST first.   
   > Possibly those VAX Bcc that used ALU side-effect flags were written in assembler.   
      
   VAX had "more regular" settings of ALU CCs than typical CISCs.   
   This regularity made it easier for the compiler to track.   
      
   On the other hand:: a gain of 25%*12% = 3% would not have allowed CCs   
   to "make the cut" for RISC ISA designs.   
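
The arithmetic behind those two percentages can be checked with a short sketch. It assumes, as the estimate above does, that every CMP feeds exactly one conditional branch; the function name `alu_cc_share` is made up for illustration, and the inputs are the 12% and 9% figures quoted from the VAX-11 study.

```python
def alu_cc_share(cond_branch_frac, cmp_frac):
    """Estimate how many conditional branches rely on ALU side-effect CCs.

    Assumption: each CMP is paired with exactly one conditional branch, so
    the branches left over must be using flags set by some other ALU op.
    Returns (share of conditional branches, share of all instructions).
    """
    leftover = cond_branch_frac - cmp_frac        # branches with no CMP in front
    share_of_branches = leftover / cond_branch_frac
    share_of_all = share_of_branches * cond_branch_frac
    return share_of_branches, share_of_all


of_branches, of_all = alu_cc_share(0.12, 0.09)
print(f"share of Bcc using ALU CCs: {of_branches:.0%}")
print(f"share of all instructions:  {of_all:.0%}")
```

Under that pairing assumption the product is simply the difference 12% - 9%, i.e. about 3% of all instructions.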
      
   > > One could argue:   
   > > This is high enough to care,   
      
   Borderline.   
      
   > > but is it cheap enough?...   
      
   Not for me, as it causes RoB/RETIRE to do a lot more work.   
   It does require Forwarding to do more work;   
   it may also cause DECODE to do more work.   
      
   > The instruction fetch buffer has to be larger as the worst case size   
   > just got larger. And there are more format variations so the Parser   
   > gets more complex. And Decode has to pick apart the two immediates   
   > and place them in different fields so more muxes.   
   >   
   > Each front end uOp lane would have two immediate fields, one for an   
   > integer or float data value up to 8 bytes, one for up to 8 byte offset.   
   > Then at Dispatch (hand-off to the back end) muxes to route each   
   > immediate onto the FU operand bus.   
   >   
   > The difference comes in the back end Reservation Stations.   
   > If they are valued RS then the immediates are held just like   
   > any other operand values that were ready at time of Dispatch.   
      
   In my designs, the value-capturing RS does not need the ST.data value   
   until after it has write permission, so instead of capturing this in   
   the RS early, I put a reservation on the location in IB, and move the   
   immediate to RS after AGEN (and before RETIRE). This prevents excess   
   RS operand capture flip-flops.   
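
A toy software model of that ordering, purely as a sketch: the class and method names (`IBEntry`, `StoreRS`, `StoreModel`, `capture_data`) are invented here and do not come from any actual design. It only illustrates the sequence described above: the store's RS entry holds an index into the IB, AGEN produces the address, and only afterwards is the immediate copied into the RS and the IB reservation dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IBEntry:
    immediate: int
    reserved: bool = False            # True while a store still needs this value


@dataclass
class StoreRS:
    ib_index: int                     # where the store-data immediate still lives
    address: Optional[int] = None     # filled in by AGEN
    data: Optional[int] = None        # filled in only after AGEN


class StoreModel:
    def __init__(self):
        self.ib = {}                  # hypothetical instruction buffer, keyed by slot

    def dispatch_store(self, ib_index, immediate):
        # Reserve the IB slot instead of copying the immediate into the RS.
        self.ib[ib_index] = IBEntry(immediate, reserved=True)
        return StoreRS(ib_index)

    def agen(self, rs, base, index, scale, disp):
        # Address form [base, index<<scale, disp], as in the std example quoted above.
        rs.address = base + (index << scale) + disp

    def capture_data(self, rs):
        if rs.address is None:
            raise RuntimeError("store data is pulled only after AGEN")
        entry = self.ib[rs.ib_index]
        rs.data = entry.immediate
        entry.reserved = False        # the IB slot may now be recycled


model = StoreModel()
rs = model.dispatch_store(ib_index=5, immediate=4607182418800017408)
model.agen(rs, base=0x1000, index=2, scale=3, disp=96)
model.capture_data(rs)
print(hex(rs.address), hex(rs.data))
```

The point of the sketch is only that `StoreRS` carries a small index rather than a 64-bit data field between dispatch and AGEN; write permission and RETIRE are not modelled.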
      
   > The number of operands doesn't change so no extra cost here.   
      
   If you capture ST.data early it does.   
      
   > But if they are valueless RS then it has no place to hold those   
   > immediates so it needs some place to stash them until the uOp launches.   
      
   Here, the IB is the obvious place to store them until use.   
      
   > In that case it might be better if Decode took all the immediates and   
   > stash them in a circular buffer and just passed the indexes in the uOp.   
   > Then at launch the FU would pull in the immediates   
   > just like it pulls in the register operand values.   
   > This gets rid of the extra front end costs.   
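
The circular-buffer idea in the quoted text can be sketched in software as follows. This is a minimal illustration, not a description of anyone's hardware: `ImmediateRing` and its methods are hypothetical names, and releasing slots strictly oldest-first (for example at retire) is an assumption the quoted text does not state.

```python
class ImmediateRing:
    """Decode pushes immediates here and passes only the slot index in the uOp."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0                 # next slot to allocate
        self.tail = 0                 # oldest live slot
        self.count = 0

    def push(self, value):
        if self.count == len(self.slots):
            raise BufferError("ring full; Decode would have to stall")
        idx = self.head
        self.slots[idx] = value
        self.head = (self.head + 1) % len(self.slots)
        self.count += 1
        return idx                    # this index travels with the uOp

    def read(self, idx):
        # At launch the FU pulls the immediate much like a register operand.
        return self.slots[idx]

    def release_oldest(self):
        # Assumption: slots are freed in allocation order.
        if self.count == 0:
            raise BufferError("ring empty")
        self.slots[self.tail] = None
        self.tail = (self.tail + 1) % len(self.slots)
        self.count -= 1


ring = ImmediateRing(4)
data_idx = ring.push(4607182418800017408)   # store data immediate
disp_idx = ring.push(96)                    # displacement
print(data_idx, disp_idx, ring.read(disp_idx))
```

With this arrangement each uOp lane carries two small indexes rather than two 8-byte fields; the cost moves to the ring's read ports at launch.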
      