From: cr88192@gmail.com   
      
   On 9/5/2025 2:09 PM, MitchAlsup wrote:   
   >   
   > EricP posted:   
   >   
   >> BGB wrote:   
   >>> On 9/3/2025 9:42 PM, EricP wrote:   
   >>>> MitchAlsup wrote:   
   >>>>>   
   >>>>> However, I also found that STs need an immediate and a displacement, so,   
   >>>>> Major == 0b'001001 and minor == 0b'011xxx has 4 ST instructions with   
   >>>>> potential displacement (from D12ds above) and the immediate has the   
   >>>>> size of the ST. This provides for::   
   >>>>> std #4607182418800017408,[r3,r2<<3,96]   
   >>>>   
   >>>> Compare and Branch can also use two immediates as it   
   >>>> has reg-reg or reg-imm compares plus displacement.   
   >>>> And has high enough frequency to be worth considering.   
   >>>>   
   >>>   
   >>> Can be done, yes.   
   >>> High enough frequency/etc, is where the possible debate lies.   
   >>>   
   >>>   
   >>> Checking stats, it can affect roughly 1.9% of the instructions.   
   >>> Or, around 11% of branches; most of the rest being unconditional or   
   >>> comparing against 0 (which can use the Zero Register). Only a relative   
   >>> minority being compares against non-zero constants.   
   >>   
   >> The only instruction usage stats I have are from those VAX papers:   
   >> A Case Study of VAX-11 Instruction Set Usage For Compiler Execution, 1982   
   >>   
   >> That shows about 12% instructions are conditional branch and 9% CMP.   
   >> That says to me that almost all Bcc are paired with a CMP,   
   >> and very few use the flags set as a side effect of ALU ops.   
   >   
   > About 25% = (12%-9%)/12% use ALU CCs.   
   >   
   >> I would expect those two numbers to be closer as even today compilers don't   
   >> know about those side effect flags and will always emit a CMP or TST first.   
   >> Possibly those VAX that Bcc using ALU side effect flags were assembler.   
   >   
   > VAX had "more regular" settings of ALU CCs than typical CISCs.   
   > This regularity made it easier for the compiler to track.   
   >   
   > On the other hand:: a gain of 25%*12% = 4% would not have allowed CCs   
   > to "make the cut" for RISC ISA designs.   
   >   
      
   Several major RISCs still had them though:   
    ARM   
    POWER / PowerPC   
    ...   
      
      
   Generalized ALU CC's kinda suck though.   
      
   In their "best case" they are kinda neutral.   
   But if you want to write an emulator, or make the CPU superscalar, CC's   
   will kinda ruin the day, since nearly every ALU op becomes an implicit   
   writer of the same shared flags state.   
      
   The 1-bit T/F status flag was at least "less bad" than full CC's:   
    It is only modified by certain instructions, for which modifying the   
    flag is usually their primary purpose;   
    Since it is modified infrequently, and only used as an input to certain   
    classes of instructions (such as those marked as conditional), it is   
    less of an issue to manage in the pipeline.   
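   As a rough illustration of the emulator-side cost (a sketch only, not
   taken from any real emulator): with full CC's every ADD also has to
   produce N/Z/C/V, whereas with a T bit the plain ADD touches no flag
   state and only a compare-type op writes T. The 32-bit width and the
   function names here are assumptions made up for the example.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def add_with_ccs(a, b):
    """ADD on a CC machine: the result plus N/Z/C/V on every single op."""
    a &= MASK32
    b &= MASK32
    full = a + b
    res = full & MASK32
    flags = {
        "N": res >> 31,                            # sign bit of the result
        "Z": int(res == 0),                        # result is zero
        "C": full >> 32,                           # carry out of bit 31
        "V": ((~(a ^ b) & (a ^ res)) >> 31) & 1,   # signed overflow
    }
    return res, flags


def add_plain(a, b):
    """ADD on a T-bit machine: no flag state is read or written."""
    return (a + b) & MASK32


def cmp_gt(a, b):
    """Hypothetical compare op whose only job is to produce T (signed a > b)."""
    return int(to_signed32(a) > to_signed32(b))
```

   A real emulator would likely compute the CC's lazily, but that is extra
   machinery that the T-bit case does not need in the first place.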
      
      
   >>> One could argue:   
   >>> This is high enough to care,   
   >   
   > borderline   
   >   
   >>> but is it cheap enough?...   
   >   
   > not for me as it causes RoB/RETIRE to do a lot more work.   
   > It does require Forwarding to do more work;   
   > It may also cause DECODE to do more work.   
   >   
      
   I was originally writing this in the context of Branch-with-Immediate   
   instructions, which AFAIK/IIRC My66000 already has in some form?...   
      
      
   Some RISC-V people had already been considering this, and some RISC-V   
   SoC's apparently already have a variant of this (as custom extensions).   
      
      
   I am not entirely sure where the stuff related to CC's came from, but alas.   
      
   I was not arguing for having CC's in any case.   
      
      
      
   But, yeah, as noted, the main added costs for Branch-with-Immediate and   
   Store-with-Immediate are:   
   Decoder needs to produce a second immediate output;   
   It needs to be routed to Lane 3 or similar;   
   We need another pseudo register, and the logic to fetch an immediate   
   from Lane 3.   
      
   Where, as noted:   
   In the BJX2 Core, the handling of immediate values is mostly done by   
   using pseudo registers.   
      
   A few examples of pseudo registers:   
    ZZR : Zero   
    IMM : Returns immediate associated with current lane.   
    JIMM: Returns immediate from gluing the Lane 1/2 immediates together   
    PC : Returns PC of following instruction   
    BPC : Returns PC of current instruction   
    IMMB: Returns immediate from Lane 3   
    ...   
      
   This means I don't need separate logic internally for Register and   
   Immediate forms of instructions, as all instructions can be implemented   
   as register forms. A partial exception is a few cases like the FPU,   
   which assumes that the Immediate field is used to route the current   
   rounding mode and similar.   
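   The pseudo-register idea can be sketched roughly as below. This is a
   software model, not the actual BJX2 logic: the register numbers, the
   32-bit per-lane immediate width, the JIMM gluing order (Lane 2 high,
   Lane 1 low), and the BPC-relative branch target are all assumptions
   made up for the example. Only the pseudo-register names come from the
   list above.

```python
from dataclasses import dataclass

# Hypothetical register IDs; the real encodings are not given in this post.
ZZR, IMM, JIMM, PC, BPC, IMMB = 64, 65, 66, 67, 68, 69


@dataclass
class FetchContext:
    gprs: list          # general-purpose register file
    lane_imm: list      # decoded immediates, index 0..2 = Lane 1..3
    bpc: int            # address of the current instruction
    next_pc: int        # address of the following instruction
    imm_bits: int = 32  # assumed per-lane immediate width (used by JIMM)


def read_operand(ctx, lane, reg):
    """Read one register port; pseudo registers are resolved here."""
    if reg == ZZR:
        return 0
    if reg == IMM:
        return ctx.lane_imm[lane]
    if reg == JIMM:
        # Assumed gluing order: Lane 2 gives the high bits, Lane 1 the low bits.
        mask = (1 << ctx.imm_bits) - 1
        return (ctx.lane_imm[1] << ctx.imm_bits) | (ctx.lane_imm[0] & mask)
    if reg == PC:
        return ctx.next_pc
    if reg == BPC:
        return ctx.bpc
    if reg == IMMB:
        return ctx.lane_imm[2]
    return ctx.gprs[reg]


def op_add(ctx, lane, rs, rt):
    """One register-form ADD covers both reg-reg and reg-imm."""
    return read_operand(ctx, lane, rs) + read_operand(ctx, lane, rt)


def op_beq(ctx, lane, rs, rt, disp):
    """Register-form compare-and-branch; returns the next PC."""
    taken = read_operand(ctx, lane, rs) == read_operand(ctx, lane, rt)
    if taken:
        return ctx.bpc + read_operand(ctx, lane, disp)  # assumed BPC-relative
    return ctx.next_pc
```

   With this, op_add(ctx, 0, 3, IMM) is the immediate form of ADD, and
   op_beq(ctx, 0, 3, IMMB, IMM) would be a Branch-with-Immediate: the
   compare value comes in via Lane 3 and the displacement via the lane's
   own immediate, with no separate immediate-form logic in either op.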
      
      
      
      