|    comp.arch    |    Apparently more than just beeps & boops    |    131,241 messages    |
|    Message 129,588 of 131,241    |
|    BGB to Stephen Fuld    |
|    Re: Concedtina III May Be Returning    |
|    05 Sep 25 02:16:17    |
   
   From: cr88192@gmail.com   
      
   On 9/4/2025 11:41 PM, Stephen Fuld wrote:   
   > On 9/4/2025 7:53 PM, BGB wrote:   
   >> On 9/4/2025 3:20 PM, Stephen Fuld wrote:   
   >>> On 9/4/2025 12:06 PM, EricP wrote:   
   >>>> Stephen Fuld wrote:   
   >>>>> On 9/4/2025 10:19 AM, EricP wrote:   
   >>>>>> BGB wrote:   
   >>>>>>> On 9/3/2025 9:42 PM, EricP wrote:   
   >>>>>>>> MitchAlsup wrote:   
   >>>>>>>>>   
   >>>>>>>>> However, I also found that STs need an immediate and a   
   >>>>>>>>> displacement, so,   
   >>>>>>>>> Major == 0b'001001 and minor == 0b'011xxx has 4 ST instructions   
   >>>>>>>>> with   
   >>>>>>>>> potential displacement (from D12ds above) and the immediate has   
   >>>>>>>>> the size of the ST. This provides for::   
   >>>>>>>>> std #4607182418800017408,[r3,r2<<3,96]   
   >>>>>>>>   
   >>>>>>>> Compare and Branch can also use two immediates as it   
   >>>>>>>> has reg-reg or reg-imm compares plus displacement.   
   >>>>>>>> And has high enough frequency to be worth considering.   
   >>>>>>>>   
   >>>>>>>   
   >>>>>>> Can be done, yes.   
   >>>>>>> High enough frequency/etc, is where the possible debate lies.   
   >>>>>>>   
   >>>>>>>   
   >>>>>>> Checking stats, it can affect roughly 1.9% of the instructions.   
   >>>>>>> Or, around 11% of branches; most of the rest being unconditional   
   >>>>>>> or comparing against 0 (which can use the Zero Register). Only a   
   >>>>>>> relative minority being compares against non-zero constants.   
   >>>>>>   
   >>>>>> The only instruction usage stats I have are from those VAX papers:   
   >>>>>> A Case Study of VAX-11 Instruction Set Usage For Compiler   
   >>>>>> Execution, 1982   
   >>>>>>   
   >>>>>> That shows about 12% instructions are conditional branch and 9% CMP.   
   >>>>>> That says to me that almost all Bcc are paired with a CMP,   
   >>>>>> and very few use the flags set as a side effect of ALU ops.   
   >>>>>   
   >>>>> OK, but does this tell you how many of the CMPs are to a value of   
   >>>>> zero? I expect these to be a significant enough percentage to skew   
   >>>>> your analysis.   
   >>>>   
   >>>> Looking at   
   >>>> Measurement and Analysis of Instruction Use in VAX 780, 1982   
   >>>>   
   >>>> VAX had a TST instruction which was the same as CMP src,#0.   
   >>>> TST has < 2% usage while CMP 10-12%.   
   >>>   
   >>> Thanks. That's interesting. So perhaps ~15% of all compares are to   
   >>> zero. I would have expected higher.   
   >>>   
   >>   
   >> Looking at some stats generated by my compiler (for branches):   
   >> 61% of branches are unconditional   
   >> 15% are comparing to 0   
   >> 13% are comparing two registers   
   >> 11% are comparing to some other non-zero constant.   
   >>   
   >   
   > So ~39% of branches are conditional, and 15% compare to zero. So   
   > (15/39) ~38% of conditional branches are comparing to zero. That is   
   > more in line with what I had expected.   
   >   
      
   Yes.   
      
   Basically, it also means:   
   72% of conditional branches ((15+13)/39) are already addressed by the
   existing "Bcc 2R, Disp" instructions (in RISC-V or similar), since a
   compare against 0 can use the zero register.

   28% of conditional branches (11/39) could "maybe" use a combined Bcc-Imm
   instruction (say, if added to RV or similar).
      
   But then, the issue is how to best encode it:
     64-bit encoding:
       Kinda meh;
       Doesn't help anything with code density;
       Only maybe helps with performance (if branch-predictor supports it);
       Delta is small.
     32-bit encoding options:
       Burn one of the User blocks;
         Huawei went this way.
         But, too much encoding space.
       Shrink the displacement from 12 to 9 bits.
         Would have 3 bits for compare operator.
         This would have been my choice.
         Does mean a new reloc type and decode/branch-predictor annoyance.
         Most short loops/etc smaller than 512 bytes.
       Do a 32-bit op with a 3 bit register field.
         Very poor option IMHO.
       Do a 32-bit op with a 3 bit immediate field.
         Wasn't considered at the time, but wouldn't be terribly useful.
       Only do BEQ and BNE.
         Kinda meh (doesn't cover much, but at least usable).
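
   To put numbers on the 12-bit vs 9-bit displacement trade-off, a rough
   sketch in C. It assumes the field is signed and counts 2-byte units (as
   the RISC-V B-type immediate does); branch_disp_fits() is a hypothetical
   helper, not anything from an existing toolchain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a byte offset fits a signed displacement field of 'bits' bits,
   assuming the field counts 2-byte units (assumption, RISC-V B-type style). */
bool branch_disp_fits(int32_t byte_offset, int bits)
{
    assert(bits >= 2 && bits <= 16);
    if (byte_offset & 1)
        return false;                     /* targets are 2-byte aligned */
    int32_t units = byte_offset / 2;
    int32_t lo = -(1 << (bits - 1));      /* most negative encodable value */
    int32_t hi = (1 << (bits - 1)) - 1;   /* most positive encodable value */
    return units >= lo && units <= hi;
}
```

   Under those assumptions a 9-bit field reaches -512..+510 bytes and a
   12-bit field reaches -4096..+4094 bytes, so the 9-bit form only works
   for targets within roughly 512 bytes of the branch.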
      
   They went with "BEQI/BNEI Rs1, Imm5, Label", which is, kinda meh.   
    Personally I would have assumed using these spots for BTST/BNTST.   
    But, the TST operator has no precedent in RISC-V.   
      
   But, as I see it, the rough ranking of comparison operators seems to be:   
    BEQ/BNE   
    BLT/BGE   
    BTST/BNTST (N/A in RISC-V, but exists in XG2 and XG3)   
    BLTU/BGEU   
      
   Where, likely, a BTST/BNTST would have had a slightly higher average hit   
   rate than a BEQI/BNEI. But, when I looked at it before, it seemed like   
   it was pretty close either way, and of these, adding ((A&B)==0) is more   
   likely to have had a higher logic cost.   
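
   For reference, the comparison operators in that ranking can be written
   out as plain C conditions. This is only a sketch: the enum and
   branch_taken() are invented names, and the polarity of BTST (taken when
   (A&B)==0, with BNTST the inverse) is an assumption based on the
   ((A&B)==0) expression above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    BR_EQ, BR_NE, BR_LT, BR_GE, BR_LTU, BR_GEU, BR_TST, BR_NTST
} br_op;

/* Evaluate whether a compare-and-branch with operands a and b is taken. */
bool branch_taken(br_op op, uint64_t a, uint64_t b)
{
    switch (op) {
    case BR_EQ:   return a == b;
    case BR_NE:   return a != b;
    case BR_LT:   return (int64_t)a <  (int64_t)b;   /* signed compare */
    case BR_GE:   return (int64_t)a >= (int64_t)b;
    case BR_LTU:  return a <  b;                     /* unsigned compare */
    case BR_GEU:  return a >= b;
    case BR_TST:  return (a & b) == 0;               /* assumed polarity */
    case BR_NTST: return (a & b) != 0;
    }
    assert(!"unknown branch operator");
    return false;
}
```

   EQ/NE need only an equality check and LT/GE/LTU/GEU a magnitude compare,
   while the TST pair needs an AND of both operands followed by a zero check.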
      
      
   Though, ironically, BTST and BNTST are the more likely ops to use an   
   immediate...   
    Mostly as:   
    "if(x&0x10) { ... }"   
    Being a fairly common idiom.   
      
   Though, for larger masks:   
    if((x>>47)&1) { ... }   
   Typically being slightly preferable to:   
    if(x&0x0000800000000000ULL) { ... }   
   Mostly as on-average the shift is cheaper than the large constant   
   (particularly on RISC-V), though little says a compiler can't turn the   
   latter into the former.   
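
   The two forms test the same bit, which is why a compiler is free to
   swap one for the other. A small C sketch (the function names are just
   for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Test bit 47 with a wide mask constant. */
bool bit47_by_mask(uint64_t x)  { return (x & 0x0000800000000000ULL) != 0; }

/* Test bit 47 by shifting it down to bit 0 first. */
bool bit47_by_shift(uint64_t x) { return ((x >> 47) & 1) != 0; }

/* Sanity check that both forms agree for a given value. */
void check_forms_agree(uint64_t x)
{
    assert(bit47_by_mask(x) == bit47_by_shift(x));
}
```

   The mask form needs a 48-bit constant materialized in a register, while
   the shift form only needs a small shift amount and an AND with 1.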
      
      
   Though, I am left to suspect that BEQI/BNEI might not have been the best   
   choice, as while BEQ/BNE are the two most common cases, they are also   
   the most likely to compare against X0. If one eliminates BEQ and BNE   
   where one of the operands is 0, then BLT/BGE would move into first place   
   in the rankings.   
      
      
   I had just gone with a 64-bit encoding and kinda went "meh", as there was
   seemingly no good way to make it particularly compelling.

   So, I had usually preferred to focus more on features which have a more
   obvious benefit.
      
      
      