From: user5857@newsgrouper.org.invalid   
      
   BGB posted:   
      
   > On 11/4/2025 3:44 PM, Terje Mathisen wrote:   
   > > MitchAlsup wrote:   
   ---------------   
   > >   
   > > As you said: "Never bet against branch prediction".   
   > >   
   >   
   > Branch prediction is fun.   
   >   
   >   
   > When I looked around online before, a lot of stuff about branch   
   > prediction was talking about fairly large and convoluted schemes for the   
   > branch predictors.   
   >   
   > But, then always at the end of it using 2-bit saturating counters:   
   > weakly taken, weakly not-taken, strongly taken, strongly not taken.   
   >   
   > But, in my fiddling, there was seemingly a simple but moderately   
   > effective strategy:   
   > Keep a local history of taken/not-taken;   
   > XOR this with the low-order-bits of PC for the table index;   
   > Use a 5/6-bit finite-state-machine or similar.   
   > Can model repeating patterns up to ~ 4 bits.   
   >   
   > Where, the idea was that the state-machine in updated with the current   
   > state and branch direction, giving the next state and next predicted   
   > branch direction (for this state).   
   >   
   >   
   > Could model slightly more complex patterns than the 2-bit saturating   
   > counters, but it is sort of a partial mystery why (for mainstream   
   > processors) more complex lookup schemes and 2 bit state, was preferable   
   > to a simpler lookup scheme and 5-bit state.   
      
   In 1991 Mike Shebanow, Tse-Yu Yeh, and I tried out a Correlation predictor   
   where strings of {T, !T}** were pattern matched to create a prediction.   
   While it was somewhat competitive with Global History Table, it ultimately   
   failed.   
      
   I am now working on predictors for a 6-wide My 66000 machine--which is a bit   
   different.   
   a) VEC-LOOP loops do not alter the branch prediction tables.   
   b) Predication clauses do not alter the BPTs.   
   c) Jump Through Table is not predicted through jump indirect table-like   
    prediction, what is predicted is the value (switch variable) and this   
    is used to index the table (early)   
   d) CMOV gets rid of another 8%   
      
   These strip out about 40% of branches from needing prediction, causing   
   the remaining branches to be harder to predict but having less total   
   latency in execution.   
      
   -----------------   
   > Not proven, but I suspect that an arbitrary 5 bit pattern within a 6 bit   
   > state might be impossible. Although there would be sufficient   
   > state-space for the looping 5-bit patterns, there may not be sufficient   
   > state-space to distinguish whether to move from a mismatched 4-bit   
   > pattern to a 3 or 5 bit pattern. Whereas, at least with 4-bit, any   
   > mismatch of the 4-bit pattern can always decay to a 3-bit pattern, etc.   
   > One needs to be able to express decay both to shorter patterns and to   
   > longer patterns, and I suspect at this point, the pattern breaks down   
   > (but can't easily confirm; it is either this or the pattern extends   
   > indefinitely, I don't know...).   
      
   Tried some of these (1991) mostly with little to no success.   
   Be my guest and try again.   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
|