... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"
comp.arch
Apparently more than just beeps & boops
Paul Clayton to MitchAlsup
Re: branch splitting
08 Feb 26 10:24:54
   From: paaronclayton@gmail.com   
      
   On 11/5/25 3:43 PM, MitchAlsup wrote:   
   [snip]   
   > I am now working on predictors for a 6-wide My 66000 machine--which is a bit   
   > different.   
   > a) VEC-LOOP loops do not alter the branch prediction tables.   
   > b) Predication clauses do not alter the BPTs.   
      
   Not recording the history of predicates may have a negative
   effect on global history predictors. (I do not know whether
   anyone has studied this, but it has been mentioned; e.g.,
   "[predication] has a negative side-effect because the removal
   of branches eliminates useful correlation information
   necessary for conventional branch predictors" from "Improving
   Branch Prediction and Predicated Execution in Out-of-Order
   Processors", Eduardo Quiñones et al., 2007.)
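
   To make the lost-correlation concern concrete, here is a toy
   simulation, a sketch under stated assumptions rather than a
   model of any real My 66000 predictor: a gshare-style predictor
   with 2-bit counters, a hypothetical workload in which a later
   branch always follows the value of an earlier predicate, and a
   flag controlling whether that predicate value is shifted into
   global history. All names, table sizes, and the workload are
   hypothetical.

```python
import random


class GlobalHistoryPredictor:
    """Toy gshare-style predictor: (PC XOR global history) indexes 2-bit counters.
    Sizes are arbitrary assumptions, not taken from any real design."""

    def __init__(self, history_bits: int = 8, table_bits: int = 10) -> None:
        self.history_mask = (1 << history_bits) - 1
        self.table_mask = (1 << table_bits) - 1
        self.history = 0
        # 2-bit saturating counters, initialized to weakly not-taken (1).
        self.counters = [1] * (1 << table_bits)

    def _index(self, pc: int) -> int:
        return (pc ^ self.history) & self.table_mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def push_history(self, outcome: bool) -> None:
        """Shift one outcome (branch direction or predicate value) into global history."""
        self.history = ((self.history << 1) | int(outcome)) & self.history_mask

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)  # index computed with the same history used to predict
        self.counters[i] = min(3, self.counters[i] + 1) if taken else max(0, self.counters[i] - 1)
        self.push_history(taken)


def correlated_branch_accuracy(record_predicate: bool, iterations: int = 10_000, seed: int = 1) -> float:
    """Hypothetical workload: a predicated clause with a random condition,
    followed by a real branch whose direction equals that condition."""
    rng = random.Random(seed)
    bp = GlobalHistoryPredictor()
    branch_pc = 0x40  # arbitrary PC for the correlated branch
    correct = 0
    for _ in range(iterations):
        predicate = rng.random() < 0.5
        if record_predicate:
            bp.push_history(predicate)  # predication clause updates global history
        taken = predicate  # later branch is fully correlated with the predicate
        correct += bp.predict(branch_pc) == taken
        bp.update(branch_pc, taken)
    return correct / iterations


if __name__ == "__main__":
    print("predicate in history:    ", correlated_branch_accuracy(True))
    print("predicate not in history:", correlated_branch_accuracy(False))
```

   With the predicate recorded, the correlated branch should be
   predicted almost perfectly after warm-up; without it, accuracy
   should hover near 50%, since the history then carries no
   information about the current predicate.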
      
   Predicate prediction can also be useful when the availability
   of the predicate is delayed. Similarly, selective eager
   execution might be worthwhile when the predicate is delayed;
   the selection itself is likely to be predictive (resource use
   might be a basis for selection, but even estimating that
   might require prediction).
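
   One hypothetical way to combine predicate prediction with
   selective eager execution is sketched below. The selection
   rule here is an assumption: a per-PC resetting confidence
   counter (a common confidence-estimation technique) decides
   between speculating on the predicted predicate and executing
   both sides. A resource-based criterion, as suggested above,
   could replace the confidence test. Names and thresholds are
   illustrative only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class Action(Enum):
    USE_PREDICATE = "predicate already available"
    PREDICT = "speculate on the predicted predicate value"
    EAGER = "issue both paths and select when the predicate arrives"


@dataclass
class PredicateEntry:
    direction: int = 1   # 2-bit counter; >= 2 means "predicate predicted true"
    confidence: int = 0  # resetting counter: +1 when correct, reset to 0 when wrong


class PredicatePredictor:
    """Hypothetical per-PC predicate predictor with confidence-gated eager execution."""

    def __init__(self, confidence_threshold: int = 3, max_confidence: int = 15) -> None:
        self.table: Dict[int, PredicateEntry] = {}
        self.threshold = confidence_threshold
        self.max_confidence = max_confidence

    def _entry(self, pc: int) -> PredicateEntry:
        return self.table.setdefault(pc, PredicateEntry())

    def decide(self, pc: int, predicate_ready: bool) -> Tuple[Action, Optional[bool]]:
        """Choose how to handle a predicated clause when it reaches rename."""
        if predicate_ready:
            return Action.USE_PREDICATE, None
        entry = self._entry(pc)
        if entry.confidence >= self.threshold:
            return Action.PREDICT, entry.direction >= 2
        return Action.EAGER, None  # low confidence: execute both sides

    def train(self, pc: int, actual: bool) -> None:
        """Update with the resolved predicate value."""
        entry = self._entry(pc)
        if (entry.direction >= 2) == actual:
            entry.confidence = min(self.max_confidence, entry.confidence + 1)
        else:
            entry.confidence = 0
        if actual:
            entry.direction = min(3, entry.direction + 1)
        else:
            entry.direction = max(0, entry.direction - 1)


if __name__ == "__main__":
    p = PredicatePredictor()
    print(p.decide(0x10, predicate_ready=False))  # untrained: eager execution
    for _ in range(4):
        p.train(0x10, True)
    print(p.decide(0x10, predicate_ready=False))  # confident: predict true
```

   A resetting counter makes one misprediction drop the clause
   back to eager execution immediately, which errs toward
   spending resources rather than risking a flush.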
      
   With predication of all short forward branches in order to   
   avoid fetch bubbles, the impact of delayed predicate   
   availability and missing information for branch prediction   
   may be greater than for more selective predication.   
      
   There may also be some short forward branches that are 99%
   taken, for which converting to a longer branch with a jump
   back may be a better option. With trace-cache-like
   optimization, such branched-over code could be removed from
   fetch even when the compiler used a short branch. Dynamic
   code organization has the advantage of being able to use
   dynamically available information (and the disadvantage of
   having to gather that information and make the decision at
   run time).
      
   Something like a branch target cache could store extracted   
   instructions. This might facilitate stitching such   
   instructions back into the instruction stream with limited   
   overhead. Since this would only work for usually-taken
   hammock branches, it would probably not be worthwhile. For
   if-then-else constructs, one might place both paths in   
   separate entries in such a target cache and always stitch   
   in one of them, but that seems wonky.   
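
   The stitching idea can be sketched as a toy, assuming
   instructions are modeled as hypothetical (pc, mnemonic) pairs
   and the target-cache-like structure is just a dict keyed by
   branch PC. `extract_hammock` removes the code a usually-taken
   short forward branch skips over; `fetch` stitches it back in
   only when the branch is predicted not taken. Fetch blocks,
   cache capacity, and PC bookkeeping are not modeled.

```python
from typing import Callable, Dict, List, Tuple

Instr = Tuple[int, str]  # (pc, mnemonic): hypothetical instruction representation


def extract_hammock(stream: List[Instr], branch_pc: int, target_pc: int,
                    cache: Dict[int, List[Instr]]) -> List[Instr]:
    """Move the instructions between a short forward branch and its target
    into the extracted-code cache, keyed by the branch PC."""
    skipped = lambda ins: branch_pc < ins[0] < target_pc
    cache[branch_pc] = [ins for ins in stream if skipped(ins)]
    return [ins for ins in stream if not skipped(ins)]


def fetch(compacted: List[Instr], cache: Dict[int, List[Instr]],
          predict_taken: Callable[[int], bool]) -> List[Instr]:
    """Walk the compacted stream; after an extracted branch that is predicted
    not taken, stitch the cached body back into the instruction stream."""
    out: List[Instr] = []
    for ins in compacted:
        out.append(ins)
        pc = ins[0]
        if pc in cache and not predict_taken(pc):
            out.extend(cache[pc])
    return out


if __name__ == "__main__":
    stream = [(0, "cmp"), (4, "bne 16"), (8, "add"), (12, "sub"), (16, "mul")]
    cache: Dict[int, List[Instr]] = {}
    compacted = extract_hammock(stream, branch_pc=4, target_pc=16, cache=cache)
    print(fetch(compacted, cache, lambda pc: True))   # common case: body never fetched
    print(fetch(compacted, cache, lambda pc: False))  # rare case: body stitched back in
```

   In the common (taken) case the skipped body never occupies
   fetch bandwidth; in the rare not-taken case the original
   sequence is reconstructed from the extra structure.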
      
   I rather doubt the benefits of such a mechanism would justify
   the added complexity (almost certainly not in a first or
   second implementation of an architecture), but I would not
   want to rule out possible future adoption of such techniques.
      
   My guess would be that most short forward branches would not
   use an extracted code cache, either because they are generally
   not taken, so there is no fetch advantage, or because the
   branch direction is unpredictable, such that predication
   likely makes more sense and fetching from two structures just
   adds complexity.
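
   The reasoning above amounts to a simple classification
   policy. The sketch below is one hypothetical way to express it
   from a branch's profiled outcomes; the thresholds (99% taken,
   10% mispredictions) and the use of a lone 2-bit counter as a
   predictability proxy are assumptions for illustration, not
   measured values.

```python
from typing import Sequence


def two_bit_mispredict_rate(outcomes: Sequence[bool]) -> float:
    """Misprediction rate of a single 2-bit saturating counter (starting weakly
    not-taken) over a branch's outcome history; a crude predictability proxy."""
    counter, misses = 1, 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses / len(outcomes)


def classify_short_forward_branch(
    outcomes: Sequence[bool],
    taken_threshold: float = 0.99,
    unpredictable_threshold: float = 0.10,
) -> str:
    """Hypothetical policy: unpredictable -> predicate; almost always taken ->
    extract the skipped code; otherwise (e.g., usually not taken) -> leave the
    code inline, since extracting it gives no fetch advantage."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    if two_bit_mispredict_rate(outcomes) > unpredictable_threshold:
        return "predicate"
    taken_rate = sum(outcomes) / len(outcomes)
    if taken_rate >= taken_threshold:
        return "extract"
    return "inline"


if __name__ == "__main__":
    print(classify_short_forward_branch([False] * 1000))                # rarely taken
    print(classify_short_forward_branch(([True] * 199 + [False]) * 5))  # 99.5% taken
    print(classify_short_forward_branch([True, False] * 500))           # unpredictable
```

   Checking predictability first reflects the argument that an
   unpredictable branch is better predicated no matter how often
   it is taken.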
      
   For highly unlikely code, an extracted cache might have   
   higher latency (from placement, from deferred access to use   
   better prediction, or from more complex retrieval). Stalling   
   renaming when a longer-latency insertion is predicted seems   
   undesirable (though it may have negligible performance harm),   
   but including just enough dataflow information in the more
   quickly accessed caches to support out-of-order fetch seems
   complicated.
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)