From: user5857@newsgrouper.org.invalid   
      
   Paul Clayton posted:   
      
   > On 2/5/26 4:27 PM, MitchAlsup wrote:   
   > >   
   > > MitchAlsup posted:   
   > >   
   > >> Paul Clayton posted:   
   >   
   > [snip]   
   > >>> LL-op-SC could be recognized as an idiom and avoid bringing data   
   > >>> to the core.   
   > >>   
   > >> Can recognize:   
   > >>   
   > >> LDL Rd,[address]   
   > >> ADD Rd,Rd,#whatever   
   > >> STC Rd,[address]   
   > >>   
   > >> Cannot recognize:   
   > >>   
   > >> LDA R1,[address]   
   > >> CALL LoadLocked   
   > >> ADD R2,R2,#whatever   
   > >> CALL StoreConditional   
   >   
   > When would one want to decouple LL and SC into function calls
   > away from the computation? Perhaps for in-place software
   > instrumentation?
      
   Write, in pure K&R C, the functionality for LoadLocked and   
   StoreConditional.   
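
   It cannot be done in pure K&R C -- the language has no notion of
   atomicity, which is rather the point. The nearest sketch leans on C11
   atomics and fakes StoreConditional with compare-and-swap (hence
   vulnerable to ABA, unlike real LL-SC); every name below is
   illustrative only:

```c
/* Sketch only: C11 atomics, NOT pure K&R C and NOT true LL-SC.
   StoreConditional is emulated with compare-and-swap, so it can
   succeed after an ABA change where a real SC would fail. */
#include <stdatomic.h>

static _Atomic long lock_cell;     /* the memory word being guarded   */
static long linked_value;          /* remembered by "LoadLocked"      */

long LoadLocked(_Atomic long *addr)
{
    linked_value = atomic_load(addr);
    return linked_value;
}

int StoreConditional(_Atomic long *addr, long new_value)
{
    /* "Succeeds" only if *addr still holds the value we loaded --
       approximated by CAS against the remembered value. */
    return atomic_compare_exchange_strong(addr, &linked_value, new_value);
}
```

   Note that the emulation also needs per-thread "linked" state; a single
   global, as here, only works single-threaded -- which again illustrates
   why the decoupled function-call form is hard for hardware (or C) to
   capture.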
      
   > >>>> Atomic-to-Memory HAS to be done outside of THIS-CPU or it is not   
   > >>>> Atomic-to-Memory. {{Thus it deserves its own instruction or prefix}}   
   > >>>   
   > >>> I wonder if there is an issue of communicating intention to the   
   > >>> computer. Using atomic-to-memory may be intended to communicate   
   > >>> that the operation is expected to be under contention or that   
   > >>> moderating the impact under high contention is more important   
   > >>> than having a fast "happy path".   
   > >>   
   > >> There is a speed of light problem here. Communicating across a   
   > >> computer is a microsecond time problem, whereas executing   
   > >> instructions is a nanosecond time problem.   
   > >>   
   > >> And this is exactly where Add-to-Memory gains over Interferable
   > >> ATOMIC events: you only pay the latency once. While that latency
   > >> is higher than what is possible with LL-SC, it is WAY LOWER than
   > >> the worst case with LL-SC under serious contention.
   >   
   > Yes. Even a single adder would have higher throughput than ping-
   > ponging a cache block. One might even support a three-or-more-
   > input, two-or-more-result adder (or perhaps exploit usually
   > smaller addends) to increase throughput, though I suspect there
   > would practically never be a case where a simple adder would
   > have insufficient throughput.
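
   The contrast is easy to sketch with C11 atomics (illustrative only,
   not My 66000 code): the memory-side add pays its latency exactly
   once, while an LL-SC-style loop can retry arbitrarily under
   contention:

```c
#include <stdatomic.h>

/* Add-to-Memory style: one fetch-add, paying the memory-side latency
   exactly once regardless of contention.  Returns the old value. */
long add_to_memory(_Atomic long *counter, long addend)
{
    return atomic_fetch_add(counter, addend);
}

/* LL-SC style, emulated with CAS: under serious contention this loop
   may spin many times, ping-ponging the cache line each iteration. */
long ll_sc_add(_Atomic long *counter, long addend)
{
    long observed = atomic_load(counter);
    while (!atomic_compare_exchange_weak(counter, &observed,
                                         observed + addend))
        ;   /* a failed CAS reloads observed; retry */
    return observed;
}
```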
   >   
   > >>> This seems to be similar to branch hints and predication in that   
   > >>> urging the computer to handle the task in a specific way may not   
   > >>> be optimal for the goal of the user/programmer.   
   >   
   > >> Explain   
   >   
   > Branch hints can be intended to reduce branch predictor aliasing   
   > (i.e., assume the static hint is used instead of a dynamic   
   > predictor), to provide agree information, to prefer one path   
   > even if it is less likely, to provide an initialization of the   
   > (per-address component only?) branch predictor, or for some   
   > other motive. The interface/architecture might not be specific   
   > about how such information will be used, especially if it is a   
   > hint (and programmers might disagree about what the best   
   > interface would be). If the interface is not very specific, a   
   > microarchitecture might violate a programmer's   
   > desire/expectation by ignoring the hint or using it in a   
   > different way.   
   >   
   > Similarly predication can be motivated to avoid fetch   
   > redirection (initial ARM and My 66000), to facilitate constant   
   > time execution, to avoid the performance cost of branch   
   > mispredictions, or perhaps for some reason that does not come to   
   > mind. Predicate prediction would foil constant time execution   
   > and might reduce performance (or merely introduce weird   
   > performance variation). Even the fetch optimization might be   
   > undone if the hardware discovers that the condition is extremely   
   > biased and folds out the rarely used instructions, which would
   > be good for performance if the bias continues, but if the bias   
   > changes just frequently enough it could hurt performance.   
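
   A concrete instance of such an underspecified hint interface is
   GCC/Clang's __builtin_expect, which promises nothing about how the
   hint is used -- block layout, seeding a static predictor, or being
   ignored outright are all conforming:

```c
/* GCC/Clang static branch hint.  The interface only conveys expected
   likelihood; what the compiler or microarchitecture does with it
   (layout, static prediction, nothing) is unspecified. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int checked_div(int num, int den, int *out)
{
    if (unlikely(den == 0))   /* hint: the error path is rare */
        return -1;
    *out = num / den;
    return 0;
}
```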
   >   
   > [snip]   
   > >> There is no reason not to predict My 66000-style predication,   
   > >> nor is there any great desire/need TO predict them, either.   
   >   
   > Except that prediction could violate the time constancy assumed   
   > by the programmer.   
      
   Time constancy is provided by executing both the then clause and the
   else clause and using CMOV to decide on flow.
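
   In software the same trick is branchless selection: evaluate both
   clauses, then pick the result with a mask (a sketch; ct_select is my
   name, not an established API):

```c
#include <stdint.h>

/* Branchless select: both "clauses" are evaluated and a mask picks
   the result -- the software analogue of predicating both paths and
   resolving with CMOV.  Takes the same time whichever side wins. */
uint32_t ct_select(uint32_t cond, uint32_t then_val, uint32_t else_val)
{
    uint32_t mask = (uint32_t)0 - (cond != 0);  /* all-ones if cond, else 0 */
    return (then_val & mask) | (else_val & ~mask);
}
```

   Predicting the condition would reintroduce data-dependent timing,
   which is exactly the violation being discussed.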
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   