comp.arch
Message 131,024 of 131,241
MitchAlsup to All
Re: Variable-length instructions
10 Feb 26 19:02:09
   From: user5857@newsgrouper.org.invalid   
      
   Paul Clayton  posted:   
      
   > On 2/9/26 2:28 PM, MitchAlsup wrote:   
   > >   
   > > Paul Clayton  posted:   
   > >   
   > >> On 2/5/26 4:27 PM, MitchAlsup wrote:   
   > >>>   
   > >>> MitchAlsup  posted:   
   > >>>   
   > >>>> Paul Clayton  posted:   
   > >>   
   > >> [snip]   
   > >>>>> LL-op-SC could be recognized as an idiom and avoid bringing data   
   > >>>>> to the core.   
   > >>>>   
   > >>>> Can recognize:   
   > >>>>   
   > >>>>          LDL   Rd,[address]   
   > >>>>          ADD   Rd,Rd,#whatever   
   > >>>>          STC   Rd,[address]   
   > >>>>   
   > >>>> Cannot recognize:   
   > >>>>   
   > >>>>          LDA   R1,[address]   
   > >>>>          CALL  LoadLocked   
   > >>>>          ADD   R2,R2,#whatever   
   > >>>>          CALL  StoreConditional   
   > >>   
   > >> When would one want to decouple LL and SC into function calls   
   > >> away from the computation? Perhaps for in-place software   
   > >> instrumentation?   
   > >   
   > > Write, in pure K&R C, the functionality for LoadLocked and   
   > > StoreConditional.   
   >   
   > Then why would the compiler not inline such? It seems reasonable   
   > (to me) to blame poor scaling performance in that case on the   
   > compiler which did not inline a one instruction function (or on   
   > the developer who intentionally disabled such optimization).   
      
   Because in pure K&R C there is no concept of atomic things, and   
   thus one has to resort to ASM--beyond this, there is no concept   
   of inlining.   
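
For contrast with the K&R situation described above, here is a minimal sketch of how the same read-modify-write can be written once the language does know about atomics. It assumes C11 `<stdatomic.h>`, which K&R C does not have. The function name `fetch_add_retry` is made up for illustration. Whether a compiler lowers the loop to an LL/op/SC sequence depends on the target and is not guaranteed.

```c
#include <stdatomic.h>

/* Atomically add `delta` to *p and return the previous value.
 * The retry loop has the same shape as an LL / op / SC sequence:
 * read the old value, compute the new one, try to publish it. */
static long fetch_add_retry(_Atomic long *p, long delta)
{
    long old = atomic_load_explicit(p, memory_order_relaxed);

    /* On failure the call refreshes `old` with the current value of *p,
     * so the next attempt recomputes old + delta from fresh data. */
    while (!atomic_compare_exchange_weak_explicit(
               p, &old, old + delta,
               memory_order_seq_cst, memory_order_relaxed)) {
        /* spurious or real failure: retry */
    }
    return old;
}
```

Because the operation is visible to the compiler as one construct, there is no opaque call boundary between the load, the add, and the conditional store.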
   >   
   > [snip]   
   >   
   > >> [snip]   
   > >>>> There is no reason not to predict My 66000-style predication,   
   > >>>> nor is there any great desire/need TO predict them, either.   
   > >>   
   > >> Except that prediction could violate the time constancy assumed   
   > >> by the programmer.   
   > >   
   > > Time constancy is provided by executing both the then-clause and the   
   > > else-clause and using CMOV to select the result.   
   >   
   > This means ordinary My 66000 predication cannot be used for   
   > such. I have no idea whether writers of cryptographic software   
   > would be upset that a mechanism which seems like it would   
   > provide constant time is documented not to do so.   
      
   I neither stated such nor implied such.   
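
The "execute both clauses, then select" pattern from the quoted text can be sketched at the source level as follows. This is only an illustration: `ct_select` and `step` are hypothetical names, the arithmetic in the two clauses is arbitrary, and the C standard makes no timing promises, so a compiler remains free to turn this back into a branch. Constant-time code has to be checked at the instruction level.

```c
#include <stdint.h>

/* Return a if take_a is nonzero, else b, without a branch in the source. */
static uint32_t ct_select(uint32_t take_a, uint32_t a, uint32_t b)
{
    /* mask is all ones when take_a != 0, all zeros otherwise */
    uint32_t mask = (uint32_t)0 - (uint32_t)(take_a != 0);
    return (a & mask) | (b & ~mask);
}

/* Both "clauses" are always evaluated; only the final select
 * depends on the (possibly secret) condition. */
static uint32_t step(uint32_t secret_bit, uint32_t x)
{
    uint32_t then_val = x * 3u + 1u;  /* then-clause result */
    uint32_t else_val = x >> 1;       /* else-clause result */
    return ct_select(secret_bit, then_val, else_val);
}
```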
      
   >                                                   (I would   
   > _guess_ that most of the embedded strong guarantee real time   
   > software might select hardware that never does predict   
   > predication since caches and similar general purpose   
   > optimizations tend to increase the difficulty of providing   
   > strong timing guarantees.)   
   >   
   >   
   > Is My 66000 going to have CMOV?   
      
   It has had CMOV for years; it has also had multi-instruction   
   predication since forever.   
      
   >                                 What about something like some   
   > of the AArch64 simple conditional operations? (Yes, such gets   
   > un-RISCy both from count of defined instructions and instruction   
   > complexity, but I have read that the supported operations are   
   > somewhat common for single instruction hammock branches and are   
   > easier to fit microarchitecturally and to fit within a 32-bit   
   > instruction.) The code bloat (8 bytes vs. 4 bytes) and indirect   
   > encoding (i.e., vagaries of idiom recognition, though an   
   > encoded instruction can be implemented with performance that   
   > breaks expectations) are disadvantages of just using PRED, but   
   > PRED also removes artificial restrictions like only incrementing   
   > by 1 (such that idiom recognition could be implemented to fast   
   > path any conditional add_immediate).   
   >   
   > One interesting case for predication is code like the following:   
   >    if (a < 25) {a += 8;}   
   >   
   > The predicate is known to be available at least early enough to   
   > nullify the addition executed in parallel with the compare   
   > because the register operand is the same and the only other   
   > operands are immediates. It is not clear if this presents a   
   > useful optimization opportunity, but it seems an interesting   
   > case.   
   >   
   > I am also curious about what you view as the trade-offs of   
   > conditional move compared with conditional select. Obviously   
   > conditional select potentially includes a register copy and for   
   > typical OoO implementations the mechanism is similar because a   
   > conditional move renames the destination and reads the old value   
   > and the alternative value. Encoding the condition source may be   
   > harder for conditional select (e.g., test whether register A   
   > is/is not zero and, based on that test, place register B or   
   > register C in register D, which requires four register names,   
   > while conditional move   
   > exploits that one of the sources is the same as the   
   > destination); for My 66000 a generalized condition after a   
   > comparison would, I think, require a 6-bit condition bit   
   > specifier and a register source for the condition, which just   
   > fits in a 32-bit instruction (6-bit opcode, 6-bit condition bit   
   > specifier, four 5-bit register names).   
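
The bit budget in the quoted paragraph (6-bit opcode, 6-bit condition bit specifier, four 5-bit register names) can be checked with a small packing sketch. The field order, the field names, and the function `encode_csel` are assumptions made up for illustration; this is not the actual My 66000 encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths taken from the quoted text. */
enum { OPCODE_BITS = 6, CONDBIT_BITS = 6, REG_BITS = 5, NUM_REGS = 4 };

/* Hypothetical layout, high bits to low bits:
 *   opcode[31:26] cond_bit[25:20] rd[19:15] rcond[14:10] rtrue[9:5] rfalse[4:0] */
static uint32_t encode_csel(unsigned opcode, unsigned cond_bit,
                            unsigned rd, unsigned rcond,
                            unsigned rtrue, unsigned rfalse)
{
    assert(opcode < 64 && cond_bit < 64);
    assert(rd < 32 && rcond < 32 && rtrue < 32 && rfalse < 32);

    return ((uint32_t)opcode   << 26) |
           ((uint32_t)cond_bit << 20) |
           ((uint32_t)rd       << 15) |
           ((uint32_t)rcond    << 10) |
           ((uint32_t)rtrue    <<  5) |
            (uint32_t)rfalse;
}
```

The widths sum to 6 + 6 + 4 * 5 = 32, so the form fits with no bits to spare for anything else, such as an immediate or a function-code extension.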
      
   Clang goes out of its way to convert things like the above into   
   CMOV form. Often this pre-optimization takes more instructions   
   and runs slower than pure predication. When the then-clause or   
   else-clause value is not already in a register, the CMOV version   
   is almost always longer and slower.   
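
To make the two shapes concrete, here is a source-level sketch of the `if (a < 25) {a += 8;}` hammock from the quoted text, written once as a branch and once as a select. The function names are hypothetical, `unsigned` is chosen so that the unconditional `a + 8` cannot overflow into undefined behavior, and nothing here claims what any particular compiler emits. The second pair shows a value that has to be loaded first: the select form must perform the load unconditionally, so it needs `p` to be valid even when the condition is false.

```c
/* Branch (hammock) form: the add happens only when the test passes. */
static unsigned bump_branch(unsigned a)
{
    if (a < 25) { a += 8; }
    return a;
}

/* Select form: compute the candidate unconditionally, then choose. */
static unsigned bump_select(unsigned a)
{
    unsigned bumped = a + 8;
    return (a < 25) ? bumped : a;
}

/* Branch form with a value that is not yet in a register. */
static unsigned pick_branch(int c, const unsigned *p, unsigned x)
{
    if (c) { x = *p; }   /* load only when needed */
    return x;
}

/* Select form: the load is always executed before the choice. */
static unsigned pick_select(int c, const unsigned *p, unsigned x)
{
    unsigned loaded = *p;  /* requires p to be valid in both cases */
    return c ? loaded : x;
}
```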
      