comp.arch
Message 130,978 of 131,241
MitchAlsup to All
Re: Variable-length instructions (1/2)
05 Feb 26 18:49:39
   From: user5857@newsgrouper.org.invalid   
      
   Paul Clayton  posted:   
      
   > On 12/28/25 6:53 PM, MitchAlsup wrote:   
   > >   
   > > "Chris M. Thomasson"  posted:   
   > >   
   > >> On 12/28/2025 2:04 PM, MitchAlsup wrote:   
   > >>>   
   > >>> "Chris M. Thomasson"  posted:   
   > >>>   
   > >>>> On 12/22/2025 1:49 PM, Chris M. Thomasson wrote:   
   > >>>>> On 12/21/2025 1:21 PM, MitchAlsup wrote:   
   > >>>>>>   
   > >>>>>> "Chris M. Thomasson"  posted:   
   > >>>>>>   
   > >>>>>>> On 12/21/2025 10:12 AM, MitchAlsup wrote:   
   > >>>>>>>>   
   > >>>>>>>> John Savard  posted:   
   > >>>>>>>>   
   > >>>>>>>>> On Sat, 20 Dec 2025 20:15:51 +0000, MitchAlsup wrote:   
   > >>>>>>>>>   
   > >>>>>>>>>> For argument setup (calling side) one needs MOV   
   > >>>>>>>>>> {R1..R5},{Rm,Rn,Rj,Rk,Rl}   
   > >>>>>>>>>> For returning values (calling side) needs MOV
   > >>>>>>>>>> {Rm,Rn,Rj},{R1..R3}
   > >>>>>>>>>>
   > >>>>>>>>>> For loop iterations needs MOV
   > >>>>>>>>>> {Rm,Rn,Rj},{Ra,Rb,Rc}
   > >>>>>>>>>>   
   > >>>>>>>>>> I just can't see how to make these run reasonably fast within the   
   > >>>>>>>>>> constraints of the GBOoO Data Path.   
   > >>>>>>>>>   
   > >>>>>>>>> Since you actually worked at AMD, presumably you know why I'm
   > >>>>>>>>> mistaken here...
   > >>>>>>>>>   
   > >>>>>>>>> when I read this, I thought that there was a standard technique for
   > >>>>>>>>> doing stuff like that in a GBOoO machine.
   > >>>>>>>>   
   > >>>>>>>> There is::: it is called "load 'em up, pass 'em through". That is no   
   > >>>>>>>> different than any other calculation, except that no mangling of the   
   > >>>>>>>> bits is going on.   
   > >>>>>>>>   
   > >>>>>>>>> Just break down all the fancy instructions into RISC-style
   > >>>>>>>>> pseudo-ops. But apparently, since you would know all about
   > >>>>>>>>> that, there must be a reason why it doesn't apply in these cases.
   > >>>>>>>>   
   > >>>>>>>> x86 has short/small MOV instructions, Not so with RISCs.   
   > >>>>>>>   
   > >>>>>>> Does your EMS use a so-called LOCK MOV? For some damn reason I remember
   > >>>>>>> something like that. The LOCK "prefix" for, say, XADD, CMPXCHG8B, etc.
   > >>>>>>   
   > >>>>>> The 2-operand+displacement LD/STs have a lock bit in the instruction--
   > >>>>>> that is, it is not a Prefix. MOV in My 66000 is reg-reg or reg-constant.
   > >>>>>>
   > >>>>>> Oh, and it's ESM not EMS. Exotic Synchronization Method.
   > >>>>>>
   > >>>>>> In order to get ATOMIC-ADD-to-Memory, I will need an Instruction-Modifier
   > >>>>>> {A.K.A. a prefix}.
   > >>>>>   
   > >>>>> Thanks for the clarification.   
   > >>>>   
   > >>>> On x86/x64 LOCK XADD is a loopless wait free operation.   
   > >>>>   
   > >>>> I need to clarify. Okay, on the x86 a LOCK XADD will make for a loopless
   > >>>> impl. If we are on another system and that LOCK XADD is some sort of LL/SC
   > >>>> "style" loop, well, that causes damage to my loopless claim... ;^o
   > >>>>   
   > >>>> So, can your system get wait free semantics for RMW atomics?   
   > >>>   
   > >>> A::   
   > >>>   
   > >>>        ATOMIC-to-Memory-size  [address]   
   > >>>        ADD                    Rd,--,#1   
   > >>>   
   > >>> Will attempt an ATOMIC add to L1 cache. If the line is writeable, the ADD
   > >>> is performed and the line updated. Otherwise, the Add-to-memory #1 is
   > >>> shipped out over the memory hierarchy. When the operation runs into a
   > >>> cache containing [address] in the writeable-state the add is performed
   > >>> and the previous value returned. If [address] is not writeable the cache
   > >>> line is invalidated and the search continues outward. {This protocol
   > >>> depends on writeable implying {exclusive or modified} which is typical.}
   > >>>
   > >>> When [address] reaches the Memory-Controller it is scheduled in arrival
   > >>> order, other caches system wide will receive CI, and modified lines
   > >>> will be pushed back to DRAM-Controller. When CI is "performed" MC/
   > >>> DRC will perform add #1 to [address] and previous value is returned
   > >>> as its result.
   > >>>   
   > >>> {{That is the ADD is performed where the data is found in the   
   > >>> memory hierarchy, and the previous value is returned as result;   
   > >>> with all cache-effects and coherence considered.}}   
   > >>>   
   > >>> A HW guy would not call this wait free--since the CPU is waiting   
   > >>> until all the nuances get sorted out, but SW will consider this   
   > >>> wait free since SW does not see the waiting time unless it uses   
   > >>> a high precision timer to measure delay.   
   > >>   
   > >> Good point. Humm. Well, I just don't want to see the disassembly of   
   > >> atomic fetch-and-add (aka LOCK XADD) go into a LL/SC loop. ;^)   
   > >   
   > > If you do it LL/SC-style you HAVE to bring data to "this" particular   
   > > CPU, and that (all by itself) causes n^2 to n^3 "buss" traffic under   
   > > contention. So you DON'T DO IT LIKE THAT.
   >   
   > LL-op-SC could be recognized as an idiom and avoid bringing data   
   > to the core.   
      
   Can recognize:   
      
          LDL   Rd,[address]   
          ADD   Rd,Rd,#whatever   
          STC   Rd,[address]   
      
   Cannot recognize:   
      
          LDA   R1,[address]   
          CALL  LoadLocked   
          ADD   R2,R2,#whatever   
          CALL  StoreConditional   
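
   The distinction can be sketched as a pattern match over an instruction
   window. What follows is only an illustration in Python, not how any real
   decoder works: the textual instruction format, the function names, and the
   set of fusable operations are all assumptions made for the example. The
   point is that the idiom is visible only when LDL, the operation, and STC
   sit next to each other on the same register and address; once they hide
   behind CALLs there is nothing to match.

```python
# Toy idiom recognizer. Mnemonics follow the post; everything else
# (text format, FUSABLE_OPS, function names) is hypothetical.
FUSABLE_OPS = {"ADD", "AND", "OR", "XOR"}  # assumed set, not from the post


def parse(line):
    """Split 'OP  a,b,c' into (opcode, [operands])."""
    op, _, rest = line.strip().partition(" ")
    operands = [x.strip() for x in rest.split(",") if x.strip()]
    return op.upper(), operands


def find_ll_op_sc(lines):
    """Scan for an adjacent LDL / op / STC triple on one register and address.

    Returns (index, op, address, operand) for the first match, else None.
    """
    insts = [parse(line) for line in lines]
    for i in range(len(insts) - 2):
        (o1, a1), (o2, a2), (o3, a3) = insts[i:i + 3]
        if o1 != "LDL" or o2 not in FUSABLE_OPS or o3 != "STC":
            continue
        if len(a1) != 2 or len(a2) != 3 or len(a3) != 2:
            continue
        rd, addr = a1
        # The op must read and write the loaded register, and the
        # store-conditional must target the same register and address.
        if a2[0] == rd and a2[1] == rd and a3 == [rd, addr]:
            return i, o2, addr, a2[2]
    return None


recognizable = [
    "LDL   Rd,[address]",
    "ADD   Rd,Rd,#whatever",
    "STC   Rd,[address]",
]
not_recognizable = [
    "LDA   R1,[address]",
    "CALL  LoadLocked",
    "ADD   R2,R2,#whatever",
    "CALL  StoreConditional",
]
print(find_ll_op_sc(recognizable))
print(find_ll_op_sc(not_recognizable))
```

   A match is what would let hardware ship the operation outward as a single
   add-to-memory request instead of pulling the line into the core.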
      
   > > Atomic-to-Memory HAS to be done outside of THIS-CPU or it is not   
   > > Atomic-to-Memory. {{Thus it deserves its own instruction or prefix}}   
   >   
   > I wonder if there is an issue of communicating intention to the   
   > computer. Using atomic-to-memory may be intended to communicate   
   > that the operation is expected to be under contention or that   
   > moderating the impact under high contention is more important   
   > than having a fast "happy path".   
      
   There is a speed of light problem here. Communicating across a
   computer is a microsecond time problem, whereas executing
   instructions is a nanosecond time problem.

   And this is exactly where Add-to-Memory gains over Interferable
   ATOMIC events--you pay the latency only once. While that latency
   is higher than the best case with LL-SC, it is WAY LOWER than the
   worst case with LL-SC under serious contention.
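
   A toy count makes the "pay once" argument concrete. The Python sketch
   below is a deliberately crude round-based model, not a coherence-protocol
   simulation: it assumes every pending CPU pulls the line to itself each
   round and that exactly one store-conditional succeeds per round. The
   function names and the counting rule are assumptions made for the example.
   Even this simplified model shows quadratic growth in line movement for the
   retry loop, while add-to-memory issues one request per CPU.

```python
def llsc_increments(n_cpus):
    """Each CPU increments a shared counter once with an LL/SC-style retry loop.

    Toy model: per round, every pending CPU loads the line (one transfer each),
    then only the first CPU's store-conditional succeeds; the rest retry.
    Returns (final_counter, line_transfers).
    """
    counter = 0
    line_transfers = 0
    pending = list(range(n_cpus))
    while pending:
        observed = {}
        for cpu in pending:
            observed[cpu] = counter  # load-locked brings the line to this CPU
            line_transfers += 1
        winner = pending.pop(0)      # everyone else loses the reservation
        counter = observed[winner] + 1
    return counter, line_transfers


def add_to_memory_increments(n_cpus):
    """Each CPU ships one add request to wherever the line lives.

    The add is performed there in arrival order and the previous value is
    returned. Returns (final_counter, requests, previous_values).
    """
    counter = 0
    requests = 0
    previous_values = []
    for _cpu in range(n_cpus):
        requests += 1
        previous_values.append(counter)
        counter += 1
    return counter, requests, previous_values


for n in (2, 8, 32):
    _, transfers = llsc_increments(n)
    _, requests, _ = add_to_memory_increments(n)
    print(n, transfers, requests)
```

   In this model n contending CPUs cost n(n+1)/2 line transfers with the
   retry loop, for example 36 for n = 8, against 8 add-to-memory requests.
   Real traffic also includes invalidations and retries the model ignores.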
      
   > This seems to be similar to branch hints and predication in that   
   > urging the computer to handle the task in a specific way may not   
   > be optimal for the goal of the user/programmer.   
   Explain   
   >                                                 A programmer   
   > might use predication to avoid a branch that is expected to be   
   > poorly predicted or to have more consistent execution time. The   
      
   My 66000 predication can avoid 2 branches--it operates under the   
   notion that if FETCH reaches the join point before condition is   
   known, then predication is always faster than branching.   
      
   > former could be inappropriate for the computer to obey if the   
   > branch predictor became effective for that branch. If prediction   
   > is accurate, predicate prediction could improve performance but   
   > would break execution time consistency. Even reducing execution   
   > time when the predicate is known early might go against the   
   > programmer's intent by leaking information.   
      
      
   [continued in next message]   
      