
comp.arch
Message 130,480 of 131,241
MitchAlsup to All
Re: Memory ordering (Re: Multi-precision
10 Dec 25 20:10:43
   From: user5857@newsgrouper.org.invalid   
      
   David Brown  posted:   
      
   > On 09/12/2025 22:28, MitchAlsup wrote:   
   > >   
   > > David Brown  posted:   
   > >   
   > >> On 09/12/2025 20:15, MitchAlsup wrote:   
   > >>>   
   > >>> David Brown  posted:   
   > >>>   
   > >>   
   > >>>> There are basically two ways to handle atomic operations.  One way is to   
   > >>>> use locking mechanisms to ensure that nothing (other cores, interrupts   
   > >>>> or other pre-emption on the same core) can break up the sequence.  The   
   > >>>> other way is to have a mechanism to detect conflicts and a failure of   
   > >>>> the atomic operation, so that you can try again (or otherwise handle the   
   > >>>> situation).  (You can, of course, combine these - such as by disabling   
   > >>>> local interrupts and detecting conflicts from other cores.)   
   > >>>>   
   > >>>> The code Mitch posted apparently had neither of these mechanisms, hence   
   > >>>> my confusion.  It turns out that it /does/ have conflict detection and a   
   > >>>> hardware retry loop, all hidden from anyone trying to understand the   
   > >>>> code.  (I can appreciate that there may be benefits in doing this in   
   > >>>> hardware, but there are no benefits in hiding it from the programmer!)   
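A minimal C11 sketch of the two approaches described above, applied to a single shared counter. It assumes only <stdatomic.h>, and the function and variable names are illustrative. The lock version is a plain user-space spinlock that does not disable interrupts. The retry version uses compare-and-swap, which detects a changed value rather than every intervening write, unlike the hardware conflict detection under discussion.

```c
#include <stdatomic.h>

/* Approach 1 (lock): only the holder of the flag may run the critical
   section. This is a user-space spinlock; it does not mask interrupts. */
static atomic_flag counter_lock = ATOMIC_FLAG_INIT;
static long locked_counter = 0;

void locked_increment(void)
{
    while (atomic_flag_test_and_set_explicit(&counter_lock, memory_order_acquire))
        ;                                   /* spin until the lock is free */
    locked_counter++;                       /* protected read-modify-write */
    atomic_flag_clear_explicit(&counter_lock, memory_order_release);
}

/* Approach 2 (detect and retry): compute from a snapshot and publish the
   result only if the value is still what was read; otherwise try again. */
static _Atomic long retry_counter = 0;

void retry_increment(void)
{
    long old = atomic_load(&retry_counter);
    /* On failure the CAS writes the current value into 'old', so the loop
       recomputes old + 1 from fresh data. */
    while (!atomic_compare_exchange_weak(&retry_counter, &old, old + 1))
        ;
}
```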
   > >>>   
   > >>> How exactly do you inform the programmer that:   
   > >>>   
   > >>>          InBound   [Address]   
   > >>>          OutBound  [Address]   
   > >>>   
   > >>> operates like::   
   > >>>   
   > >>> try_again:   
   > >>>          InBound   [Address]   
   > >>>          BIN       try_again   
   > >>>          OutBound  [Address]   
   > >>>   
   > >>> And why clutter up asm with extraneous labels and require extra instructions?
   > >>   
   > >> The most obvious answer is that in any code that uses these features,   
   > >> good comments are essential so that readers can see what is happening.   
   > >>   
   > >> Another method would be to use better names for the intrinsics, as seen   
   > >> at the C (or other HLL) level.  (Assembly instruction names don't matter   
   > >> nearly as much.)   
   > >>   
   > >> So maybe instead of "esmLOCKload()" and "esmLOCKstore()" you have   
   > >> "load_and_set_retry_point()" and "store_or_retry()".  Feel free to think   
   > >> of better names, but that would at least give the reader a clue that   
   > >> there's something odd going on.   
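To show what the suggested names would buy, here is a sketch in which the hidden retry loop from the InBound/OutBound example is written out with an explicit try_again label. load_and_set_retry_point() and store_or_retry() are the hypothetical names proposed above, implemented here as thin wrappers over C11 compare-and-swap, not as the esm intrinsics. CAS can also miss an intervening write that restores the original value (the ABA case), which hardware conflict detection would catch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: read the location; the caller's label marks the
   point where control resumes if the update later fails. */
static inline long load_and_set_retry_point(_Atomic long *addr)
{
    return atomic_load(addr);
}

/* Hypothetical helper: store 'value' only if *addr still holds 'loaded'.
   Returns false when the caller must go back to its retry point. */
static inline bool store_or_retry(_Atomic long *addr, long loaded, long value)
{
    return atomic_compare_exchange_strong(addr, &loaded, value);
}

long add_and_fetch(_Atomic long *addr, long delta)
{
    long old;
try_again:                                   /* the retry point, made explicit */
    old = load_and_set_retry_point(addr);
    if (!store_or_retry(addr, old, old + delta))
        goto try_again;                      /* the branch esm keeps implicit */
    return old + delta;
}
```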
   > >   
   > > This is a useful suggestion; thanks.   
   >   
   > I can certainly say they would help /me/ understand the code, so maybe   
   > they would help other people understand it too.   
   >   
   > >   
   > > On the other hand, there are some non-vonNeumann actions lurking within
   > > esm, where vonNeumann means that every instruction is executed in its
   > > entirety before the next instruction appears to start executing.
   > >   
   >   
   > That's a rather different use of the term "vonNeumann" from anything I   
   > have heard.  I'd just talk about "indivisible" instructions (avoiding   
   > "atomic", because that usually refers to a wider view of the system).   
   > And are we thinking about the instructions purely from the viewpoint of
   > the CPU executing them?
      
   An ATOMIC event is a series of instructions that appear to be performed   
   all at once--as if the whole series was "indivisible".   
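As a software stand-in for an ATOMIC event spanning more than one location, the sketch below moves an amount between two variables so that no observer that takes the same lock can see the debit without the credit. The struct and function names are hypothetical, and the spinlock only approximates the effect: it gives no atomicity against code that touches the fields without taking the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Two locations that must change together: a transfer must never be
   observed half done (debited but not yet credited). */
struct accounts {
    atomic_flag lock;   /* software stand-in for the hardware event */
    long from;
    long to;
};

/* Moves 'amount' if funds allow. Either both fields change or neither. */
bool transfer(struct accounts *a, long amount)
{
    while (atomic_flag_test_and_set_explicit(&a->lock, memory_order_acquire))
        ;                                   /* wait for exclusive access */
    bool ok = a->from >= amount;
    if (ok) {
        a->from -= amount;
        a->to   += amount;
    }
    atomic_flag_clear_explicit(&a->lock, memory_order_release);
    return ok;
}
```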
      
   > IME, most instructions on most processors are indivisible, but most   
   > processors have some instructions that are not.  For example, processors   
   > can have load/store multiple instructions that are interruptible - in
   > some cases, after returning from the interrupt (and any associated   
   > thread context switches) the instructions are restarted, in other cases   
   > they are continued.   
      
   Go in the other direction, where a series of instructions HAS TO APPEAR   
   as if executed instantaneously.   
      
   > But most instructions /appear/ to be executed entirely before the next   
   > instruction /appears/ to start executing.  Fast processors have a lot of   
   > hardware designed to keep up this appearance - register renaming,   
   > pipelining, speculative execution, dependency tracking, and all the rest   
   > of it.   
      
   None of those things is ARCHITECTURAL; esm is an architectural window into
   how to program ATOMIC events such that no future generation of the ISA has
   to keep adding more synchronization instructions. One can program every
   known industrial and academic synchronization primitive in esm without ever
   adding new synchronization instructions.
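A small-scale analogy for the claim that one general mechanism can express many primitives: with nothing but a single-word compare-and-swap, one generic read-modify-write loop yields fetch-add, exchange, fetch-max and test-and-set without any new instructions. This is a C11 sketch with hypothetical names; esm's claim covers multi-location events, which single-word CAS cannot express.

```c
#include <stdatomic.h>

/* The new value as a function of the old value and an argument. */
typedef long (*update_fn)(long old, long arg);

/* One generic loop: read, compute, publish only if unchanged, else retry.
   Returns the value held before the update. */
long generic_rmw(_Atomic long *addr, update_fn f, long arg)
{
    long old = atomic_load(addr);
    while (!atomic_compare_exchange_weak(addr, &old, f(old, arg)))
        ;                    /* 'old' was refreshed; recompute and retry */
    return old;
}

static long op_add(long old, long arg)  { return old + arg; }
static long op_swap(long old, long arg) { (void)old; return arg; }
static long op_max(long old, long arg)  { return old > arg ? old : arg; }

/* Classic primitives, each a one-line instance of the same loop. */
long fetch_add(_Atomic long *a, long v) { return generic_rmw(a, op_add, v); }
long exchange(_Atomic long *a, long v)  { return generic_rmw(a, op_swap, v); }
long fetch_max(_Atomic long *a, long v) { return generic_rmw(a, op_max, v); }
long test_and_set(_Atomic long *a)      { return generic_rmw(a, op_swap, 1); }
```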
      
   > > 1st:: one cannot single-step through an ATOMIC event. If you enter an
   > > ATOMIC event in single-step mode, you will see the 1st instruction in
   > > the event, then you will receive control after the terminal instruction
   > > has executed.
   > >   
   >   
   > That is presumably a choice you made for the debugging features of the   
   > device.   
      
   No, it is the nature of executing a series of instructions as if
   instantaneously.
      
   > > 2nd:: the only way to debug an event is to have a buffer of SW locations
   > > that gets written with non-participating STs. Unlike participating
   > > memory lines, these locations will be written, though not in a
   > > sequentially consistent manner (architecturally), and they can be
   > > examined outside the event, whereas the participating lines are either
   > > all written instantaneously or not modified at all.
   > >   
   > > So here we have non-participating STs that have been written while older
   > > participating STs have not.
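A toy, single-threaded C model of the distinction drawn above, assuming a software staging buffer in place of whatever the hardware actually does: participating stores are held back and written together only if the event ends without conflict, while non-participating stores to a debug trace take effect immediately and survive a failed event. All names and sizes are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_STAGED 8    /* hypothetical limit on participating stores */
#define TRACE_LEN  16   /* hypothetical size of the SW debug buffer */

struct event {
    long  *addr[MAX_STAGED];
    long   val[MAX_STAGED];
    size_t n;
};

static long   trace[TRACE_LEN];   /* SW debug buffer */
static size_t trace_len;

/* Non-participating store: written at once, survives a failed event. */
static void trace_store(long v)
{
    if (trace_len < TRACE_LEN)
        trace[trace_len++] = v;
}

/* Participating store: staged only; returns false if the buffer is full. */
static bool participating_store(struct event *e, long *p, long v)
{
    if (e->n == MAX_STAGED)
        return false;
    e->addr[e->n] = p;
    e->val[e->n] = v;
    e->n++;
    return true;
}

/* Apply every staged store, or none of them if a conflict was seen.
   Single-threaded, so nothing can observe the commit loop midway. */
static void end_event(struct event *e, bool conflict)
{
    if (!conflict)
        for (size_t i = 0; i < e->n; i++)
            *e->addr[i] = e->val[i];
    e->n = 0;
}
```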
   > >   
   > > 3rd:: control transfer is not under SW control. It is more like exceptions
   > > and interrupts than a conditional branch, except that the target of the
   > > control transfer is based on the code in the event.
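An analogy for a control transfer that the event's code does not perform itself, sketched with setjmp/longjmp: the retry point is recorded at the start of the event, and a detected conflict jumps back there much as an exception would, rather than through a branch the code executes. The conflict-injection counter and all names are hypothetical; this illustrates the control flow only, not esm's mechanism.

```c
#include <setjmp.h>

static jmp_buf retry_point;
static int conflicts_to_inject;   /* hypothetical knob simulating interference */

/* Stand-in for hardware conflict detection. */
static void check_for_conflict(void)
{
    if (conflicts_to_inject > 0) {
        conflicts_to_inject--;
        longjmp(retry_point, 1);  /* not a branch in the event's own code */
    }
}

/* Returns how many attempts the update took. */
int run_event(long *shared, long delta)
{
    volatile int attempts = 0;    /* volatile so it survives longjmp */
    setjmp(retry_point);          /* target derived from the event's start */
    attempts++;
    long v = *shared;             /* like InBound */
    check_for_conflict();         /* interference detected mid-event */
    *shared = v + delta;          /* like OutBound: reached only on success */
    return attempts;
}
```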
   > >   
   >   
   > OK.  I can see the advantages of that - though there are disadvantages   
   > too (such as being unable to control a limit on the number of retries,   
   > or add SW tracking of retry counts for metrics).   
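For comparison, a sketch of the software-controlled retry policy the quoted concern has in mind: a CAS-based update that gives up after a caller-chosen number of attempts and counts failures for metrics. The names and the give-up policy are assumptions; returning false leaves the choice of fallback (for example a lock-based path) to the caller.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Failed attempts, kept in SW so they can be reported as a metric. */
static _Atomic unsigned long failed_attempts = 0;

/* Tries the update at most max_retries + 1 times. Strong CAS is used so
   that each counted failure reflects a real intervening change rather
   than a spurious failure of a weak CAS. */
bool bounded_add(_Atomic long *addr, long delta, unsigned max_retries)
{
    long old = atomic_load(addr);
    for (unsigned attempt = 0; attempt <= max_retries; attempt++) {
        if (atomic_compare_exchange_strong(addr, &old, old + delta))
            return true;
        atomic_fetch_add_explicit(&failed_attempts, 1, memory_order_relaxed);
    }
    return false;                 /* caller decides what to do next */
}
```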
      
   esm attempts to allow SW to program with features previously available   
   only at the µCode level. µCode allows for many µinstructions to execute   
   before/between any real instructions.   
      
   > My main concern was the disconnect between how the code was written
   > and what it actually does.
      
   There is a 26-page specification the programmer needs to read and understand.
   This includes things we have not talked about, such as::
   a) terminating an event without writing anything
   b) proactively minimizing future interference
   c) modifications to the cache coherence model at the architectural level.
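Item a) has a familiar single-word counterpart: an update that, after inspecting the data, decides to finish without storing anything. The sketch below takes one unit from a counter only if one is available; try_take() is a hypothetical name, and C11 compare-and-swap stands in for an esm event.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take one unit from *count only if one is available. When the count is
   already zero the operation ends without performing any store. */
bool try_take(_Atomic long *count)
{
    long old = atomic_load(count);
    do {
        if (old <= 0)
            return false;        /* give up: nothing is written */
    } while (!atomic_compare_exchange_weak(count, &old, old - 1));
    return true;
}
```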
      
   The architectural specification allows µArchitectures of various scales
   to choose independently how to implement esm while providing the same
   architectural features at the SW level. For example, the kinds of esm
   activities for a 1-wide In-Order µController are vastly different from
   those suitable for a server-scale rack of processor ensembles. What we want
   is one SW model that covers the whole gamut.
      
   > > 4th:: one cannot test esm with a random code generator, since the
   > > probability that the random code generator creates a legal esm event is
   > > exceedingly low.
   >   
   >   
   > Testing and debugging any kind of locking or atomic access solution is   
   > always very difficult.  You can rarely try out conflicts or potential   
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)