comp.arch
MitchAlsup to All
Re: Memory ordering (Re: Multi-precision
12 Dec 25 23:39:53
   From: user5857@newsgrouper.org.invalid   
      
   "Chris M. Thomasson"  posted:   
      
   > On 12/12/2025 2:37 PM, Chris M. Thomasson wrote:   
   > > On 12/8/2025 12:06 PM, MitchAlsup wrote:   
   > >>   
   > >> "Chris M. Thomasson"  posted:   
   > >>   
   > >>> On 12/6/2025 5:42 AM, David Brown wrote:   
   > >>>> On 05/12/2025 21:54, MitchAlsup wrote:   
   > >>>>>   
   > >>>>> David Brown  posted:   
   > >>>>>   
   > >>>>>> On 05/12/2025 18:57, MitchAlsup wrote:   
   > >>>>>>>   
   > >>>>>>> anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   > >>>>>>>   
   > >>>>>>>> David Brown  writes:   
   > >>>>>>>>> "volatile" /does/ provide guarantees - it just doesn't provide
   > >>>>>>>>> enough guarantees for multi-threaded coding on multi-core systems.
   > >>>>>>>>> Basically, it only works at the C abstract machine level - it does
   > >>>>>>>>> nothing that affects the hardware.  So volatile writes are ordered
   > >>>>>>>>> at the C level, but that says nothing about how they might progress
   > >>>>>>>>> through storage queues, caches, inter-processor communication buses,
   > >>>>>>>>> or whatever.
   > >>>>>>>>   
   > >>>>>>>> You describe in many words and not really to the point what can be
   > >>>>>>>> explained concisely as: "volatile says nothing about memory ordering
   > >>>>>>>> on hardware with weaker memory ordering than sequential consistency".
   > >>>>>>>> If hardware guaranteed sequential consistency, volatile would provide
   > >>>>>>>> guarantees that are as good on multi-core machines as on single-core
   > >>>>>>>> machines.
   > >>>>>>>>
   > >>>>>>>> However, for concurrent manipulations of data structures, one wants
   > >>>>>>>> atomic operations beyond load and store (even on single-core systems),
   > >>>>>>>   
   > >>>>>>> Such as ????   
   > >>>>>>   
   > >>>>>> Atomic increment, compare-and-swap, locks, loads and stores of sizes   
   > >>>>>> bigger than the maximum load/store size of the processor.   
   > >>>>>   
   > >>>>> My 66000 ISA can::   
   > >>>>>   
   > >>>>> LDM/STM can LD/ST up to 32   DWs   as a single ATOMIC instruction.   
   > >>>>> MM      can MOV   up to 8192 bytes as a single ATOMIC instruction.
   > >>>>>   
   > >>>>   
   > >>>> The functions below rely on more than that - to make them work, as far
   > >>>> as I can see, you need the first "esmLOCKload" to lock the bus and also
   > >>>> lock the core from any kind of interrupt or other pre-emption, lasting
   > >>>> until the esmLOCKstore instruction.  Or am I missing something here?
   > >>>   
   > >>> Lock the BUS? Only when shit hits the fan. What about locking the cache   
   > >>> line? Actually, I think we can "force" an x86/x64 to lock the bus if we   
   > >>> do a LOCK'ed RMW on memory that straddles cache lines?   
   > >>   
   > >> In the My 66000 case, Mem References can lock up to 8 cache lines.   
   > >   
   > > Pretty flexible wrt implementing those exotic things back in the day,
   > > experimental algos that need DCAS, KCSS, etc. A heck of a lot of
   > > things can be accomplished with DWCAS, aka cmpxchg8b on a 32-bit system,
   > > or cmpxchg16b on a 64-bit system.
   > >
   > > People would bend over backwards to get a DCAS, or NCAS. It would be
   > > infested with strange indirection a la "descriptors", and involved a shit
   > > load of atomic RMWs. CAS, DWCAS, XCHG and XADD can get a lot done.
   >   
   > Have you ever read about KCSS?   
   >   
   > https://groups.google.com/g/comp.arch/c/shshLdF1uqs   
   >   
   > https://patents.google.com/patent/US7293143   
      
   While I was not directly exposed to KCSS, I was exposed to the underlying
   need for multi-location compare-and-swap, and provided a means to implement
   it in both ASF and ESM. {All of us (synchronization people) were so
   exposed. And a lot of academic ideas came out of those trends, too.}
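   
   To make "multi-location compare and swap" concrete, here is a minimal C
   sketch of what a two-location version (DCAS) is meant to do. It is not
   ASF or ESM code: the name dcas_emulated and the global spin flag are
   hypothetical, and the flag only stands in for the atomicity that the
   hardware would provide without a global lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software stand-in for hardware atomicity (assumption:
 * every access to the guarded words goes through dcas_emulated). */
static atomic_flag dcas_guard = ATOMIC_FLAG_INIT;

/* Double compare-and-swap: write new_a/new_b only if BOTH locations
 * still hold their expected values; otherwise change nothing. */
static bool dcas_emulated(uint64_t *a, uint64_t expect_a, uint64_t new_a,
                          uint64_t *b, uint64_t expect_b, uint64_t new_b)
{
    bool ok = false;

    /* Spin until we own the guard. */
    while (atomic_flag_test_and_set_explicit(&dcas_guard,
                                             memory_order_acquire)) {
        /* busy-wait */
    }

    if (*a == expect_a && *b == expect_b) {
        *a = new_a;
        *b = new_b;
        ok = true;
    }

    atomic_flag_clear_explicit(&dcas_guard, memory_order_release);
    return ok;
}
```

   A caller would typically read both locations, compute the new values,
   and retry in a loop while dcas_emulated returns false.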
      
   In my case, I simply wanted a way "out" of inventing a new synchronization
   primitive every ISA generation. What my solution entails is a modification
   to the cache coherence model (NaK) that indicates "Yes I have the line you
   referenced, but, no you can't have it right now" in order to strengthen
   the guarantees of forward progress.
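   
   As a rough illustration of the NaK idea only, the toy C model below shows
   a cache answering a request for a line it holds. Everything in it (the
   reply codes, struct toy_cache, toy_snoop) is a hypothetical name invented
   for this sketch; it does not describe the actual My 66000 protocol
   messages or states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reply codes for a coherence request. */
enum snoop_reply { REPLY_MISS, REPLY_DATA, REPLY_NAK };

/* Toy model of one cache's view of a single line. */
struct toy_cache {
    bool holds_line;       /* this cache currently has the line */
    bool in_atomic_event;  /* line is part of an in-flight atomic sequence */
};

/* Answer another core's request for the line. */
static enum snoop_reply toy_snoop(struct toy_cache *owner)
{
    if (!owner->holds_line)
        return REPLY_MISS;
    if (owner->in_atomic_event)
        return REPLY_NAK;       /* "I have it, but not right now" */
    owner->holds_line = false;  /* hand the line over */
    return REPLY_DATA;
}
```

   In this model a requester that sees REPLY_NAK simply asks again later,
   so the holder can finish its atomic sequence before giving up the line.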
      