From: david.brown@hesbynett.no   
      
   On 06/12/2025 18:44, MitchAlsup wrote:   
   >   
   > David Brown posted:   
   >   
   >> On 05/12/2025 21:54, MitchAlsup wrote:   
   >>>   
   >>> David Brown posted:   
   >>>   
   >>>> On 05/12/2025 18:57, MitchAlsup wrote:   
   >>>>>   
   >>>>> anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   >>>>>   
   >>>>>> David Brown writes:   
   >>>>>>> "volatile" /does/ provide guarantees - it just doesn't provide enough   
   >>>>>>> guarantees for multi-threaded coding on multi-core systems. Basically,   
   >>>>>>> it only works at the C abstract machine level - it does nothing that   
   >>>>>>> affects the hardware. So volatile writes are ordered at the C level,   
   >>>>>>> but that says nothing about how they might progress through storage   
   >>>>>>> queues, caches, inter-processor communication buses, or whatever.   
   >>>>>>   
   >>>>>> You describe in many words and not really to the point what can be   
   >>>>>> explained concisely as: "volatile says nothing about memory ordering   
   >>>>>> on hardware with weaker memory ordering than sequential consistency".   
   >>>>>> If hardware guaranteed sequential consistency, volatile would provide   
   >>>>>> guarantees that are as good on multi-core machines as on single-core   
   >>>>>> machines.   
   >>>>>>   
   >>>>>> However, for concurrent manipulations of data structures, one wants   
   >>>>>> atomic operations beyond load and store (even on single-core systems),   
   >>>>>   
   >>>>> Such as ????   
   >>>>   
   >>>> Atomic increment, compare-and-swap, locks, loads and stores of sizes   
   >>>> bigger than the maximum load/store size of the processor.   
   >>>   
   >>> My 66000 ISA can::   
   >>>   
   >>> LDM/STM can LD/ST up to 32 DWs as a single ATOMIC instruction.   
   >>> MM can MOV up to 8192 bytes as a single ATOMIC instruction.   
   >>>   
   >>   
   >> The functions below rely on more than that - to make them work, as far
   >> as I can see, you need the first "esmLOCKload" to lock the bus and also
   >> lock the core from any kind of interrupt or other pre-emption, lasting
   >> until the esmLOCKstore instruction. Or am I missing something here?
   >   
   > In the above, I was stating that the maximum width of LD/ST can be a lot   
   > bigger than the size of a single register, not that the above instructions   
   > make writing ATOMIC events easier.   
   >   
      
   That's what I assumed.   
      
   Certainly there are situations where it can be helpful to have longer   
   atomic reads and writes. I am not so sure about allowing 8 KB atomic   
   accesses, especially in a system with multiple cores - that sounds like   
   letting user programs DoS everything else on the system.   
      
   > These is no bus!   
      
   I think there's a typo or some missing words there?   
      
   >   
   > The esmLOCKload causes the address to be 'monitored'   
   > for interference, and to announce participation in the ATOMIC event.   
   >   
   > The FIRST esmLOCKload tells the core that an ATOMIC event is beginning,   
   > AND sets up a default control point (This instruction itself) so that   
   > if interference is detected at esmLOCKstore control is transferred to   
   > that control point.   
   >   
   > So, there is no way to write Test-and-Set !! you get Test-and-Test-and-Set   
   > for free.   
      
   If I understand you correctly here, you basically have a "load-reserve /   
   store-conditional" sequence as commonly found in RISC architectures, but   
   you have the associated loop built into the hardware? I can see that   
   potentially improving efficiency, but I also find it very difficult to   
   read or write C code that has hidden loops. And I worry about how it   
   would all work if another thread on the same core or a different core   
   was running similar code in the middle of these sequences. It also   
   reduces the flexibility - in some use-cases, you want to have software   
   limits on the number of attempts of a lr/sc loop to detect serious   
   synchronisation problems.   
      
   >   
   > There is a branch-on-interference instruction that   
   > a) does what it says,   
   > b) sets up an alternate atomic control point.   
   >   
   >> It is not easy to have atomic or lock mechanisms on multi-core systems   
   >> that are convenient to use, efficient even in the worst cases, and don't   
   >> require additional hardware.   
   >   
   > I am using the "Miss Buffer" as the point of monitoring for interference.   
   > a) it already has to monitor "other hits" from outside accesses to deal   
   > with the coherence mechanism.   
    > b) the esm additions to the Miss Buffer are on the order of 2%   
   >   
   > c) there are other means to strengthen guarantees of forward progress.   
   >>   
   >>   
   >>> Compare Double, Swap Double::   
   >>>   
   >>> BOOLEAN DCAS( type_t oldp, type_t oldq,
   >>>               type_t *p,   type_t *q,
   >>>               type_t newp, type_t newq )
   >>> {
   >>>     type_t t = esmLOCKload( *p );
   >>>     type_t r = esmLOCKload( *q );
   >>>     if( t == oldp && r == oldq )
   >>>     {
   >>>         *p = newp;
   >>>         esmLOCKstore( *q, newq );
   >>>         return TRUE;
   >>>     }
   >>>     return FALSE;
   >>> }
   >>>   
   >>> Move Element from one place to another:   
   >>>   
   >>> BOOLEAN MoveElement( Element *fr, Element *to )
   >>> {
   >>>     Element *fn = esmLOCKload( fr->next );
   >>>     Element *fp = esmLOCKload( fr->prev );
   >>>     Element *tn = esmLOCKload( to->next );
   >>>     esmLOCKprefetch( fn );
   >>>     esmLOCKprefetch( fp );
   >>>     esmLOCKprefetch( tn );
   >>>     if( !esmINTERFERENCE() )
   >>>     {
   >>>         fp->next = fn;
   >>>         fn->prev = fp;
   >>>         to->next = fr;
   >>>         tn->prev = fr;
   >>>         fr->prev = to;
   >>>         esmLOCKstore( fr->next, tn );
   >>>         return TRUE;
   >>>     }
   >>>     return FALSE;
   >>> }
   >>>   
   >>> So, I guess, you are not talking about what My 66000 cannot do, but   
   >>> only what other ISAs cannot do.   
   >>   
   >> Of course. It is interesting to speculate about possible features of an   
   >> architecture like yours, but it is not likely to be available to anyone   
   >> else in practice (unless perhaps it can be implemented as an extension   
   >> for RISC-V).   
   >>   
   >>>> Even with a   
   >>>> single core system you can have pre-emptive multi-threading, or at least   
   >>>> interrupt routines that may need to cooperate with other tasks on data.   
   >>>>   
   >>>>>   
   >>>>>> and I don't think that C with just volatile gives you such guarantees.   
   >>>>>>   
   >>>>>> - anton   
   >>>>   
   >>   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   