From: anton@mips.complang.tuwien.ac.at   
      
   Michael S writes:   
   >Where does sequential consistency simplifies programming over x86 model   
   >of "TCO + globally ordered synchronization primitives +   
   >every synchronization primitives have implied barriers"?   
   >   
   >More so, where it simplifies over ARMv8.1-A, assuming that programmer   
   >does not try to be too smart and never uses LL/SC and always uses   
   >8.1-style synchronization instructions with Acquire+Release flags set?   
   >   
   >IMHO, the only simple thing about sequential consistency is simple   
   >description. Other than that, it simplifies very little. It does not   
   >magically make lockless multithreaded programming bearable to   
   >non-genius coders.   
      
   Is single-core multi-threaded programming bearable to non-genius   
   programmers? I think so. Sequential consistency plus atomic sequences   
   (where the single-core program disables interrupts to start an atomic   
   sequence and enables them to end an atomic sequence) gives the same   
   programming model.   
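
   To illustrate what that programming model buys you, here is a minimal
   sketch of the classic "store buffering" test, assuming C++11 atomics
   as a stand-in for sequentially consistent hardware. The helper name
   count_both_zero is hypothetical. With the default (sequentially
   consistent) ordering you can reason by interleaving the two threads,
   just as on a single core: one of the stores comes first, so at least
   one thread must see the other's store.

```cpp
#include <atomic>
#include <thread>

// Store-buffering litmus test (hypothetical helper, for illustration only).
// Each thread writes its own flag, then reads the other thread's flag.
// Returns how many of `iterations` runs ended with both threads reading 0.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        // Default std::atomic operations use memory_order_seq_cst.
        std::thread t1([&] { x.store(1); r1 = y.load(); });
        std::thread t2([&] { y.store(1); r2 = x.load(); });
        t1.join();
        t2.join();
        // Under sequential consistency this outcome cannot occur.
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

   If the stores are weakened to memory_order_release and the loads to
   memory_order_acquire, the C++ memory model permits both threads to
   read 0, and x86 hardware can actually produce that result. No
   interleaving of the two threads explains it.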
      
   Concerning synchronization instructions and memory barriers of   
   architectures with weaker memory models, their main problem is that   
   they are implemented slowly, because the idea is to make only the   
   weaker memory model go fast, and then suffer what you must if you need   
   more guarantees. The guarantee itself already makes them slow, even
   when no actual synchronization takes place. This makes the memory
   model hard to use, because you want to minimize the use of these
   instructions. And that's where the need for genius-level coding
   comes in.
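
   As a rough sketch of what this minimizing looks like in practice,
   assuming C++11 atomics again (the helper name message_passing is
   hypothetical): the programmer has to decide for every access which
   ordering is the weakest one that is still correct. Here one release
   store and one acquire load are enough to publish a plain variable.

```cpp
#include <atomic>
#include <thread>

// Message passing with the weakest orderings that are still correct
// (hypothetical helper, for illustration only).
int message_passing() {
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};   // flag that publishes the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload becomes visible before the flag.
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Acquire: once the flag is seen, the payload write is visible too.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });
    producer.join();
    consumer.join();
    return seen;
}
```

   Weakening either access to memory_order_relaxed would turn the read
   of payload into a data race. Under sequential consistency there is no
   such choice to get wrong.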
      
   As for the size of the description, IMO this reflects on the
   simplicity of programming. ARM's memory model was advertised here as
   "It's only 32 pages". If it is simple to program, why does it need
   32 pages of description?
      
   Concerning non-genius coders and coders that are not experts in memory   
   ordering models, the current setup seems to be designed to have a few
   people who program system software that does such things, and   
   everybody else should just use this software (whether it's system   
   calls or libraries). That's ok if the need to communicate between   
   threads is rare, but not so great if it is frequent (especially the   
   system-call variant). And if the need to communicate between threads   
   is rare, it's also good enough if the hardware features for that need   
   are slow. So maybe this whole setup is good enough.   
      
   OTOH, maybe there are applications that could potentially use multiple   
   threads that are currently using sequential programs or context   
   switching within a hardware thread (green threads and the like)   
   because the communication between the threads is too slow and making   
   it faster is too hard to program. In that case the underutilization   
   of many of the multi-core CPUs that we have may be due to this   
   phenomenon. If so, the argument that implementing sequential
   consistency well costs too many hardware resources does not hold:
   Is it more expensive than implementing an 8-core CPU where 6 or 7
   cores are usually not utilized?
      
   - anton   
   --   
   'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'   
    Mitch Alsup,    
      