From: already5chosen@yahoo.com   
      
   On Mon, 01 Dec 2025 07:56:37 GMT   
   anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:   
      
   > MitchAlsup writes:   
   > >   
   > >anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
   > >> Memory-ordering shenanigans come from the unholy alliance of   
   > >> cache-coherent multiprocessing and the supercomputer attitude.   
   > >   
   > >And without the SuperComputer attitude, you sell 0 parts.   
   > >{Remember how we talk about performance all the time here ?}   
   >   
   > Wrong. The supercomputer attitude gave us such wonders as IA-64   
   > (sells 0 parts) and Larrabee (sells 0 parts); why: because OoO is not   
   > only easier to program, but also faster.   
   >   
   > The advocates of weaker memory models justify them by pointing to the   
   > slowness of sequential consistency if one implements it by using   
   > fences on hardware optimized for a weaker memory model. But that's   
   > not the way to implement efficient sequential consistency.   
   >   
   > In an alternate reality where AMD64 did not happen and IA-64 won,   
   > people would justify the IA-64 ISA complexity as necessary for   
   > performance, and claim that the IA-32 hardware in the Itanium   
   > demonstrates the performance superiority of the EPIC approach, just   
   > like they currently justify the performance superiority of weak and   
   > "strong" memory models over sequential consistency.   
   >   
   > If hardware designers put their mind to it, they could make sequential   
   > consistency perform well, probably better on code that actually   
   > accesses data shared between different threads than weak and "strong"   
   > ordering, because there is no need to slow down the program with   
   > fences and the like in cases where only one thread accesses the data,   
   > and in cases where the data is read by all threads. You will see the   
   > slowdown only in run-time cases when one thread writes and another   
   > reads in temporal proximity. And all the fences etc. that are   
   > inserted just in case would also become fast (noops).   
   >   
      
   Where does sequential consistency simplify programming over the x86 model
   of "TSO + globally ordered synchronization primitives +
   every synchronization primitive has an implied barrier"?
      
   Even more so, where does it simplify over ARMv8.1-A, assuming that the
   programmer does not try to be too smart, never uses LL/SC, and always uses
   8.1-style synchronization instructions with the Acquire+Release flags set?
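
   To make the comparison concrete, here is a minimal sketch of the classic
   store-buffering litmus test, written with C++ atomics. The function name
   count_both_zero and the iteration count are made up for illustration.
   Each thread publishes its flag with a read-modify-write primitive and then
   reads the other thread's flag. As far as I know, compilers typically turn
   the exchange into XCHG on x86-64 and into SWPAL on ARMv8.1-A with LSE, but
   treat that mapping as an assumption and check your compiler's output.

```cpp
#include <atomic>
#include <thread>

// Store-buffering litmus test (hypothetical helper, for illustration only).
// Each thread sets its own flag with an atomic exchange, then loads the
// other thread's flag. Returns how many runs ended with both loads seeing 0.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.exchange(1);   // RMW primitive, default memory_order_seq_cst
            r1 = y.load();   // default memory_order_seq_cst
        });
        std::thread t2([&] {
            y.exchange(1);
            r2 = x.load();
        });
        t1.join();
        t2.join();

        // With seq_cst operations all four accesses sit in one total order,
        // so at least one thread must observe the other's flag as 1.
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

   The C++ memory model guarantees that count_both_zero() returns 0 for any
   iteration count. If the exchanges are replaced with relaxed stores and
   relaxed loads, the "both zero" outcome becomes legal, and TSO hardware can
   produce it because a store may still sit in the store buffer while the
   following load executes. So the programmer gets the same guarantee as
   under sequential consistency here simply by sticking to the
   synchronization primitives.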
      
   IMHO, the only simple thing about sequential consistency is its simple
   description. Other than that, it simplifies very little. It does not
   magically make lockless multithreaded programming bearable to
   non-genius coders.
      