From: anton@nospicedham.mips.complang.tuwien.ac.at   
      
   Branimir Maksimovic writes:   
   >I tried with them recently and they are slow, slow,
   >slower than manually loading ;)
   >I mean like "loop" instruction, useless ;)
      
   Possible explanations:   
      
   1) An instruction set designer thought that this could be implemented   
    better than by using scalar loads, but   
      
    a) the hardware designers did not get around to it.   
    b) the hardware designers tried, but the result was buggy, and was   
    disabled in delivered hardware.   
      
    Still, there is a slight benefit to having these instructions: If
    there ever is a useful hardware implementation, software people can
    use it in the knowledge that their code will at least run on a
    variety of hardware (some may switch between gather instructions
    and scalar code depending on the CPU, but not everyone can afford
    development time for all CPU variations).
      
   2) The instruction already worked better than the scalar code in the   
    Xeon Phi (I dimly remember reading something like that, although   
    looking at the cycle numbers I found the claim questionable), and   
    was added to other CPUs to support software that uses the   
    instruction. The problem with this theory is that Xeon Phi
    supports (a variant of) AVX-512, but Haswell and Skylake
    (client) support only AVX2, so code using the Xeon Phi gather
    instructions would not run on them unchanged.
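
   To make the "switch between gather instructions and scalar code"
   from 1) concrete, here is a minimal sketch, assuming GCC or Clang
   on x86. The function names (gather_i32, gather_scalar, gather_avx2)
   are made up for illustration; only the intrinsics and
   __builtin_cpu_supports() are real. It makes no claim about which
   path is faster on a given CPU; that has to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: runtime dispatch is only attempted with GCC/Clang on x86. */
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Scalar version: one ordinary load per element, out[i] = table[idx[i]]. */
static void gather_scalar(int32_t *out, const int32_t *table,
                          const int32_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = table[idx[i]];
}

#if HAVE_X86_DISPATCH
/* AVX2 version: 8 elements per gather instruction (vpgatherdd).
   The target attribute lets this compile without -mavx2. */
__attribute__((target("avx2")))
static void gather_avx2(int32_t *out, const int32_t *table,
                        const int32_t *idx, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i vi = _mm256_loadu_si256((const __m256i *)(idx + i));
        /* scale = 4: indices count 4-byte elements */
        __m256i v = _mm256_i32gather_epi32((const int *)table, vi, 4);
        _mm256_storeu_si256((__m256i *)(out + i), v);
    }
    /* Remaining 0..7 elements are handled with scalar loads. */
    for (; i < n; i++)
        out[i] = table[idx[i]];
}
#endif

/* Hypothetical entry point: picks the AVX2 path when the CPU reports
   AVX2 support, otherwise falls back to the scalar loop. */
void gather_i32(int32_t *out, const int32_t *table,
                const int32_t *idx, size_t n)
{
    assert(n == 0 || (out && table && idx));
#if HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx2")) {
        gather_avx2(out, table, idx, n);
        return;
    }
#endif
    gather_scalar(out, table, idx, n);
}
```

   A real program would probably pick the path once, based on
   measurements for the CPU model, rather than on the feature bit
   alone, since a CPU can support the instruction and still run it
   slower than the scalar loop.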
      
   - anton   
   --   
   M. Anton Ertl Some things have to be seen to be believed   
   anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen   
   http://www.complang.tuwien.ac.at/anton/home.html   
      