From: anton@mips.complang.tuwien.ac.at   
      
   minforth writes:   
   >Today, you could go insane if you had to write assembler code   
   >with SSE1/2/3/4/AVX/AES etc. extended CPU commands (or take GPU   
   >programming...)   
   >   
   >Even chip manufacturers provide C libraries with built-ins and   
   >intrinsics to handle this complexity, and optimising C compilers   
   >for selecting the best operations.   
      
   Not really. Each AVX intrinsic corresponds to an instruction, and I   
   expect the compiler to produce that instruction. The benefit of the   
   intrinsics is that you can mix this assembly language with C code, and   
   the C compiler will do the register allocation for you, but normally   
   not a "better" operation. That being said, I have seen a case where   
   an AVX256 intrinsic was translated to two AVX128 or SSE2 instructions   
   because that sequence was supposed to be faster on some Intel CPU (and   
   it's Intel who writes the code for AVX intrinsics).   
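   A minimal sketch of what that mixing looks like (my example, not
   anton's; it uses the baseline SSE intrinsic _mm_add_ps rather than
   AVX so it compiles without extra flags on x86-64). The intrinsic
   maps directly to one ADDPS instruction, while the compiler picks
   the XMM registers and schedules the surrounding C code:

   ```c
   #include <assert.h>
   #include <xmmintrin.h>  /* SSE intrinsics; x86 only */

   int main(void)
   {
       float a[4] = {1, 2, 3, 4};
       float b[4] = {10, 20, 30, 40};
       float c[4];

       /* Each intrinsic corresponds to an instruction: the loads,
          the add (ADDPS), and the store.  Register allocation for
          va and vb is left to the C compiler. */
       __m128 va = _mm_loadu_ps(a);
       __m128 vb = _mm_loadu_ps(b);
       _mm_storeu_ps(c, _mm_add_ps(va, vb));

       assert(c[0] == 11.0f && c[3] == 44.0f);
       return 0;
   }
   ```

   The point stands either way: you still have to know which
   instruction you want before you can pick the intrinsic for it.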
      
   In any case, given that there is one intrinsic for each SIMD   
   instruction, you go just as insane with the plethora of intrinsics as   
   with the plethora of SIMD instructions.   
      
   The C way of dealing with SIMD instructions is auto-vectorization.   
   It does not work particularly well, but given that it works on   
   existing benchmarks, it has an insurmountable advantage over   
   explicit (manual) vectorization.   
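   For illustration (my sketch, not from the post): the kind of loop
   auto-vectorizers handle well is unit-stride with no loop-carried
   dependence, with restrict telling the compiler the arrays do not
   alias.  gcc and clang vectorize this at -O2/-O3, and gcc will report
   it with -fopt-info-vec:

   ```c
   #include <assert.h>
   #include <stddef.h>

   /* Shaped for auto-vectorization: unit stride, restrict-qualified
      pointers (no aliasing), independent iterations. */
   static void saxpy(size_t n, float a,
                     const float *restrict x, float *restrict y)
   {
       for (size_t i = 0; i < n; i++)
           y[i] += a * x[i];
   }

   int main(void)
   {
       float x[8] = {1, 1, 1, 1, 1, 1, 1, 1};
       float y[8] = {0};
       saxpy(8, 2.0f, x, y);
       assert(y[0] == 2.0f && y[7] == 2.0f);
       return 0;
   }
   ```

   Change the loop to something less benchmark-shaped (indirect
   indexing, a dependence across iterations) and the vectorizer
   usually gives up, which is the "does not work particularly well"
   part.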
      
   - anton   
   --   
   M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html   
   comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html   
    New standard: https://forth-standard.org/   
   EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/   
   EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/   
      