From: antispam@fricas.org   
      
   David Brown wrote:   
   > On 16/10/2025 23:26, BGB wrote:   
   >> On 10/16/2025 2:04 AM, David Brown wrote:   
   >>> On 16/10/2025 07:44, Lawrence D’Oliveiro wrote:   
   >>>> On Wed, 15 Oct 2025 22:19:18 GMT, MitchAlsup wrote:   
   >>>>   
   >>>>>> But the RISC-V folks still think Cray-style long vectors are better   
   >>>>>> than SIMD, if only because it preserves the “R” in “RISC”.   
   >>>>>   
   >>>>> The R in RISC-V comes from "student _R_esearch".   
   >>>>   
   >>>> “Reduced Instruction Set Computing”. That was what every single   
   >>>> primer on   
   >>>> the subject said, right from the 1980s onwards.   
   >>>>   
   >>>>> Oh, and BTW: I don't believe SIMD is better than CRAY-like vectors (or   
   >>>>> vice versa)--they simply represent different ways of shooting yourself   
   >>>>> in the foot.   
   >>>>   
   >>>> The primary design criterion, as I understood it, was to avoid   
   >>>> filling up   
   >>>> the instruction opcode space with a combinatorial explosion. (Or   
   >>>> sequence   
   >>>> of combinatorial explosions, when you look at the wave after wave of   
   >>>> SIMD   
   >>>> extensions in x86 and elsewhere.)   
   >>>   
   >>> I believe another aim is to have the same instructions work on   
   >>> different hardware. With SIMD, you need different code if your   
   >>> processor can add 4 ints at a time, or 8 ints, or 16 ints - it's all   
   >>> different instructions using different SIMD registers. With the   
   >>> vector style instructions in RISC-V, the actual SIMD registers and   
   >>> implementation are not exposed to the ISA and you have the same code   
   >>> no matter how wide the actual execution units are. I have no   
   >>> experience with this (or much experience with SIMD), but that seems   
   >>> like a big win to my mind. It is akin to letting the processor   
   >>> hardware handle multiple instructions in parallel in superscalar CPUs,   
   >>> rather than Itanium EPIC coding.   
   >>>   
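   The vector-length-agnostic style described above can be sketched in
   plain C. This is only a model of what an RVV vsetvli-driven loop does,
   not real RVV code: VLMAX here is a made-up stand-in for whatever
   maximum the hardware reports, and the inner loop stands in for one
   vector instruction operating on vl elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the hardware's maximum vector length in
   elements; on real RVV this comes from vsetvli, not a constant. */
#define VLMAX 8

/* Add two arrays of n floats, vector-length-agnostic style: each trip
   asks the "hardware" how many elements it will handle (vl), processes
   that many, and advances.  The same source code works whether the
   implementation is 2, 8, or 64 elements wide. */
static void vadd(const float *a, const float *b, float *c, size_t n)
{
    while (n > 0) {
        size_t vl = n < VLMAX ? n : VLMAX;  /* like vsetvli: vl = min(n, VLMAX) */
        for (size_t i = 0; i < vl; i++)     /* one "vector instruction" */
            c[i] = a[i] + b[i];
        a += vl; b += vl; c += vl; n -= vl;
    }
}
```

   Note the loop needs no separate tail: the last trip simply gets a
   smaller vl, which is exactly the point of the design.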
   >>   
   >> But there is a problem:   
   >> Once you go wider than 2 or 4 elements, the cases where wider SIMD   
   >> brings more benefit tend to fall off a cliff.   
   >>   
   >> More so, when you go wider, there are new problems:   
   >> Vector Masking;   
   >> Resource and energy costs of using wider vectors;   
   >> ...   
   >>   
   >   
   > I appreciate that. Often you will either be wanting the operations to   
   > be done on a small number of elements, or you will want to do it for a   
   > large block of N elements which may be determined at run-time. There   
   > are some algorithms, such as in cryptography, where you have sizeable but   
   > fixed-size blocks.   
   >   
   > When you are dealing with small, fixed-size vectors, x86-style SIMD can   
   > be fine - you can treat your four-element vectors as single objects to   
   > be loaded, passed around, and operated on. But when you have a large   
   > run-time count N, it gets a lot more inefficient. First you have to   
   > decide what SIMD extensions you are going to require from the target,   
   > and thus how wide your SIMD instructions will be - say, M elements.   
   > Then you need to loop N / M times, doing M elements at a time. Then you   
   > need to handle the remaining N % M elements - possibly using smaller   
   > SIMD operations, possibly doing them with serial instructions (noting   
   > that there might be different details in the implementation of SIMD and   
   > serial instructions, especially for floating point).   
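   That strip-mining pattern looks roughly like this in C (M is an
   assumed SIMD width picked at compile time; the fixed-count inner loop
   stands in for one SIMD instruction per trip):

```c
#include <assert.h>
#include <stddef.h>

#define M 4  /* assumed SIMD width in elements; real code would choose
                this from the target's extensions (SSE, AVX, NEON, ...) */

/* Strip-mined sum: the main loop runs N/M times doing M elements per
   trip, then a scalar tail handles the remaining N%M elements. */
static float strip_sum(const float *x, size_t n)
{
    float acc[M] = {0};
    size_t i = 0;
    for (; i + M <= n; i += M)        /* N/M full "vector" iterations */
        for (size_t j = 0; j < M; j++)
            acc[j] += x[i + j];       /* one lane of the SIMD add */
    float total = 0;
    for (size_t j = 0; j < M; j++)    /* horizontal reduction of lanes */
        total += acc[j];
    for (; i < n; i++)                /* scalar tail for N%M elements */
        total += x[i];
    return total;
}
```

   The lane accumulators also illustrate the floating-point caveat above:
   the vectorised body adds in a different order than a plain serial
   loop, so results can differ in the last bits.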
      
   In many cases one can enlarge data structures to a multiple of the   
   SIMD vector size (and align them properly). This requires some extra   
   code, but not too much, and all of it is outside the inner loop. So   
   there is some waste from the unused elements, but it is rather small.   
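   A sketch of that padding approach in C11. VEC is an assumed vector
   width, and the choice of aligned_alloc plus zeroed padding is mine,
   not anything from the post; zero padding keeps the extra lanes inert
   in sums and similar reductions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define VEC 8  /* assumed SIMD width in elements */

/* Round n up to a multiple of VEC so the inner loop never needs a tail. */
static size_t round_up(size_t n) { return (n + VEC - 1) / VEC * VEC; }

/* Allocate a float array enlarged to the next vector multiple and
   aligned for aligned SIMD loads; the padding lanes are zeroed.
   aligned_alloc is C11 and requires the size to be a multiple of the
   alignment, which round_up guarantees here. */
static float *alloc_padded(size_t n, size_t *padded_n)
{
    *padded_n = round_up(n);
    float *p = aligned_alloc(VEC * sizeof(float), *padded_n * sizeof(float));
    if (p) memset(p, 0, *padded_n * sizeof(float));
    return p;
}
```

   All of this runs once at allocation time, outside the inner loop, and
   the waste is at most VEC - 1 elements per array.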
      
   Of course, there is still trouble due to different SIMD vector   
   sizes and/or different SIMD instruction sets.   
      
   --   
    Waldek Hebisch   
      