home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.arch      Apparently more than just beeps & boops      131,241 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 130,611 of 131,241   
   Robert Finch to Thomas Koenig   
   Re: A typical non-loop use case for SIMD   
   26 Dec 25 18:35:21   
   
   From: robfi680@gmail.com   
      
   On 2025-12-26 4:57 p.m., Thomas Koenig wrote:   
   > (This might be blindingly obvious to most regulars, but I thought   
   > I'd post this, just in case for some discussion)   
   >   
   > SIMD is not always about vectorizing loops, they can also be used   
   > for tree-shaped reductions (not sure what the canonical name is).   
   >   
   > Consider the following problem:  You have 128 consecutive bytes and   
   > want to find the minimum value, and you have 512-bit SIMD registers.   
   >   
   > You load the values into two 512-bit = 64 byte registers A and B, into   
   > positions A<0:7>, A<8:15>, ... A<504:511>, and B correspondingyl.   
   >   
   > You do a SIMD 8-bit "min" operation.  The target c register contains   
   > the minimum values: min(A<0:7>,B<0:7>), min(A<8:15>,B<8:15>, ...   
   > min(A<504:511>,b<504:511>).  C need not be distinct from A or B.   
   > You then have 64 values.   
   >   
   > You then move the upper values of C into register D (which need not   
   > be distinct from A or B), giving you D<0:7=min(A<256:263>,B<256:263)   
   > etc.   
   >   
   > You then do the min operation with half the length of the registers,   
   > giving you 32 values.   
   >   
   > And so on, until you have the single value, which is reached in   
   > seven steps.   
   >   
   > This kind of thing is, AFAIK, used in high-performance code such   
   > as JSON parsers or regexp matchers.  An example (in principle)   
   > can be found at https://gitlab.ethz.ch/extra_projects/fastjson/ .   
   >   
   > Just wondering... is this somethig that VVM or similar could   
   > also do? Or does this actually require SIMD and the necessary   
   > shift, rotate or permute intructions?   
      
   RISCV IIRC has several reduction operations including min that finds the   
   minimum of all the values in a vector register so I think it does not   
   need a tree. It should only take two steps. I wonder if other   
   architectures have the same thing.   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca