From: user5857@newsgrouper.org.invalid   
      
   Thomas Koenig posted:   
      
   > (This might be blindingly obvious to most regulars, but I thought   
   > I'd post this, just in case for some discussion)   
   >   
   > SIMD is not always about vectorizing loops, they can also be used   
   > for tree-shaped reductions (not sure what the canonical name is).   
      
   I use the term "treeification" for this. It is useful in both SW and   
   HW applications.   
      
   > Consider the following problem: You have 128 consecutive bytes and   
   > want to find the minimum value, and you have 512-bit SIMD registers.   
   >   
   > You load the values into two 512-bit = 64 byte registers A and B, into   
   > positions A<0:7>, A<8:15>, ... A<504:511>, and B correspondingyl.   
      
   Since B is A+512 and 512 = ½ of the size, it is akin to how Cooley   
   and Tukey turned DFT into FFT--AND THIS is the key to many large scale   
   reduction processes. It also happens to be the way to Vectorize FFTs   
   for event faster performance on machines with vector capabilities.   
      
   > You do a SIMD 8-bit "min" operation. The target c register contains   
   > the minimum values: min(A<0:7>,B<0:7>), min(A<8:15>,B<8:15>, ...   
   > min(A<504:511>,b<504:511>). C need not be distinct from A or B.   
   > You then have 64 values.   
   >   
   > You then move the upper values of C into register D (which need not   
   > be distinct from A or B), giving you D<0:7=min(A<256:263>,B<256:263)   
   > etc.   
   >   
   > You then do the min operation with half the length of the registers,   
   > giving you 32 values.   
   >   
   > And so on, until you have the single value, which is reached in   
   > seven steps.   
   >   
   > This kind of thing is, AFAIK, used in high-performance code such   
   > as JSON parsers or regexp matchers. An example (in principle)   
   > can be found at https://gitlab.ethz.ch/extra_projects/fastjson/ .   
   >   
   > Just wondering... is this somethig that VVM or similar could   
   > also do? Or does this actually require SIMD and the necessary   
   > shift, rotate or permute intructions?   
      
   I will get back to you on this.   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
|