home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.arch      Apparently more than just beeps & boops      131,241 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 130,606 of 131,241   
   MitchAlsup to All   
   Re: A typical non-loop use case for SIMD   
   26 Dec 25 22:11:24   
   
   From: user5857@newsgrouper.org.invalid   
      
   Thomas Koenig  posted:   
      
   > (This might be blindingly obvious to most regulars, but I thought   
   > I'd post this, just in case for some discussion)   
   >   
   > SIMD is not always about vectorizing loops, they can also be used   
   > for tree-shaped reductions (not sure what the canonical name is).   
      
   I use the term "treeification" for this. It is useful in both SW and   
   HW applications.   
      
   > Consider the following problem:  You have 128 consecutive bytes and   
   > want to find the minimum value, and you have 512-bit SIMD registers.   
   >   
   > You load the values into two 512-bit = 64 byte registers A and B, into   
   > positions A<0:7>, A<8:15>, ... A<504:511>, and B correspondingyl.   
      
   Since B is A+512 and 512 = ½ of the size, it is akin to how Cooley   
   and Tukey turned DFT into FFT--AND THIS is the key to many large scale   
   reduction processes. It also happens to be the way to Vectorize FFTs   
   for event faster performance on machines with vector capabilities.   
      
   > You do a SIMD 8-bit "min" operation.  The target c register contains   
   > the minimum values: min(A<0:7>,B<0:7>), min(A<8:15>,B<8:15>, ...   
   > min(A<504:511>,b<504:511>).  C need not be distinct from A or B.   
   > You then have 64 values.   
   >   
   > You then move the upper values of C into register D (which need not   
   > be distinct from A or B), giving you D<0:7=min(A<256:263>,B<256:263)   
   > etc.   
   >   
   > You then do the min operation with half the length of the registers,   
   > giving you 32 values.   
   >   
   > And so on, until you have the single value, which is reached in   
   > seven steps.   
   >   
   > This kind of thing is, AFAIK, used in high-performance code such   
   > as JSON parsers or regexp matchers.  An example (in principle)   
   > can be found at https://gitlab.ethz.ch/extra_projects/fastjson/ .   
   >   
   > Just wondering... is this somethig that VVM or similar could   
   > also do? Or does this actually require SIMD and the necessary   
   > shift, rotate or permute intructions?   
      
   I will get back to you on this.   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca