Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.arch    |    Apparently more than just beeps & boops    |    131,241 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 130,611 of 131,241    |
|    Robert Finch to Thomas Koenig    |
|    Re: A typical non-loop use case for SIMD    |
|    26 Dec 25 18:35:21    |
      From: robfi680@gmail.com              On 2025-12-26 4:57 p.m., Thomas Koenig wrote:       > (This might be blindingly obvious to most regulars, but I thought       > I'd post this, just in case for some discussion)       >       > SIMD is not always about vectorizing loops, they can also be used       > for tree-shaped reductions (not sure what the canonical name is).       >       > Consider the following problem: You have 128 consecutive bytes and       > want to find the minimum value, and you have 512-bit SIMD registers.       >       > You load the values into two 512-bit = 64 byte registers A and B, into       > positions A<0:7>, A<8:15>, ... A<504:511>, and B correspondingyl.       >       > You do a SIMD 8-bit "min" operation. The target c register contains       > the minimum values: min(A<0:7>,B<0:7>), min(A<8:15>,B<8:15>, ...       > min(A<504:511>,b<504:511>). C need not be distinct from A or B.       > You then have 64 values.       >       > You then move the upper values of C into register D (which need not       > be distinct from A or B), giving you D<0:7=min(A<256:263>,B<256:263)       > etc.       >       > You then do the min operation with half the length of the registers,       > giving you 32 values.       >       > And so on, until you have the single value, which is reached in       > seven steps.       >       > This kind of thing is, AFAIK, used in high-performance code such       > as JSON parsers or regexp matchers. An example (in principle)       > can be found at https://gitlab.ethz.ch/extra_projects/fastjson/ .       >       > Just wondering... is this somethig that VVM or similar could       > also do? Or does this actually require SIMD and the necessary       > shift, rotate or permute intructions?              RISCV IIRC has several reduction operations including min that finds the       minimum of all the values in a vector register so I think it does not       need a tree. It should only take two steps. I wonder if other       architectures have the same thing.              --- SoupGate-Win32 v1.05        * Origin: you cannot sedate... all the things you hate (1:229/2)    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
(c) 1994, bbs@darkrealms.ca