From: user5857@newsgrouper.org.invalid   
      
   BGB posted:   
      
   > On 1/6/2026 11:57 AM, MitchAlsup wrote:   
   > >   
   > > Terje Mathisen posted:   
   > >   
   > >> MitchAlsup wrote:   
   > >>> When I looked deeply into the situation, it was easier in HW to do::   
   > >>>   
   > >>> for( i = 0; i < 8; i++ )   
   > >>>     out[field[i]] = in[i];   
   > >>>   
   > >>> than::   
   > >>> for( i = 0; i < 8; i++ )   
   > >>>     out[i] = in[field[i]];   
   > >>>   
   > >>   
   > >> That isn't really that surprising:   
   > >>   
   > >> This way the inputs are available early and in sequential order, while   
   > >> the stores can be allowed to have higher latency, right?   
   > >>   
   > >>> For some reason we called this swizzle not permute !?!   
   > >>   
   > >> I'm assuming collisions would be disallowed? I.e. you can use it to   
   > >> splat a single input into all output slots, but you cannot target   
   > >> multiple inputs toward the same destination.   
   > >   
   > > The latter is why the HW logic is significantly easier.   
   >   
   > OK, but this does mean that the usability would be somewhat limited, and   
   > couldn't be used to generate the same sorts of repeating pattern fills   
   > needed for LZ decompression.   
      
   Field[i] was a constant generated by the compiler.   
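
For anyone following along, here is a minimal C sketch of the two loop forms quoted above, assuming 8 byte-wide lanes. The function names (swizzle_scatter, permute_gather, invert_field) and the lane type are illustrative assumptions only, not the actual instruction semantics. It shows that when field[] has no collisions, the scatter form gives the same result as the gather form with the inverted field, so a compiler that knows field[] at compile time could use either one.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8

/* Scatter form: field[i] names the DESTINATION lane of input lane i. */
void swizzle_scatter(uint8_t out[LANES], const uint8_t in[LANES],
                     const uint8_t field[LANES])
{
    for (int i = 0; i < LANES; i++) {
        assert(field[i] < LANES);   /* index must name a valid lane */
        out[field[i]] = in[i];
    }
}

/* Gather form: field[i] names the SOURCE lane for output lane i. */
void permute_gather(uint8_t out[LANES], const uint8_t in[LANES],
                    const uint8_t field[LANES])
{
    for (int i = 0; i < LANES; i++) {
        assert(field[i] < LANES);
        out[i] = in[field[i]];
    }
}

/* Build inv[] so that gather with inv[] equals scatter with field[].
 * Returns 1 on success, 0 if field[] is out of range or has a collision
 * (two inputs aimed at the same destination), in which case inv[] is
 * only partially written and must not be used. */
int invert_field(uint8_t inv[LANES], const uint8_t field[LANES])
{
    unsigned seen = 0;              /* one bit per destination lane */
    for (int i = 0; i < LANES; i++) {
        if (field[i] >= LANES)
            return 0;
        unsigned bit = 1u << field[i];
        if (seen & bit)
            return 0;               /* collision: not a permutation */
        seen |= bit;
        inv[field[i]] = (uint8_t)i;
    }
    return 1;
}
```

A replicating field such as {0,1,2,0,1,2,0,1} works directly in the gather form, and that is the repeating fill an LZ match copy wants. In the scatter form no collision-free field can express it, because each input lane is written to exactly one destination.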
      
   Do GPUs do much LZ decompression ??   
      
   > >>   
   > >> Terje   
   > >>   
   >   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   