   comp.arch      Apparently more than just beeps & boops      131,241 messages   

   Message 130,666 of 131,241   
   MitchAlsup to All   
   Re: A typical non-loop use case for SIMD   
   29 Dec 25 21:17:53   
   
   From: user5857@newsgrouper.org.invalid   
      
   Stephen Fuld  posted:   
      
   > On 12/29/2025 11:59 AM, MitchAlsup wrote:   
   > >   
   > > Stephen Fuld  posted:   
   > >   
   > >> On 12/26/2025 1:57 PM, Thomas Koenig wrote:   
   > >>> (This might be blindingly obvious to most regulars, but I thought
   > >>> I'd post this, just in case, for some discussion.)
   > >>>   
   > >>> SIMD is not always about vectorizing loops; it can also be used
   > >>> for tree-shaped reductions (not sure what the canonical name is).
   > >>>   
   > >>> Consider the following problem:  You have 128 consecutive bytes and   
   > >>> want to find the minimum value, and you have 512-bit SIMD registers.   
   > >>   
   > >> Thomas, this is an excellent "test case" as it brings out at least two   
   > >> issues.  There has been discussion in this thread about the "reduction"   
   > >> problem.  Let me start on the other problem, that I call ALU   
   > >> underutilization.  It is caused by requiring lots of simple operations   
   > >> on small data elements.  For this example, I assume a four-wide My 66000.
   > >>   
   > >> Let's look at just the first pass.  I think the simplest coding would
   > >> have the VVM loop consisting of two load instructions, two add
   > >> instructions to increment the addresses and a min instruction.  Letting
   > >> VVM do its magic, this would generate 4 byte-min operations at a time
   > >> (one per ALU), and thus the loop would be executed 64/4 = 16 times.  I
   > >> don't know how your hypothetical SIMD machine would do this, but it   
   > >> might do all 64 min operations in a single operation, or perhaps 2.   
   > >> This puts VVM at a substantial performance disadvantage.   
   > >>   
   > >> I have a possible suggestion to help this.  I don't claim it is the best   
   > >> solution.   
   > >>   
   > >> The problem stems from using only 8 bits of the 64-bit integer ALU for
   > >> each operation, leading to more operations.  So one possible solution
   > >> would be to add a new instruction modifier that tells the system that
   > >> any relevant operations under its mask will do the whole register worth
   > >> of operations using the size already specified in the operation.
   > >   
   > > This is exactly what VVM does, BTW. Operands narrower than a register are
   > > SIMDed into single "units of work" up to register width and performed
   > > with the carry chains clipped.
   >   
   > Oh, I didn't realize that VVM already did that.  Bravo!   
      
   It is how the HW works::   
      
   An integer adder is composed of 8×9-bit sections. You feed 8 data
   bits into each section, and into the 9th bit (one bit from each operand) you feed::
   00 if you want the carry clipped,
   01 if you want the carry propagated,
   11 if you want a carry generated for the next section.
      
   An adder built from 9-bit sections has no more gate delays than one
   built from 8-bit sections.
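To make the 9-bit-section scheme concrete, here is a small Python model of it. It is an illustrative sketch under stated assumptions, not a description of any real adder netlist: the names `sectioned_add`, `pack`, `CLIP`, `PROPAGATE`, and `GENERATE` are hypothetical, each section is assumed to hold its 8 data bits in the low bits and the control bit in bit 8, and the "00/01/11" pair is read as the 9th bit of operand a and of operand b respectively. One ordinary 72-bit integer addition then does all the work; the control bits alone decide whether a carry crosses each section boundary.

```python
SECTIONS = 8        # 8 sections x 9 bits = 72-bit physical adder (assumed)
DATA_BITS = 8       # data bits per section
SECTION_BITS = 9    # data bits plus one control bit

# Hypothetical control pair per section: (bit 8 of operand a, bit 8 of operand b).
CLIP = (0, 0)       # 0+0+carry never carries out: the carry is killed
PROPAGATE = (0, 1)  # 0+1+carry carries out exactly when a carry came in
GENERATE = (1, 1)   # 1+1+carry always carries out


def pack(data, controls, operand):
    """Pack 8 data bytes into a 72-bit integer, placing this operand's half
    of each section's control pair (operand 0 = a, 1 = b) in bit 8."""
    word = 0
    for i, (byte, ctl) in enumerate(zip(data, controls)):
        section = (byte & 0xFF) | (ctl[operand] << DATA_BITS)
        word |= section << (i * SECTION_BITS)
    return word


def sectioned_add(a_bytes, b_bytes, controls):
    """Add two 8-byte operands (least significant byte first) with one
    ordinary addition; the 9th bit of each section decides whether the
    carry out of that section's data bits reaches the next section."""
    total = pack(a_bytes, controls, 0) + pack(b_bytes, controls, 1)
    # Discard each section's control bit and keep its 8 data bits.
    return [(total >> (i * SECTION_BITS)) & 0xFF for i in range(SECTIONS)]


if __name__ == "__main__":
    a = [0xFF, 0x01, 0, 0, 0, 0, 0, 0]
    b = [0x01, 0x01, 0, 0, 0, 0, 0, 0]
    print(sectioned_add(a, b, [CLIP] * 8))       # [0, 2, 0, 0, 0, 0, 0, 0]
    print(sectioned_add(a, b, [PROPAGATE] * 8))  # [0, 3, 0, 0, 0, 0, 0, 0]
```

With `[PROPAGATE] * 8` the result matches a plain 64-bit add, with `[CLIP] * 8` it is eight independent byte adds, and alternating `[PROPAGATE, CLIP] * 4` gives four 16-bit lanes, all from the same carry chain.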
      
   > >> Since the min instruction would already have specified bytes,   
   > >   
   > > It is the memory instruction that specifies data width.   
   >   
   > I thought with your latest modifications to the ISA that instructions   
   > like min specified a data width.  But using the width specified in the   
   > memory reference instruction seems fine.  I can't think of a useful case   
   > where the two would be different.   
      
   While the current ISA does have size per calculation, I have not had the time
   to go back and examine VVM to the extent necessary to make any statements
   on how VVM would work with these.
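Picking up Thomas's original problem, here is a minimal Python sketch of the tree-shaped reduction over 128 bytes, plus a deliberately crude operation count. Both functions, `tree_min` and `alu_cycles_per_pass`, are hypothetical. The count assumes each ALU operation covers `bytes_per_op` byte lanes and that `alus` operations issue per cycle, and it ignores loads, address increments, and latency; it is not a model of how VVM actually schedules work. With one byte per operation on a four-wide machine, the first pass takes 64/4 = 16 steps, matching Stephen's estimate.

```python
def tree_min(values):
    """Tree-shaped reduction: each pass takes the elementwise minimum of the
    lower and upper halves, halving the vector until one element remains.
    Assumes len(values) is a power of two (128 bytes in Thomas's example)."""
    vec = list(values)
    passes = 0
    while len(vec) > 1:
        half = len(vec) // 2
        vec = [min(lo, hi) for lo, hi in zip(vec[:half], vec[half:])]
        passes += 1
    return vec[0], passes


def alu_cycles_per_pass(n, bytes_per_op, alus):
    """Hypothetical cost model, not a VVM timing model: a pass over n elements
    needs n // 2 byte-min operations; each ALU operation covers bytes_per_op
    of them, and `alus` operations issue per cycle. Loads, address updates
    and latency are ignored."""
    cycles = []
    while n > 1:
        mins = n // 2
        ops = -(-mins // bytes_per_op)      # ceiling division
        cycles.append(-(-ops // alus))      # ceiling division
        n = mins
    return cycles


if __name__ == "__main__":
    data = [(i * 73 + 41) % 256 for i in range(128)]
    print(tree_min(data))                    # (0, 7)
    print(alu_cycles_per_pass(128, 1, 4))    # [16, 8, 4, 2, 1, 1, 1]
    print(alu_cycles_per_pass(128, 8, 4))    # [2, 1, 1, 1, 1, 1, 1]
    print(alu_cycles_per_pass(128, 64, 1))   # [1, 1, 1, 1, 1, 1, 1]
```

In this model, packing eight byte lanes into each 64-bit operation, as with the clipped carry chains described above, brings the total from 33 cycles down to 8, against 7 for a single 512-bit SIMD operation per pass; the later passes are too narrow to fill even one operation, so the tree's tail costs the same everywhere.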
      

(c) 1994,  bbs@darkrealms.ca