From: tkoenig@netcologne.de   
      
   MitchAlsup schrieb:   
      
   [...]   
      
   > Without the logarithmic attempts, the general word used to describe   
   > these things is "reduction".   
      
   Ah, yes, of course.   
      
   [...]   
      
   >   
   > VVM has the ability to choose execution width (based on HW resources   
   > and based on data recurrence). In the past I have given examples   
   > where VVM is executing at "width" and then because of a memory   
   > "alias" has to drop back to 1-wide until the pointers cross before   
   > reverting back to full width.   
   >   
   > This algorithm (reduction) is a modification to dynamic width control,   
   > where width is constant until the final K iterations and then decreases   
   > by ½ each iteration thereafter. So, fundamentally, VVM does not have   
   > a problem with reductions "expressed right".   
      
   How would this be expressed? As straightforward serialized code in   
   a VVM loop? This would suffer from needing memory accesses to   
   store and reload intermediate results.   
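
For concreteness, here is a minimal C sketch of the straightforward serialized form asked about above. It assumes a power-of-two element count and an in-place scratch buffer; the name tree_min is hypothetical and not from the thread. Each pass halves the active width, and each pass stores intermediate results that the next pass reloads, which is the memory traffic in question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the thread: in-place pairwise ("tree")
   minimum over buf[0..n-1]. Assumes n is a power of two. The active
   width is halved on every pass. */
static int64_t tree_min(int64_t *buf, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    for (size_t width = n / 2; width >= 1; width /= 2) {
        /* Iterations of this inner loop are independent of each other. */
        for (size_t i = 0; i < width; i++) {
            int64_t a = buf[i];
            int64_t b = buf[i + width];
            buf[i] = a < b ? a : b;   /* intermediate goes back to memory */
        }
    }
    return buf[0];
}
```

Calling tree_min on an 8-element buffer runs three passes, at widths 4, 2 and 1, and overwrites the buffer as it goes.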
      
   [...]   
      
   > However, the given problem of 512-bits (64-bytes) might not find much   
   > if any speedup, due to initialization, and a potential stutter step   
   > on each DIV-2 iteration.   
      
   >   
   > It might be better to allow the HW to recognize some inst have   
   > certain properties and integrate those into VVM recognition so   
   > that VVM performs a wide calculation; roughly akin to the   
   > following:   
   >   
   > for(...)   
   > local_minimum[ k>>3 ] = MIN( a[k,k+7] ); k+=8;   
   > for(...)   
   > global_minimum = MIN( local_minimum[i,i+7] ); i+=8;   
      
   An issue here could be the "as if" serial nature of VVM, I think -   
   if an interrupt occurs, you would have to restore the state.   
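
The quoted two-loop pseudocode can be read roughly as the following C sketch. It is an illustration under assumptions, not the VVM form: 64-bit elements, 8 lanes (8 x 64 = 512 bits), an element count that is a multiple of 8, and a caller-supplied scratch array. The names min8 and blocked_min are hypothetical, and the second level is done as a plain scalar pass instead of a second wide MIN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 };  /* assumed: 8 x 64-bit elements = 512 bits */

/* Stand-in for the wide MIN( a[k,k+7] ) of the quoted pseudocode. */
static int64_t min8(const int64_t *p)
{
    int64_t m = p[0];
    for (int j = 1; j < LANES; j++)
        if (p[j] < m)
            m = p[j];
    return m;
}

/* Hypothetical two-level reduction: local needs room for n / LANES
   elements; n must be a non-zero multiple of LANES. */
static int64_t blocked_min(const int64_t *a, size_t n, int64_t *local)
{
    assert(n > 0 && n % LANES == 0);
    size_t groups = n / LANES;

    /* First loop: one local minimum per group of 8 inputs. */
    for (size_t k = 0; k < n; k += LANES)
        local[k / LANES] = min8(a + k);

    /* Second loop: reduce the local minima to the global minimum. */
    int64_t global_minimum = local[0];
    for (size_t i = 1; i < groups; i++)
        if (local[i] < global_minimum)
            global_minimum = local[i];
    return global_minimum;
}
```

The local array is the architecturally visible intermediate state that would have to be consistent if an interrupt arrived between the two loops.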
      
   > For sizes as small as 512-bits, VVM might not have an advantage.   
      
   Do you mean over SIMD or straightforward serial code? It would be   
   good if VVM came within shouting distance of SIMD.   
      
   > On the   
   > other hand, if HW knew certain things about some instructions, the top   
   > loop might be performed simultaneously with the bottom loop--more or   
   > less like having an adder that performs {8×64, 16×32, 32×16, 64×8}   
   > calculations simultaneously in reduction form {Exact for integer,   
   > Single rounding for 2^n FPs} and this wide adder feeds the second   
   > calculation 1 K× reduction per cycle in a single merged loop.   
   >   
   > Needs more thought. {Known problem}   
      
   I would guess so...   
   --   
   This USENET posting was made without artificial intelligence,   
   artificial impertinence, artificial arrogance, artificial stupidity,   
   artificial flavorings or artificial colorants.   
      