From: user5857@newsgrouper.org.invalid   
      
   anton@mips.complang.tuwien.ac.at (Anton Ertl) posted:   
      
   > MitchAlsup writes:   
   > >   
   > >Terje Mathisen posted:   
   > >> (hi, lo) = a*b+c+d   
   > >   
   > >Alas, the best CARRY can do is:   
   > >   
   > > {hi,c} = a*b+hi   
   >   
   > What latency?   
      
   One multiply latency {likely 4 cycles}, but more importantly, no more   
   cycles than a plain   
    c = a*b;   
      
   > >> simply because this is the largest possible building block that cannot   
   > >> overflow, the result range covers the full 128 bit space.   
   >   
   > With the carry in the result GPR, you could achieve that as follows:   
   >   
   > add t,c,d   
   > umaddc hi,lo,a,b,t   
      
   You can do this at the added latency of ADD.   
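
The no-overflow claim quoted above can be checked with a small model. This is only a sketch: `mul_add_add` and `add_then_umaddc` are hypothetical helper names, not instructions from any ISA, and the second one assumes the intermediate sum t = c + d is kept in full (it can need 65 bits), which is what "carry in the result GPR" is taken to mean here.

```python
MASK64 = (1 << 64) - 1  # largest unsigned 64-bit value


def mul_add_add(a, b, c, d):
    """Model (hi, lo) = a*b + c + d for unsigned 64-bit a, b, c, d."""
    for x in (a, b, c, d):
        if not 0 <= x <= MASK64:
            raise ValueError("operand does not fit in 64 bits")
    full = a * b + c + d  # Python ints are unbounded, so nothing is lost here
    return full >> 64, full & MASK64


def add_then_umaddc(a, b, c, d):
    """Same result via the quoted two-step form: t = c + d, then a*b + t."""
    t = c + d  # ASSUMPTION: t keeps its 65th bit (carry stored with the value)
    full = a * b + t
    return full >> 64, full & MASK64


# Worst case: every operand is 2**64 - 1.
# (2**64 - 1)**2 + 2*(2**64 - 1) == 2**128 - 1, so the result exactly
# fills the 128-bit (hi, lo) pair and never exceeds it.
hi, lo = mul_add_add(MASK64, MASK64, MASK64, MASK64)
worst_case = (hi << 64) | lo
```

With all four operands at their maximum, `worst_case` equals 2**128 - 1, which is why a*b+c+d is the largest such building block: adding one more 64-bit term could overflow 128 bits.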
      
   > (or split umaddc into an instruction that produces the low result and   
   > one that produces the high result).   
      
   CARRY is an instruction-modifier; it is not "executed" {or you can   
   consider it "executed" in the DECODE stage of the pipeline}. The   
   subsequent MUL takes no more time with CARRY than without it.   
      
   > The disadvantage here is that, with d being the hi of the last   
   > iteration, you will see the full latency of the add and the umaddh.   
      
   Does R stand for Reduced or Ridiculous?!?   
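
For readers following the latency argument, the loop-carried dependency in the quoted paragraph (d being the hi of the previous iteration) can be sketched as a multi-limb multiply-accumulate. This is a behavioral model only, not cycle-accurate; `addmul_1` is a hypothetical name borrowed from the usual bignum convention, and the limbs are assumed to be little-endian unsigned 64-bit values.

```python
MASK64 = (1 << 64) - 1  # largest unsigned 64-bit value


def addmul_1(acc, xs, b):
    """Return (limbs, carry_out) for acc + xs*b.

    acc and xs are equal-length little-endian lists of 64-bit limbs,
    b is a single 64-bit multiplier.
    """
    if len(acc) != len(xs):
        raise ValueError("acc and xs must have the same number of limbs")
    hi = 0
    out = []
    for a, c in zip(xs, acc):
        full = a * b + c + hi  # (hi, lo) = a*b + c + d, with d = previous hi
        out.append(full & MASK64)
        hi = full >> 64  # feeds the next iteration: the loop-carried chain
    return out, hi


def to_int(limbs):
    """Interpret a little-endian limb list as one unbounded integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))
```

Each iteration's `hi` is an input to the next, so whatever sits between producing `hi` and consuming it (the add plus the high-half multiply-add in the quoted sequence, or a single multiply in the CARRY form) sets the per-limb latency.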
      
   >   
   > - anton   
      