From: anton@nospicedham.mips.complang.tuwien.ac.at   
      
   Terje Mathisen writes:   
   >Anton Ertl wrote:   
   >> Terje Mathisen writes:   
   >>> For a 32-bit division by X you calculate (1/X) * 2^n, with n just   
   >>> big enough for the result to be between 2^31 and 2^32-1.   
   >>   
   >> An alternative is to compute RX=ceil(2^64/X), giving a 64-bit number   
   >> (except for X=1, so you have to special-case for that if you want it   
   >> totally general). Then you can do unsigned division of Y by X as   
   >> follows:   
   >>   
   >> P = (Y * RX) >> 64   
   >>   
   >> You need to implement 64*32->96 multiplication for this (requiring 2   
   >> 32*32->64 multiplications), and then take only the upper 32 bits of   
   >> the result.   
   >>   
   >> This eliminates the shift by a non-multiple of 32, and the   
   >> correction step that Terje's algorithm performs; in the literature   
   >> you find an algorithm that requires a 33-bit reciprocal and shifts,   
   >> but that is definitely slower than the method above.   
   >   
   >Anton, your approach is _possibly_ faster if you have both a very fast   
   >and pipelined (so the two MULs can run overlapped) 32x32->64 MUL, and   
   >ADD/ADC for easy carry propagation.   
      
   Well, we are in clax, so we have all that (probably even in the   
   low-end Intel and AMD CPUs, although I have not measured it there, yet).   
      
   >If you want generic code then you also need a mask to apply/don't apply   
   >the fixup,   
      
   Yes, you can also use that approach if you have a division by a   
   loop-invariant value; you only need a separate replica of the loop for   
   the case where the divisor=1.   
      
   > so I can imagine how modern 3 or 4-cycle MULs make your   
   >alternative quite attractive.   
      
   For the upper 64-bits of a 64x64->128 multiplication, I have measured 6   
   cycles on the Skylake, but even then, I think it pays off.   
      
   - anton   
   --   
   M. Anton Ertl Some things have to be seen to be believed   
   anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen   
   http://www.complang.tuwien.ac.at/anton/home.html   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
|