home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 3,321 of 4,675   
   Terje Mathisen to Anton Ertl   
   Re: Online generation of constants for "   
   28 Mar 18 15:38:39   
   
   From: terje.mathisen@nospicedham.tmsw.no   
      
   Anton Ertl wrote:   
   > Terje Mathisen  writes:   
   >> For a 32-bit division by X you calculate (1/X) * 2^n, with n just   
   >> big enough for the result to be between 2^31 and 2^32-1.   
   >   
   > An alternative is to compute RX=ceil(2^64/X), giving a 64-bit number   
   > (except for X=1, so you have to special-case for that if you want it   
   > totally general).  Then you can do unsigned division of Y by X as   
   > follows:   
   >   
   > P = (Y * RX) >> 64   
   >   
   > You need to implement 64*32->96 multiplication for this (requiring 2   
   > 32*32->64 multiplications), and then take only the upper 32 bits of   
   > the result.   
   >   
   > This eliminates the shift by a non-multiple of 32, and the   
   > correction step that Terje's algorithm performs; in the literature   
   > you find an algorithm that requires a 33-bit reciprocal and shifts,   
   > but that is definitely slower than the method above.   
      
   Anton, your approach is _possibly_ faster if you have both a very fast   
   and pipelined (so the two MULs can run overlapped) 32x32->64 MUL, and   
   ADD/ADC for easy carry propagation.   
      
   For half of all possible divisors the 33-bit reciprocal will end with a   
   zero, so you don't need any fixup code, and for the remainder you can do   
   that fixup quite quickly as long as the shift is single-cycle.   
      
   If you want generic code then you also need a mask to apply/don't apply   
   the fixup, so I can imagine how modern 3 or 4-cycle MULs make your   
   alternative quite attractive.   
      
   Terje   
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca