From: terje.mathisen@tmsw.no   
      
   Thomas Koenig wrote:   
   > Terje Mathisen schrieb:   
   >   
   >> Just thinking about a way to replace ~N iterations by a MUL, ADD and SHR   
   >>   
   >> Even with 128-bit numbers, I would only need two MULs, two ADD/ADC pairs   
   >> and a SHRD/SHR, so maybe 8-10 cycles?   
   >   
   > https://link.springer.com/article/10.1007/s11227-025-07337-0 gives   
   > an improved algorithm, which they are using for the supercomputer   
   > version. They are doing:   
   >   
   > n = n0   
   > repeat   
   > n = n + 1   
   > alpha = ctz(n)   
   > n = n * 3^alpha / 2^alpha   
      
   Strange syntax...   
      
   ctz(n) must count trailing zeroes, right?   
   If so, then ctz(n+1) will return 1 or more for odd n, so that gives   
   3n+3, which after the division by 2 returns a value which is one too high.   
      
   > n = n - 1   
      
   but this is corrected here!   
      
   The key must be that the multiplication by a power of three also works   
   for n with multiple trailing 1 bits.   
      
   The "x/2^alpha" part is of course just x>>alpha.   
      
   For an even n they still do a multiplication by 1 and a SHR 0, but the   
   code is branchless.   
      
   > beta = ctz(n)   
   > n = n / 2^beta   
   > until n < n_0   
      
   I am guessing that the best way to do the n*3^alpha parts is a table of   
   powers of 3, so   
      
   let mut n = n0;   
   loop {   
    n += 1;   
    let alpha = ctz(n);   
    n = (n * power_of_three[alpha]) >> alpha;   
    n -= 1;   
    beta = ctz(n);   
    n >>= beta;   
    if n <= n_0 {break};   
   }   
      
   >   
   > They are also playing some strange tricks with parallel solving that   
   > are not described clearly, or I am not clever enough to understand   
   > what they are doing. I would have to look at their code, I guess.   
   >   
   :-)   
      
   Terje   
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
|