home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.arch      Apparently more than just beeps & boops      131,241 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 131,111 of 131,241   
   Terje Mathisen to Thomas Koenig   
   Re: A useless machine   
   15 Feb 26 22:59:19   
   
   From: terje.mathisen@tmsw.no   
      
   Thomas Koenig wrote:   
   > Terje Mathisen  schrieb:   
   >   
   >> Just thinking about a way to replace ~N iterations by a MUL, ADD and SHR   
   >>   
   >> Even with 128-bit numbers, I would only need two MULs, two ADD/ADC pairs   
   >> and a SHRD/SHR, so maybe 8-10 cycles?   
   >   
   > https://link.springer.com/article/10.1007/s11227-025-07337-0 gives   
   > an improved algorithm, which they are using for the supercomputer   
   > version.  They are doing:   
   >   
   > n = n0   
   > repeat   
   >    n = n + 1   
   >    alpha = ctz(n)   
   >    n = n * 3^alpha / 2^alpha   
      
   Strange syntax...   
      
   ctz(n) must count trailing zeroes, right?   
   If so, then ctz(n+1) will return 1 or more for odd n, so that gives   
   3n+3, which after the division by 2 returns a value which is one too high.   
      
   >    n = n - 1   
      
   but this is corrected here!   
      
   The key must be that the multiplication by a power of three also works   
   for n with multiple trailing 1 bits.   
      
   The "x/2^alpha" part is of course just x>>alpha.   
      
   For an even n they still do a multiplication by 1 and a SHR 0, but the   
   code is branchless.   
      
   >    beta = ctz(n)   
   >    n = n / 2^beta   
   > until n < n_0   
      
   I am guessing that the best way to do the n*3^alpha parts is a table of   
   powers of 3, so   
      
   let mut n = n0;   
   loop {   
      n += 1;   
      let alpha = ctz(n);   
      n = (n * power_of_three[alpha]) >> alpha;   
      n -= 1;   
      beta = ctz(n);   
      n >>= beta;   
      if n <= n_0 {break};   
   }   
      
   >   
   > They are also playing some strange tricks with parallel solving that   
   > are not described clearly, or I am not clever enough to understand   
   > what they are doing. I would have to look at their code, I guess.   
   >   
   :-)   
      
   Terje   
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca