
   comp.arch      Apparently more than just beeps & boops      131,241 messages   


   Message 131,197 of 131,241   
   Terje Mathisen to Michael S   
   Re: A useless machine   
   22 Feb 26 12:45:44   
   
   From: terje.mathisen@tmsw.no   
      
   Michael S wrote:   
   > On Sat, 21 Feb 2026 20:36:51 +0200   
   > Michael S  wrote:   
   >   
   >>>   
   >>> Using a more brainiac approach would likely cut the performance in   
   >>> half and use a lot more LUTs.   
   >>>   
   >>   
   >> According to my experiments, combining 2 steps has a very small
   >> negative effect on achievable clock and increases area by ~1.8x. So,
   >> to me it looks like a win.
   >> That's the combined step I am talking about:   
   >>    x = n/4;   
   >>    switch (x % 4) {   
   >>      case 0: n = x; break;   
   >>      case 1: n = 3*x+1; break;   
   >>      case 2: n = 3*x+2; break;
   >>      case 3: n = 9*x+8; break;   
   >>    }   
   >>   
   >>   
   >>   
   >   
   > Mistake in C above. Should be   
   >   switch (n % 4) {   
      
   I did notice that. :-)   
      
   The one thing that worries me (sw on a 64-bit platform) about the code   
   is the 9* on 128-bit variables:   
      
     9*x+8 =>   
      
   Do we use SHLD + SHL here or something else?   
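
   For comparison, the shift-and-add route can be sketched in C with the
   128-bit value held as two 64-bit halves. This is only an illustration:
   the type u128_parts and the function mul9_add8 are made-up names, and
   the assert encodes an assumed precondition (x < 2^124) so that 9*x+8
   still fits in 128 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 128-bit unsigned value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_parts;

/* Computes 9*x + 8 as (x << 3) + x + 8 using only 64-bit operations. */
u128_parts mul9_add8(u128_parts x)
{
    /* Assumed precondition: x < 2^124, so the result fits in 128 bits. */
    assert((x.hi >> 60) == 0);

    /* x << 3: the top 3 bits of lo move into hi (the job SHLD does). */
    uint64_t s_lo = x.lo << 3;
    uint64_t s_hi = (x.hi << 3) | (x.lo >> 61);

    u128_parts r;
    /* Add x; an unsigned wrap in the low half is the carry into hi. */
    r.lo = s_lo + x.lo;
    r.hi = s_hi + x.hi + (r.lo < s_lo);

    /* Add the constant 8, again propagating the carry. */
    r.lo += 8;
    r.hi += (r.lo < 8);
    return r;
}
```

   The two carry tests are what a compiler would normally turn into an
   ADD/ADC pair.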
      
   How about MUL & LEA?   
      
   ; Input in r10:r9, output in rdx:rax (computes 9*x; r10 is clobbered)
   mov rax,r9
   mul rbx              ;; RBX == 9, RDX:RAX = RAX * RBX
   lea r10,[r10+r10*8]  ;; r10 = 9 * high half
   add rdx,r10
      
   That looks like 5-6 clock cycles, so the branch misses from the switch   
   statement would probably dominate unless you do as I suggested and use   
   lookup tables instead:   
      
      let bot2 = n & 3;   
      let x = n >> 2;   
      n = x*multab[bot2] + addtab[bot2];   
      
   but if we do that, then (at least for a sw implementation) it would be   
   better to pick a lot more of the LS bits, at least 8-12?   
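
   A minimal sketch of the wider-table idea, assuming the recurrence is
   the Collatz map with the (3n+1)/2 shortcut (which is consistent with
   the 3*x+1 and 9*x+8 cases). The names K, step, build_tables and step_k
   are invented for this example, and unsigned __int128 is a GCC/Clang
   extension rather than standard C. The tables are derived from the
   step definition itself, not copied from the switch statement: for the
   low K bits b, multab[b] is 3 raised to the number of odd steps and
   addtab[b] is the result of running K steps on b alone.

```c
#include <assert.h>
#include <stdint.h>

#define K 8                          /* assumed table width; the post suggests 8-12 */
#define TABLE_SIZE (1u << K)

static_assert(K >= 1 && K <= 12, "table entries are sized for K <= 12");

typedef unsigned __int128 u128;      /* GCC/Clang extension */

static uint32_t multab[TABLE_SIZE];  /* 3^(odd steps); 3^12 fits in 32 bits */
static uint32_t addtab[TABLE_SIZE];  /* K steps applied to the low bits alone */

/* One shortcut step: n/2 if n is even, (3n+1)/2 if n is odd. */
u128 step(u128 n)
{
    return (n & 1) ? (3 * n + 1) / 2 : n / 2;
}

/* Fill both tables by running K single steps on every K-bit value. */
void build_tables(void)
{
    for (uint32_t b = 0; b < TABLE_SIZE; b++) {
        uint32_t mul = 1, v = b;
        for (int i = 0; i < K; i++) {
            if (v & 1)
                mul *= 3;            /* each odd step multiplies x by 3 */
            v = (uint32_t)step(v);
        }
        multab[b] = mul;
        addtab[b] = v;
    }
}

/* K steps at once, with no branch on the low bits.
   The caller must ensure the result still fits in 128 bits. */
u128 step_k(u128 n)
{
    uint32_t low = (uint32_t)(n & (TABLE_SIZE - 1));
    u128 x = n >> K;
    return x * multab[low] + addtab[low];
}
```

   With K set to 2 this produces multab = {1, 3, 3, 9} and
   addtab = {0, 1, 2, 8}, i.e. x, 3*x+1, 3*x+2 and 9*x+8 for the four
   remainders. The multiply is then 128 bits by at most 20 bits, so the
   MUL + IMUL/LEA question above applies to every step, not just case 3.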
      
   Terje   
      
   --   
   "almost all programming can be viewed as an exercise in caching"   
      
