... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"
comp.lang.asm.x86
Ahh, the lost art of x86 assembly
4,675 messages
[ << oldest | < older | list | newer > | newest >> ]
Message 3,172 of 4,675
Terje Mathisen to All
Re: Palindromic number
08 Dec 17 23:31:24
   From: terje.mathisen@nospicedham.tmsw.no   
      
   I decided to go in and start writing a real 64-bit to ascii conversion,   
   I started by checking that I could use 32-bit code for all 7-digit numbers:   
      
   I use a scaled reciprocal for 1/1000000, so that I get a 52-bit product   
   which I then scale down with a SHRD to make a 4:28 fixed point results   
   that will fit in a 32-bit register.   
      
   This way I can extract each digit with 6 very simple instructions, for a   
   benchmark I timed generating all possible values from 0 to 9,999,999 and   
   got about 14.6 cycles/iteration, including the loop overhead which means   
   that I actually get a steady 3 instructions/cycle!   
      
   This only seems possible if the cpu runs MOV in zero cycles, using the   
   rename network instead since I get just two cycles/digit. I have checked   
   the actual asm code generated by the C test harness and it does seem   
   correct. I could obviously improve the output code by avoiding the 7   
   individual adds of '0' to convert from BCD to ascii, but this is not   
   part of the critical path and the cpu has enough execution units to do   
   this totally overlapped.   
      
   With three such conversions in parallel the total would probably not   
   improve too much, but even 47 cycles for the digit generating part would   
   make it likely that I could get close to my originally claimed 60   
   cycles. :-)   
      
   Terje   
      
   char *sevendigits_asm(uint32_t n, char *buf)   
   {   
   	__asm {   
   		mov eax,[n]   
   		mov edx,281474977   
   		mul edx   
   		shrd eax,edx,20   
   		inc eax   
   		mov edi,[buf]   
      
   		mov edx,eax   
   		shr eax,28   
   		and edx,0x0fffffff   
   		add al,'0'   
   		lea edx,[edx+edx*4]   
   		mov [edi],al   
      
   		mov eax,edx   
   		shr edx,27   
   		and eax,0x07ffffff   
   		add dl,'0'   
   		lea eax, [eax + eax * 4]   
   		mov [edi+1], dl   
      
   		mov edx, eax   
   		shr eax, 26   
   		and edx, 0x03ffffff   
   		add al, '0'   
   		lea edx, [edx + edx * 4]   
   		mov [edi+2], al   
      
   		mov eax, edx   
   		shr edx, 25   
   		and eax, 0x01ffffff   
   		add dl, '0'   
   		lea eax, [eax + eax * 4]   
   		mov [edi + 3], dl   
      
   		mov edx, eax   
   		shr eax, 24   
   		and edx, 0x00ffffff   
   		add al, '0'   
   		lea edx, [edx + edx * 4]   
   		mov [edi+4], al   
      
   		mov eax, edx   
   		shr edx, 23   
   		and eax, 0x007fffff   
   		add dl, '0'   
   		lea eax, [eax + eax * 4]   
   		mov [edi + 5], dl   
      
   		shr eax, 22   
   		add ax, '0'   
   		mov [edi+6], ax   
      
   	}   
   	return buf;   
   }   
      
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)
[ << oldest | < older | list | newer > | newest >> ]