   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   

   Message 3,351 of 4,675   
   wolfgang kern to All   
   Reciprocal MUL LUT   
   24 Apr 18 11:04:26   
   
   From: nowhere@never.at   
      
   I am trying to shorten my current 512-bit 1/primes LUT, because the
   reciprocals of all primes except 2 are periodic in binary (yes, 1/5 is
   periodic in binary too).
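
   As a rough illustration of the periodicity claim, here is a minimal
   Python sketch. The helper names period_bits and period_pattern are made
   up for this example; it assumes p is odd, so that 2 has a multiplicative
   order modulo p, and that order is the period length in bits.

```python
def period_bits(p):
    """Length in bits of the repeating binary period of 1/p.

    This is the multiplicative order of 2 modulo p (p must be odd).
    """
    if p < 3 or p % 2 == 0:
        raise ValueError("p must be an odd number greater than 1")
    k, r = 1, 2 % p
    while r != 1:
        r = (r * 2) % p
        k += 1
    return k


def period_pattern(p):
    """The repeating bit pattern of 1/p as a string of period_bits(p) bits."""
    k = period_bits(p)
    # (2**k - 1) / p is an integer whose k-bit form is exactly one period.
    return format(((1 << k) - 1) // p, "0{}b".format(k))


for p in (3, 5, 7, 11, 19):
    print(p, period_bits(p), period_pattern(p))
# 3 2 01
# 5 4 0011
# 7 3 001
# 11 10 0001011101
# 19 18 000011010111100101
```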
      
   So the LUT need only hold the bit pattern of each period, together with
   its size in bits or bytes and a leading zero-bit count. The zero-bit
   count allows byte-aligned storage, saves some space, and serves as 2^-n
   scaling info.
   These patterns can then be repeated to any desired precision, e.g.:
   ______________________________________________________________________________________
   Prime | bits | pattern                      | stored as      | leading Z-bits (comment)

   3        2     01                             0x55(555555)     -
   5        4     0011                           0x33(333333)     -
   7        3     001                            0x249249         - (repeated for byte alignment)
   11      10     0001011101                     0x1745D1745D     - (ditto)
   13      12     000100111011                   0x13B13B         - (ditto)
   17       8     00001111                       0x0F(0F0F0F)     -
   19      18     000011010111100101             0xD79435E5       4
   23      11     00001011001                    0xB21642C859     4
   29      28     0000100011010011110111001011   0x8D3DCB         4
   31       5     00001                          0x8421           4
   ...
   53      52     see hex                        0x4D4873ECADE3   4
   ...
   73       9     000000111                      0x381C0E07       4
   ... and so on
   _____________
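
   How a stored entry could be repeated to a chosen precision can be
   sketched as follows. This is only an illustration in Python, not the
   actual LUT code: expand_entry and div_by_reciprocal are hypothetical
   names, and the entry layout (stored value, stored bit count, leading
   zero-bit count) is an assumption read off the table. It relies on the
   repeating unit (leading zero bits plus stored bits) being a whole
   number of periods.

```python
def expand_entry(stored, stored_bits, lead_zero_bits, frac_bits):
    """Rebuild the first frac_bits fractional bits of 1/p from one entry.

    The repeating unit is lead_zero_bits zero bits followed by the
    stored_bits bits of 'stored'.
    """
    unit_bits = lead_zero_bits + stored_bits
    reps = -(-frac_bits // unit_bits)          # ceiling division
    value = 0
    for _ in range(reps):
        # Shifting by the full unit width re-creates the leading zeros.
        value = (value << unit_bits) | stored
    # Drop the surplus low bits so exactly frac_bits bits remain.
    return value >> (reps * unit_bits - frac_bits)


def div_by_reciprocal(x, p, recip, frac_bits):
    """Compute x // p by reciprocal multiply, for 0 <= x < 2**frac_bits."""
    q = (x * recip) >> frac_bits
    # The truncated reciprocal can leave q one too small; fix it up.
    if (q + 1) * p <= x:
        q += 1
    return q


# Entry for 19 as listed: 0xD79435E5, 32 stored bits, 4 leading zero bits.
recip19 = expand_entry(0xD79435E5, 32, 4, 512)
print(recip19 == (1 << 512) // 19)                              # True
print(div_by_reciprocal(10**30, 19, recip19, 512) == 10**30 // 19)  # True
```

   In assembly the loop would presumably become the unaligned loads and
   shifts mentioned in the post; the Python version only checks the
   arithmetic.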
      
   Values for higher primes will really need 512 bits or more, but the whole
   LUT will become considerably shorter and so leave room to add higher
   primes.
      
   Even though this idea needs some overhead for linked lists, multiple
   unaligned loads and shifts, it may win on size and speed compared to my
   previous 512-bit LUT.
      
   I can already hear: "why not use Newton-Raphson (NR) 1/x with AVX512?".
   Because only my newest PC has AVX512, and ~160 client machines haven't
   got it yet.
   And how fast and precise can the NR method become in comparison to
   a LUT? ;)
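
   To get a feel for the precision side of that question, here is a small
   fixed-point Newton-Raphson sketch in Python. It is not AVX512 code and
   says nothing about real timings. nr_reciprocal is a hypothetical name,
   and the coarse seed is faked by truncating the exact value to seed_bits
   significant bits, standing in for a small seed table or a hardware
   estimate. It assumes a small odd divisor d. The number of correct bits
   roughly doubles per step, so the step count grows with log2 of the
   target precision.

```python
def nr_reciprocal(d, frac_bits, seed_bits=8):
    """Approximate floor(2**frac_bits / d) by Newton-Raphson iteration.

    Returns (reciprocal, steps). Assumes d is a small odd integer > 1.
    """
    exact = (1 << frac_bits) // d              # used only to fake the seed
    drop = max(exact.bit_length() - seed_bits, 0)
    r = (exact >> drop) << drop                # keep seed_bits leading bits
    steps = 0
    while True:
        # r' = r * (2 - d*r), scaled by 2**frac_bits and truncated.
        nxt = (r * ((2 << frac_bits) - d * r)) >> frac_bits
        steps += 1
        if nxt == r:                           # no further change: done
            return r, steps
        r = nxt


r, steps = nr_reciprocal(19, 512)
print(r == (1 << 512) // 19, steps)
```

   Each step costs two full-width multiplies, whereas repeating a stored
   period needs only loads and shifts; which is faster on a given machine
   would have to be measured.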
   __   
   wolfgang   
      