   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   

   Message 3,353 of 4,675   
   Melzzzzz to wolfgang kern   
   Re: Reciprocal MUL LUT   
   24 Apr 18 09:23:44   
   
   From: Melzzzzz@nospicedham.zzzzz.com   
      
   On 2018-04-24, wolfgang kern wrote:   
   >   
   > I am trying to shorten my current 512-bit 1/primes LUT, because the   
   > reciprocals of all primes except 2 are periodic (yes, 1/5 is also   
   > periodic in binary).   
   >   
   > So the LUT may just hold the bit patterns of the periods, with their size   
   > in bits or bytes, a leading zero-bit count for byte-aligned storage (which   
   > also saves some space) and a 2^-n scaling info.   
   > These patterns can then be repeated to any desired precision, e.g.:   
   > ________________________________________________________   
   > Prime|bits|pattern                     |stored as     |leading Z-bits (comment)   
   >   
   > 3      2   01                           0x55(555555)   -   
   > 5      4   0011                         0x33(333333)   -   
   > 7      3   001                          0x249249       - (repeated for byte alignment)   
   > 11    10   0001011101                   0x1745D1745D   - (ditto)   
   > 13    12   000100111011                 0x13B13B       - (ditto)   
   > 17     8   00001111                     0x0f(0f0f0f)   -   
   > 19    18   000011010111100101           0xD79435E5     4   
   > 23    11   00001011001                  0xB21642C859   4   
   > 29    28   0000100011010011110111001011 0x8D3DCB       4   
   > 31     5   00001                        0x8421         4   
   > ...   
   > 53    52   see hex                      0x4D4873ECADE3 4   
   > ...   
   > 73     9   000000111                    0x381C0E07     4   
   > ... and so on   
   > _____________   
   >   
   > Values for higher primes will really need 512 bits or more, but the whole   
   > LUT will become considerably shorter and so allow adding higher primes.   
   >   
   > Even though this idea needs some overhead for linked lists, multiple   
   > unaligned loads and shifts, it may gain in size and speed compared to my   
   > previous 512-bit LUT.   
   >   
   > I can already hear: "why not use NR 1/x with AVX512?". Because only   
   > my newest PC has AVX512 and ~160 client machines haven't got it yet.   
   > And how fast and precise can the NR method become in comparison to   
   > a LUT? ;)   
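
The period-repetition idea quoted above can be checked with a few lines of
big-integer arithmetic. This is only a sketch in Python, not the poster's
LUT layout: the helper names `period_of_reciprocal` and `expand` are made
up for illustration, and the period is recomputed on the fly instead of
being loaded from a table.

```python
def period_of_reciprocal(p):
    """Return (bits, pattern) for the binary period of 1/p, p odd and > 1.

    bits    = multiplicative order of 2 modulo p (length of the period)
    pattern = the period as an integer, i.e. (2**bits - 1) // p
    """
    bits = 1
    while pow(2, bits, p) != 1:   # smallest k with 2**k == 1 (mod p)
        bits += 1
    return bits, (2**bits - 1) // p


def expand(p, precision_bits):
    """Build floor(2**precision_bits / p) by repeating the period."""
    bits, pattern = period_of_reciprocal(p)
    reps = -(-precision_bits // bits)          # ceiling division
    value = 0
    for _ in range(reps):
        value = (value << bits) | pattern      # append one more period
    # Drop the surplus low bits of the last, partially needed period.
    return value >> (reps * bits - precision_bits)


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 53, 73):
        bits, pattern = period_of_reciprocal(p)
        print(p, bits, format(pattern, "0{}b".format(bits)))
```

Repeating the pattern and cutting at the wanted width gives the same
integer as a full-width division, so `expand(p, 512)` can be compared
directly against `(1 << 512) // p` when validating table entries.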
      
   Why not AVX2? Reciprocal calculation takes 4 clock cycles on my machine.   
   The program was posted on comp.arch some time ago.   
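
On the "how precise can NR become" question, here is a rough sketch of
Newton-Raphson reciprocal refinement in fixed-point big-integer Python.
It is not the comp.arch program mentioned above and uses no AVX at all.
The function name `nr_reciprocal`, the 12-bit seed width and the 512-bit
target are assumptions chosen for illustration; the seed merely stands in
for whatever low-precision hardware estimate would be used.

```python
def nr_reciprocal(d, frac_bits=512, seed_bits=12):
    """Approximate floor(2**frac_bits / d) with Newton-Raphson.

    Iterates x <- x * (2 - d*x) in fixed point with frac_bits fractional
    bits, starting from a seed with about seed_bits significant bits.
    Assumes d is odd, d > 1, and frac_bits >= seed_bits + d.bit_length().
    Returns (x, iterations).
    """
    shift = seed_bits + d.bit_length()
    # Coarse seed, truncated so that it never exceeds the true reciprocal.
    x = ((1 << shift) // d) << (frac_bits - shift)
    two = 1 << (frac_bits + 1)                 # the constant 2.0 in fixed point
    iterations = 0
    while True:
        nxt = (x * (two - d * x)) >> frac_bits  # one Newton-Raphson step
        iterations += 1
        if nxt == x:                            # no further improvement
            return x, iterations
        x = nxt


if __name__ == "__main__":
    for d in (3, 7, 53, 73):
        x, n = nr_reciprocal(d)
        print(d, n, x == (1 << 512) // d)
```

Each step roughly doubles the number of correct bits, so a coarse seed
needs only a handful of steps to reach 512 bits, but every step costs
full-width multiplications. That cost is what would have to be weighed
against loading and repeating a stored period.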
      
   > __   
   > wolfgang   
      
      
   --   
   press any key to continue or any other to quit...   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

(c) 1994,  bbs@darkrealms.ca