home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 3,363 of 4,675   
   Terje Mathisen to Robert Wessel   
   Re: Reciprocal MUL LUT   
   25 Apr 18 04:15:50   
   
   From: terje.mathisen@nospicedham.tmsw.no   
      
   Interesting ideas, I've got to ask though, what is the need/intention here?   
      
   I.e. why do you need the prime reciprocals, what do you need them for,   
   why just 512 of them, and is size more or less important than speed of   
   access?   
      
   Terje   
      
   Robert Wessel wrote:   
   > On Tue, 24 Apr 2018 11:04:26 +0200, "wolfgang kern"    
   > wrote:   
   >   
   >>   
   >> I try to shorten my current 512 bit 1/primes LUT because reciprocals of   
   >> primes (all except 2) are periodic (yes, also 1/5 is periodic in binary).   
   >>   
   >> So the LUT may just hold the bit-patterns of the periods with their size   
   >> in bits or bytes and leading zero-bit count for byte aligned storage plus   
   >> some space saving and a 2^-n scaling info.   
   >> This patterns can be repeated to any desired precision then ie:   
   >> ________________________________________________________   
   >> Prime|bits|pattern             |stored as     |leading Z-bits(comment)   
   >>   
   >> 3      2   01                   0x55(555555)   -   
   >> 5      4   0011                 0x03(030303)   -   
   >> 7      3   001                  0x249249       -(doubled for byte allign)   
   >> 11    10   0001011101           0x1745D1745D   -(ditto)   
   >> 13    12   000100111101         0x13B13B       -(ditto)   
   >> 17     8   00001111             0x0f(0f0f0f)   -   
   >> 19    18   000011010111100101   0xD79435E5     4   
   >> 23    11   00001011001          0xB21642C859   4   
   >> 29    28   00001000110111001011 0x8D3DCB       4   
   >> 31     5   00001                0x8421         4   
   >> ...   
   >> 53    52   see hex              0x4D4873ECADE3 4   
   >> ...   
   >> 73     9   000000111            0x381C0E07     4   
   >> ... and so on   
   >> _____________   
   >>   
   >> values for higher primes will really need 512 bits or more, but the whole   
   >> LUT will become quite shorter and so allow addon of higher primes.   
   >>   
   >> Even this idea need some overhead with linked lists, multiple unaligned   
   >> loads and shifts, it may gain size and speed compared to my previous   
   >> 512 bit LUT.   
   >>   
   >> I can already hear: "why not use NR 1/x with AVX512 ?". Because only   
   >> my newest PC has AVX512 and ~160 client machines haven't got it yet.   
   >> And how fast and precise can the NR-methode become in comparision to   
   >> a LUT ? ;)   
   >   
   >   
   > How fast does access to an entry in the table need to be?  Given that   
   > you're storing primes, I assume you're expecting some sort of search   
   > already.   
   >   
   > You could store each entry as a series of 31 bit words (adaptation to   
   > other words size is obvious), with the high bit of the first word of   
   > an entry set to one for, and zero for all extension words.  Each entry   
   > would consist of variable length sections for each data item.  The   
   > n*31 bit entry would look something like:   
   >   
   >   bits(4) lz (number of bits needed to store lengths)   
   >   bits(lz) prime_length   
   >   bits(lz) leading_zeros   
   >   bits(lz) pattern_length   
   >   bits(prime_length) prime   
   >   bits(pattern_length) recip pattern   
   >   
   > The four bit size-of-lengths does, with a simple coding, limit you to   
   > lengths of 32767.  If you need longer, that could obviously be   
   > adjusted.   
   >   
   > So, for example, the entry for 19 would look like:   
   >   
   >   bits(4) 0101 (size of lengths) (5)   
   >   bits(5) 00101 (prime_length) (5)   
   >   bits(5) 00100 (leading_zeros) (4)   
   >   bits(5) 10010 (pattern_length) (18)   
   >   bits(5) 10011 (prime) (19)   
   >   bits(18) 000011010111100101 (recip pattern)   
   >   
   > For a total of 42 data bits.   
   >   
   > Which would be encoded in two 31 bit chunks in 32 bit words as:   
   >   
   >   1010 1001 0100 1001 0010 1001 1000 0110 ("1"+first 31 bits)   
   >   0101 1110 0101 0000 0000 0000 0000 0000 ("0"+next 11 bits+pad)   
   >   
   > You can then just binary search that table looking for your particular   
   > prime, and find the beginnings of the entries because the initial   
   > words all have the high bit set.  Obviously some bit manipulation is   
   > required to use that entry.   
   >   
      
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca