Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.lang.asm.x86    |    Ahh, the lost art of x86 assembly    |    4,675 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 3,367 of 4,675    |
|    wolfgang kern to Robert Wessel    |
|    Re: Reciprocal MUL LUT    |
|    25 Apr 18 10:02:03    |
   
   From: nowhere@never.at   
      
   Robert Wessel replied:   
      
   >>I try to shorten my current 512 bit 1/primes LUT because reciprocals of   
   >>primes (all except 2) are periodic (yes, also 1/5 is periodic in binary).   
   >>   
   >>So the LUT may just hold the bit-patterns of the periods with their size   
   >>in bits or bytes and leading zero-bit count for byte aligned storage plus   
   >>some space saving and a 2^-n scaling info.   
   >>This patterns can be repeated to any desired precision then ie:   
   >>________________________________________________________   
   >>Prime|bits|pattern |stored as |leading Z-bits(comment)   
   >>   
   >>3 2 01 0x55(555555) -   
   >>5 4 0011 0x03(030303) -   
   >>7 3 001 0x249249 -(doubled for byte allign)   
   >>11 10 0001011101 0x1745D1745D -(ditto)   
   >>13 12 000100111101 0x13B13B -(ditto)   
   >>17 8 00001111 0x0f(0f0f0f) -   
   >>19 18 000011010111100101 0xD79435E5 4   
   >>23 11 00001011001 0xB21642C859 4   
   >>29 28 00001000110111001011 0x8D3DCB 4   
   >>31 5 00001 0x8421 4   
   >>...   
   >>53 52 see hex 0x4D4873ECADE3 4   
   >>...   
   >>73 9 000000111 0x381C0E07 4   
   >>... and so on   
   >>_____________   
   >>   
   >>values for higher primes will really need 512 bits or more, but the whole   
   >>LUT will become quite shorter and so allow addon of higher primes.   
   >>   
   >>Even this idea need some overhead with linked lists, multiple unaligned   
   >>loads and shifts, it may gain size and speed compared to my previous   
   >>512 bit LUT.   
   >>   
   >>I can already hear: "why not use NR 1/x with AVX512 ?". Because only   
   >>my newest PC has AVX512 and ~160 client machines haven't got it yet.   
   >>And how fast and precise can the NR-methode become in comparision to   
   >>a LUT ? ;)   
      
   > How fast does access to an entry in the table need to be?   
      
   as fast as possible ? :)   
      
   > Given that you're storing primes, I assume you're expecting some sort   
   > of search already.   
      
   No, the primes are already known by their ordinal (#1 = 2) from either   
   a previous searched dw-LUT or just by its printed copy on paper.   
      
   Your below suggestion combine the prime list with their reciprocals,   
   which isn't a bad idea but assume a search for the prime.   
      
   > You could store each entry as a series of 31 bit words (adaptation to   
   > other words size is obvious), with the high bit of the first word of   
   > an entry set to one for, and zero for all extension words. Each entry   
   > would consist of variable length sections for each data item. The   
   > n*31 bit entry would look something like:   
      
   > bits(4) lz (number of bits needed to store lengths)   
   > bits(lz) prime_length   
   > bits(lz) leading_zeros   
   > bits(lz) pattern_length   
   > bits(prime_length) prime   
   > bits(pattern_length) recip pattern   
      
   > The four bit size-of-lengths does, with a simple coding, limit you to   
   > lengths of 32767. If you need longer, that could obviously be   
   > adjusted.   
      
   > So, for example, the entry for 19 would look like:   
   >   
   > bits(4) 0101 (size of lengths) (5)   
   > bits(5) 00101 (prime_length) (5)   
   > bits(5) 00100 (leading_zeros) (4)   
   > bits(5) 10010 (pattern_length) (18)   
   > bits(5) 10011 (prime) (19)   
   > bits(18) 000011010111100101 (recip pattern)   
   >   
   > For a total of 42 data bits.   
   >   
   > Which would be encoded in two 31 bit chunks in 32 bit words as:   
   >   
   > 1010 1001 0100 1001 0010 1001 1000 0110 ("1"+first 31 bits)   
   > 0101 1110 0101 0000 0000 0000 0000 0000 ("0"+next 11 bits+pad)   
      
   yeah, this sounds short and my idea for it is somehow similar...   
      
   > You can then just binary search that table looking for your particular   
   > prime, and find the beginnings of the entries because the initial   
   > words all have the high bit set. Obviously some bit manipulation is   
   > required to use that entry.   
      
   ... except that I'd use a ordinal link-table and much less BitManu:   
      
   tableoffset = ordinal # *8   
   b size (element bytes)   
   b leading Zero count   
   w pattern size (bits) ;0: may show byte aligned ;<0: period>512 bits   
   dw offset into the reciprocal pattern array.   
      
   your suggestion make me think about a combination of the two lists.   
   thanks,   
   __   
   wolfgang   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
|
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
(c) 1994, bbs@darkrealms.ca