|    comp.lang.asm.x86    |    Ahh, the lost art of x86 assembly    |    4,675 messages    |
|    Message 3,301 of 4,675    |
|    Terje Mathisen to Robert Prins    |
|    Re: Online generation of constants for "    |
|    12 Mar 18 20:50:30    |
From: terje.mathisen@nospicedham.tmsw.no

Robert Prins wrote:
> On 2018-03-12 09:05, Terje Mathisen wrote:
>> The reason for this extra stuff is that you need n+1 bits of actual
>> precision in order to get exact results when emulating an n-bit
>> division.
>
> Which is what Agner Fog attributes to you in his manuals.
>
> So many things are "too trivial" to require online tools, what's the
> harm in one more?
>
> Anyway, am I correct in finding that for limited range dividends, you
> can get away with, as if it matters, smaller shifts?
>
> I use 301036/8 for division by 3652425, 1881437/4 for division by 36525,
> 14035840/0 for division by 306, and 429496729/0 for division by 10, and
> for the JDN's I'm working with (1980-06-16 to now+) that works.

The accuracy of the reciprocal must be higher than both the divisor and
the dividend, so for division of the full 32-bit range you always need
an effectively 33-bit reciprocal.

For smaller values like I needed for calendar operations, i.e.
calculating the century number given a (julian) day number which is
known to be inside a 400-year period, I could approximate j/100 with
41/4096. This works because that fraction just happens to be very close
to the exact (1/100) reciprocal. :-)

Multiplication by 41 can be done in several ways:

    imul eax,edx            ;; *41

    lea edx,[eax+eax*4]     ;; *5
    lea eax,[eax+edx*8]     ;; *41

    lea edx,[eax+eax*8]     ;; *9
    shl eax,5               ;; *32
    add eax,edx             ;; *9 + *32 = *41

On a CPU where LEA takes two cycles, the last version can run in three
cycles (the LEA and the SHL are independent, so only the ADD has to
wait), while the dual-LEA version would take four because its second
LEA depends on the first.

Terje

--
-
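The 41/4096 shortcut and the two shift-and-add sequences above can be checked by brute force. What follows is only a sketch: the helper names are hypothetical and not from the post, Python integers are unbounded so 32-bit wraparound is ignored, and the range it reports applies to whatever small quantity j is being divided by 100 (a year offset inside a 400-year cycle would fit comfortably, if that is the intended use).

```python
# Sketch only: helper names are hypothetical, not taken from the post.
# Python ints do not wrap, so 32-bit overflow is not modelled here.

def div100_approx(j):
    """Approximate j // 100 as (j * 41) >> 12, i.e. j * 41/4096."""
    return (j * 41) >> 12

def mul41_two_lea(x):
    """Mirror of the two-LEA sequence."""
    t = x + x * 4          # lea edx,[eax+eax*4]  -> x*5
    return x + t * 8       # lea eax,[eax+edx*8]  -> x + x*40 = x*41

def mul41_lea_shl_add(x):
    """Mirror of the LEA / SHL / ADD sequence."""
    t = x + x * 8          # lea edx,[eax+eax*8]  -> x*9
    x = x << 5             # shl eax,5            -> x*32
    return x + t           # add eax,edx          -> x*41

def first_mismatch(limit):
    """Smallest j below limit where the approximation differs from j // 100."""
    for j in range(limit):
        if div100_approx(j) != j // 100:
            return j
    return None

if __name__ == "__main__":
    print(first_mismatch(1 << 16))   # prints 1099
```

The error of 41/4096 relative to 1/100 is 1/102400, so the approximation always rounds slightly high; it first crosses an integer boundary at j = 1099, which means the result is exact for 0 <= j <= 1098.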
(c) 1994, bbs@darkrealms.ca