Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.arch    |    Apparently more than just beeps & boops    |    131,241 messages    |
|    Message 130,260 of 131,241    |
|    BGB to Michael S    |
|    Re: Tonights Tradeoff (2/2)    |
|    11 Nov 25 21:34:08    |
[continued from previous message]

>> in fact, the BID converter was significantly slower.
>>
>
> DPD-specific code and algorithms make sense for multiplication.
> They likely make sense for addition/subtraction as well; I didn't try
> to think deeply about it.
> But for division I wouldn't bother with DPD-specific things. Just
> convert the mantissa from DPD to binary, then divide, normalize,
> round, then convert back.
>

It is the 9-digit-decimal <-> large-binary-integer converter step that
is the main issue here.

Going to/from 128-bit integers adds a few "there be dragons here"
issues regarding performance.

At the moment, I don't have a fast (and correct) converter between
these two representations that does not rely on any external libraries
or similar (nothing outside the C standard library).


Like, if you need to crack 128 bits into 9-digit chunks using 128-bit
divides, and the 128-bit divider in question is a shift-and-subtract
loop, this sucks.

There are faster ways to multiply by powers of 10, but dividing by
powers of 10 is still a harder problem at the moment.

Well, and there is also the annoyance that it is difficult to write an
efficient 128-bit integer multiply while staying within the limits of
portable C95.

...

Went off and tried a few things:
  A 128-bit integer divider;
  Various attempts at decimal long division;
  ...

Thus far, things have either not worked correctly, or have ended up
slower than the existing Newton-Raphson divider.

The most promising option would be radix-10^9 long division, but I
couldn't get this working thus far.
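On the "crack 128 bits into 9-digit chunks" step: it doesn't strictly need a general 128-bit divider. If the 128-bit value is held as four 32-bit limbs, each division by 10^9 can be done limb-at-a-time using only 64-bit operations. A minimal sketch (function names are mine, not from any library):

```c
#include <stdint.h>

/* Divide a 128-bit value, held as four 32-bit limbs (little-endian),
   by 10^9 in place; return the remainder (one 9-digit chunk).
   Each step divides a <= 62-bit intermediate, so only 64-bit math
   is needed -- no 128-bit shift-and-subtract loop. */
static uint32_t div128_by_1e9(uint32_t a[4])
{
    uint64_t rem = 0;
    int i;
    for (i = 3; i >= 0; i--) {
        uint64_t cur = (rem << 32) | a[i];   /* rem < 10^9 < 2^30 */
        a[i] = (uint32_t)(cur / 1000000000u);
        rem = cur % 1000000000u;
    }
    return (uint32_t)rem;
}

/* Crack a 128-bit value into five 9-digit chunks, least significant
   first ((10^9)^5 = 10^45 > 2^128, so five chunks always suffice).
   Destroys a (left as zero). */
static void to_base1e9(uint32_t a[4], uint32_t out[5])
{
    int i;
    for (i = 0; i < 5; i++)
        out[i] = div128_by_1e9(a);
}
```

The per-chunk cost is four 64-bit divides by a constant, which compilers typically turn into multiply-by-reciprocal anyway.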
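On the portable 128-bit multiply point: the usual workaround is the schoolbook split into 32-bit halves and four partial products. A sketch, using C99's uint64_t for brevity (strict C95 lacks a 64-bit type, so there the same decomposition would have to be applied one level further down, building 32x32->64 from 16-bit halves):

```c
#include <stdint.h>

/* 64x64 -> 128 multiply from four 32x32->64 partial products. */
static void mul64x64_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a0 = a & 0xFFFFFFFFu, a1 = a >> 32;
    uint64_t b0 = b & 0xFFFFFFFFu, b1 = b >> 32;
    uint64_t p00 = a0 * b0;
    uint64_t p01 = a0 * b1;
    uint64_t p10 = a1 * b0;
    uint64_t p11 = a1 * b1;
    uint64_t mid, carry;

    /* p01 + (p00 >> 32) cannot overflow; adding p10 can, so catch it. */
    mid = p01 + (p00 >> 32);
    mid += p10;
    carry = (mid < p10) ? 1 : 0;

    *lo = (mid << 32) | (p00 & 0xFFFFFFFFu);
    *hi = p11 + (mid >> 32) + (carry << 32);
}
```

A 128x128 multiply then falls out of a few of these plus adds (partial products above the 128th bit can be dropped if only the low half is wanted).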
Did also try radix-10 long division (working on 72-digit sequences),
but this was slower than the existing N-R divider.

One possibility could be to try doing the divide in radix 10 with an
unpacked BCD variant (likely using one byte per digit, 0..9). Here,
compare and subtract would be slower, but shifting could be faster,
and it allows a faster way (lookup tables) to find "A goes into B, N
times".

I still don't have much confidence in it though.

Radix-10^9 has a higher chance of OK performance, if I could get the
long-division algorithm to work correctly with it. Thus far, I was
having difficulty getting it to give the correct answer. Integer
divide was tending to overshoot in the "A goes into B, N times" logic,
and trying to fudge it (eg, by adding 1 to the initial divisor) wasn't
really working; I kinda need an accurate answer here, and a reliable
way to scale and add the divisor, ...

Granted, one possibility could be to expand each group of 9 digits out
to 64 bits, so effectively there are 10 decimal digits of intermediate
headroom (or two radix-10^9 "digits").

But, yeah, long division is a lot more of a PITA than N-R or
shift-and-subtract.

>
>> And, if I were going to do BID, it would make more sense to do it
>> as its own thing, and build it mostly around 128-bit integer math.
>>
>>
>> But, in this case, I had decided to experiment with DPD.
>>
>>
>> Most likely, in this case, if I wanted a faster divide that also
>> played well with the existing format, I would need to do long
>> division or similar.
>>
>>
>>
>>>>
>>>>
>>>> If I compare against the IBM decNumber library:
>>>> Multiply: 14 million.
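For what it's worth, the standard cure for the overshoot is to accept that the top-limbs estimate can only overshoot (never undershoot) and correct it downward before subtracting; Knuth's Algorithm D additionally normalizes the divisor so at most two corrections are ever needed. A minimal, unnormalized radix-10^9 sketch (helper names are mine; the correction loop can iterate many times when the divisor's top limb is small, which normalization would fix):

```c
#include <stdint.h>

#define RADIX 1000000000u   /* limbs are base-10^9 "digits" */

/* Compare two len-limb numbers (little-endian): -1, 0, or 1. */
static int cmp_limbs(const uint32_t *a, const uint32_t *b, int len)
{
    int i;
    for (i = len - 1; i >= 0; i--)
        if (a[i] != b[i])
            return a[i] > b[i] ? 1 : -1;
    return 0;
}

/* a -= b over len limbs; caller guarantees a >= b. */
static void sub_limbs(uint32_t *a, const uint32_t *b, int len)
{
    int64_t borrow = 0;
    int i;
    for (i = 0; i < len; i++) {
        int64_t t = (int64_t)a[i] - b[i] - borrow;
        borrow = t < 0;
        if (t < 0) t += RADIX;
        a[i] = (uint32_t)t;
    }
}

/* out (len+1 limbs) = v (len limbs) * q, with q < RADIX. */
static void mul_small(uint32_t *out, const uint32_t *v, int len, uint32_t q)
{
    uint64_t carry = 0;
    int i;
    for (i = 0; i < len; i++) {
        uint64_t t = (uint64_t)v[i] * q + carry;
        out[i] = (uint32_t)(t % RADIX);
        carry = t / RADIX;
    }
    out[len] = (uint32_t)carry;
}

/* Schoolbook long division in radix 10^9: u (n limbs plus a zero
   extension limb u[n]) divided by v (m limbs, v[m-1] != 0, m <= 15).
   Quotient goes to q (n-m+1 limbs); the remainder is left in u. */
static void longdiv_1e9(uint32_t *u, int n, const uint32_t *v, int m,
                        uint32_t *q)
{
    uint32_t t[16];   /* scratch for v * qhat */
    int j;
    for (j = n - m; j >= 0; j--) {
        /* Estimate the digit from the top two limbs of the current
           window over the divisor's top limb; this never undershoots. */
        uint64_t top = (uint64_t)u[j + m] * RADIX + u[j + m - 1];
        uint64_t qhat = top / v[m - 1];
        if (qhat > RADIX - 1)
            qhat = RADIX - 1;
        /* Correct the overshoot: back off until v*qhat fits. */
        mul_small(t, v, m, (uint32_t)qhat);
        while (cmp_limbs(t, u + j, m + 1) > 0) {
            qhat--;
            mul_small(t, v, m, (uint32_t)qhat);
        }
        sub_limbs(u + j, t, m + 1);
        q[j] = (uint32_t)qhat;
    }
}
```

The zero extension limb keeps every quotient digit below 10^9, so the clamp plus the back-off loop is enough to land on the exact digit without any "add 1 to the divisor" fudging.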
>>>> Divide: 7 million
>>>>
>>>> The decNumber library doesn't appear to have a square-root
>>>> function...
>>>>
>>>>
>>>> Granted, there are possibly faster ways to do divide, versus
>>>> using Newton-Raphson in this case...
>>>>
>>>> It was not the point that I could pull the fastest possible
>>>> implementation out of thin air. But, it does appear I am beating
>>>> decNumber, at least for multiply performance and similar.
>>>>
>>>>
>>>
>>
>> Can note that while decNumber exists, at the moment, it is over 10x
>> more code...
>>
>>
>
>

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)