home bbs files messages ]

Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.arch      Apparently more than just beeps & boops      131,241 messages   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]

   Message 130,260 of 131,241   
   BGB to Michael S   
   Re: Tonights Tradeoff (2/2)   
   11 Nov 25 21:34:08   
   
   [continued from previous message]   
      
   >> in-fact the BID converter was significantly slower.   
   >>   
   >   
   > DPD-specific code and algorithms make sense for multiplication.   
   > They likely makes sense for addition/subtraction as well, I didn't try   
   > to think deeply about it.   
   > But for division I wouldn't bother with DPD-specific things. Just   
   > convert mantissa from DPD to binary, then divide, normalize, round then   
   > convert back.   
   >   
      
   It is the 9-digit-decimal <-> Large Binary Integer converter step that   
   is the main issue here.   
      
   Going to/from 128-bit integer adds a few "there be dragons here" issues   
   regarding performance.   
      
   At the moment, I don't have a fast (and correct) converter between these   
   two representations (that also does not rely on any external libraries   
   or similar; or nothing outside of the C standard library).   
      
      
      
      
   Like, if you need to crack 128 bits into 9-digit chunks using 128-bit   
   divide, and if the 128-bit divider in question is a shift-and-subtract   
   loop, this sucks.   
      
   There are faster ways to do multiply by powers of 10, but divide by   
   powers-of-10 is still a harder problem at the moment.   
      
   Well, and also there is the annoyance that it is difficult to write an   
   efficient 128-bit integer multiply if staying within the limits of   
   portable C95.   
      
      
   ...   
      
      
      
   Goes off and tries a few things:   
      128-bit integer divider;   
      Various attempts at decimal long divide;   
      ...   
      
   Thus far, things have either not worked correctly, or have ended up   
   slower than the existing Newton-Raphson divider.   
      
      
   the most promising option would be Radix-10e9 long-division, but   
   couldn't get this working thus far.   
      
   Did also try Radix-10 long division (working on 72 digit sequences), but   
   this was slower than the existing N-R divider.   
      
      
   One possibility could be to try doing divide with Radix-10 in an   
   unpacked BCD variant (likely using bytes from 0..9). Here, compare and   
   subtract would be sower, but shifting could be faster, and allows a   
   faster way (lookup tables) to find "A goes into B, N times".   
      
   I still don't have much confidence in it though.   
      
      
   Radix-10e9 has a higher chance of OK performance, if I could get the   
   long-division algo to work correctly with it. Thus far, I was having   
   difficulty getting it to give the correct answer. Integer divide was   
   tending to overshoot the "A goes into B N times" logic, and trying to   
   fudge it (eg, but adding 1 to the initial divisor) wasn't really   
   working; kinda need an accurate answer here, and a reliable way to scale   
   and add the divisor, ...   
      
      
   Granted, one possibility could be to expand out each group of 9 digits   
   to 64 bits, so effectively it has an intermediate 10 decimal digits of   
   headroom (or two 10e9 "digits").   
      
   But, yeah, long-division is a lot more of a PITA than N-R or   
   shift-and-subtract.   
      
      
   >   
   >> And, if I were going to do BID, would make more sense to do it as its   
   >> own thing, and build it mostly around 128-bit integer math.   
   >>   
   >>   
   >> But, in this case, I had decided to experiment with DPD.   
   >>   
   >>   
   >> Most likely, in this case if I wanted faster divide, that also played   
   >> well with the existing format, I would need to do long division or   
   >> similar.   
   >>   
   >>   
   >>   
   >>>>   
   >>>>   
   >>>> If I compare against the IBM decNumber library:   
   >>>>      Multiply: 14 million.   
   >>>>      Divide: 7 million   
   >>>>   
   >>>> The decNumber library doesn't appear to have a square-root   
   >>>> function...   
   >>>>   
   >>>>   
   >>>> Granted, there are possibly faster ways to do divide, versus using   
   >>>> Newton-Raphson in this case...   
   >>>>   
   >>>> It was not the point that I could pull the fastest possible   
   >>>> implementation out of thin air. But, does appear I am beating   
   >>>> decNumber at least for multiply performance and similar.   
   >>>>   
   >>>>   
   >>>   
   >>   
   >> Can note that while decNumber exists, at the moment, it is over 10x   
   >> more code...   
   >>   
   >>   
   >   
   >   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   

[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]


(c) 1994,  bbs@darkrealms.ca