... darkrealms ...

Forums before death by AOL, social media and spammers... "We can't have nice things"
comp.arch
Apparently more than just beeps & boops
131,241 messages
[ << oldest | < older | list | newer > | newest >> ]
Message 130,266 of 131,241
Michael S to BGB
Re: Tonights Tradeoff (1/2)
12 Nov 25 11:47:34
   From: already5chosen@yahoo.com   
      
   On Tue, 11 Nov 2025 21:34:08 -0600   
   BGB  wrote:   
      
   > On 11/11/2025 6:03 AM, Michael S wrote:   
   > > On Tue, 11 Nov 2025 04:44:48 -0600   
   > > BGB  wrote:   
   > >   
   > >   
   > > Certainly not via GMP in final product. But doing 1st version via   
   > > GMP makes perfect sense.   
   > >   
   >   
   > GMP is only really an option for targets where GMP exists;   
      
   Decimal128 is of interest only on targets where GMP exists.   
      
   > Needed to jump over to GCC in WSL just to test GMP here.   
   >   
      
   So, you don't like msys2. It's your problem. Many Windows developers,   
   myself included, find it handy. Esp. newer variant of tools, prefixed   
   mingw-w64-ucrt-x86_64- .   
      
      
   > If avoidable, you don't want to use anything beyond the C standard   
   > library, and ideally limit things to a C95 style dialect for maximum   
   > portability.   
   >   
      
   I almost agree, except for C95.   
   C99 is, may be, too much, but C99 sub/super set known as C11 sounds   
   about right.   
   Also, I wouldn't consider such project without few extensions of   
   standard language. As a minimum:   
   - ability to get upper 64 bit of 64b*64b product   
   - convenient way to exploit 64-bit add with carry   
   - MS _BitScanReverse64 or Gnu __builtin_ctzll or equivalen   
   The first and the second items are provided by Gnu __int128.   
   All 3 items are available as standard features in C23, but I realize   
   that for your purposes it is a bit to early to rely on C23.   
      
   But all that only applies to final version of the library. At stage of   
   experimentation and proof of concept I suggest to use any available   
   tool. Including GMP.   
      
      
   > Granted, it does appear like the GMP divider is faster than expected.   
   >    Like, possibly something faster than "ye olde shift-and-subtract".   
   >   
      
   You see! It already had shown you something.   
   The mere knowledge that something already done successfully by others is   
   2/3rd of what you need to accomplish the same by yourself.   
   Even without looking at GMP sources. Which is certainly an option.   
      
   >   
   >   
   >   
   > Though, can note a curious property:   
   >    This code is around 79% faster when built with GCC vs MSVC;   
   >    In GCC, the relative speed of MUL and ADD trade places:   
   >      In MSVC, MUL is faster;   
   >      In GCC, ADD is faster.   
   >   
   > Though, the code in question tends to frequently use struct members   
   > directly, rather than caching multiply-accessed struct members in   
   > local variables. MSVC tends not to fully optimize away this sort of   
   > thing, whereas GCC tends to act as-if the struct members had in-fact   
   > been cached in local variables.   
   >   
   >   
   > >> this would be too big of a dependency), there is still the   
   > >> issue of efficiently converting between big-integer and the "groups   
   > >> of 9 digits in 32-bits" format.   
   > >   
   > > No, no, no. Not "group of 9 digits"! Plain unadulterated binary. 64   
   > > binary 'digits' per 64-bit word.   
   > >   
   >   
   > Alas, the code was written mostly to use 9-digit groupings, and going   
   > between 9-digit groupings and 128-bit integers is a bigger chunk of   
   > code than I want to have for this.   
   >   
      
   Using 9-digit groups during conversions is bad idea, both speed-wise   
   and code complexity wise. Much better to use groups of 18 digits. Or   
   15+19.   
      
      
   > This would mean an additional ~ 500 LOC, plus probably whatever code   
   > I need to do a semi-fast 256 by 128 bit integer divider.   
   >   
   >   
   > >>   
   > >>   
   > >> This is partly why I removed the BID code:   
   > >> At first, it seemed like the DPD and BID converters were similar   
   > >> speed; But, turns out I was still testing the DPD converter, and   
   > >> in-fact the BID converter was significantly slower.   
   > >>   
   > >   
   > > DPD-specific code and algorithms make sense for multiplication.   
   > > They likely makes sense for addition/subtraction as well, I didn't   
   > > try to think deeply about it.   
   > > But for division I wouldn't bother with DPD-specific things. Just   
   > > convert mantissa from DPD to binary, then divide, normalize, round   
   > > then convert back.   
   > >   
   >   
   > It is the 9-digit-decimal <-> Large Binary Integer converter step   
   > that is the main issue here.   
   >   
      
   See above.   
      
   > Going to/from 128-bit integer adds a few "there be dragons here"   
   > issues regarding performance.   
   >   
      
   Not really.   
   That is, conversions are not blazingly fast, but still much better   
   than any attempt to divide in any form of decimal. And helps to   
   preserve your sanity.   
   There is also psychological factor at play - your users expect   
   division and square root to be slower than other primitive FP   
   operations, so they are not disappointed. Possibly they are even   
   pleasantly surprised, when they find out that the difference in   
   throughput between division and multiplication is smaller than factor   
   20-30 that they were accustomed to for 'double' on their 20 y.o. Intel   
   and AMD.   
      
   > At the moment, I don't have a fast (and correct) converter between   
   > these two representations (that also does not rely on any external   
   > libraries or similar; or nothing outside of the C standard library).   
   >   
      
   For 'correct', don't hesitate to use GMP.   
   For 'not slow and correct' don't hesitate to use gnu extensions like   
   __int128. After majority of work is done and you are reasonably   
   satisfied with result, you can re-code in MS dialect, if that is your   
   wish. That would be a simple mechanical work.   
      
   >   
   >   
   >   
   > Like, if you need to crack 128 bits into 9-digit chunks using 128-bit   
   > divide, and if the 128-bit divider in question is a   
   > shift-and-subtract loop, this sucks.   
   >   
   > There are faster ways to do multiply by powers of 10, but divide by   
   > powers-of-10 is still a harder problem at the moment.   
   >   
   > Well, and also there is the annoyance that it is difficult to write   
   > an efficient 128-bit integer multiply if staying within the limits of   
   > portable C95.   
   >   
   >   
   > ...   
   >   
   >   
   >   
   > Goes off and tries a few things:   
   >    128-bit integer divider;   
   >    Various attempts at decimal long divide;   
   >    ...   
   >   
   > Thus far, things have either not worked correctly, or have ended up   
   > slower than the existing Newton-Raphson divider.   
   >   
   >   
   > the most promising option would be Radix-10e9 long-division, but   
   > couldn't get this working thus far.   
   >   
      
   No, just no. Anything non-binary no good for division.   
      
   > Did also try Radix-10 long division (working on 72 digit sequences),   
   > but this was slower than the existing N-R divider.   
   >   
   >   
   > One possibility could be to try doing divide with Radix-10 in an   
   > unpacked BCD variant (likely using bytes from 0..9). Here, compare   
   > and subtract would be sower, but shifting could be faster, and allows   
   > a faster way (lookup tables) to find "A goes into B, N times".   
   >   
   > I still don't have much confidence in it though.   
   >   
   >   
   > Radix-10e9 has a higher chance of OK performance, if I could get the   
   > long-division algo to work correctly with it. Thus far, I was having   
      
   [continued in next message]   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)
[ << oldest | < older | list | newer > | newest >> ]