From: cr88192@gmail.com   
      
   On 11/8/2025 5:28 AM, Thomas Koenig wrote:   
   > BGB schrieb:   
   >   
   >> I don't know yet if my implementation of DPD is actually correct.   
   >   
   > The POWER ISA has a pretty good description, see the OpenPower   
   > foundation.   
      
   Luckily, I have since figured it out and confirmed it.   
      
      
   Otherwise, fiddled with the division algorithm some more, and it is now   
   "slightly less awful", and converges a bit faster...   
      
   Relatedly, also added Square-Root...   
      
      
      
   My previous strategies for square-root didn't really work as effectively   
   in this case, so just sorta fiddled with stuff until I got something   
   that worked...   
      
   Algorithm I came up with (to find sqrt(S)):   
    Make an initial guess of the square root, calling it C;   
    Make an initial guess for the reciprocal of C, calling it H;   
    Take a few passes (threading the needle, *1):   
    C[n+1]=C+(S-(C*c))*(H*0.375)   
    Redo approximate reciprocal of C, as H (*2);   
    Refine H: H=H*(2-C*H)   
    Enter main iteration pass:   
    C[+1]=C+(S-(C*c))*(H*0.5)   
    H[+1]=H*(2-C*H) //(*3)   
      
      
   *1: Usual "try to keep stuff from flying off into space" step, using a   
   scale of 0.375 to undershoot convergence and increase stability (lower   
   means more stability but slower convergence; closer to 0.5 means faster,   
   but more likely to "fly off into space" depending on the accuracy of the   
   initial guesses).   
      
   *2: Seemed better to start over from a slightly better guess of C, than   
   to directly iterate from the initial (much less accurate) guess.   
      
   *3: Noting that if H is also converged, the convergence rate for C is   
   significantly improved (the gains from faster C convergence are enough   
   to offset the added cost of also converging H).   
      
   Seems to be effective, though still slower than divide (which is still   
   23x slower than an ADD or MUL).   
      
      
   In this case, the more complex algorithm being (ironically) partly   
   justified by the comparably higher relative cost per operation (and the   
   issue that I can't resort to tricks like handling the floating-point   
   values as integers; doesn't work so hot with Decimal128).   
      
      
      
      
   Felt curious, tried asking Grok about this, it identified this approach   
   as the Goldschmidt Algorithm, OK. If so, kinda weird that I arrived at a   
   well known (?) algorithm mostly by fiddling with it.   
      
   Looking on Wikipedia though, this doesn't look like the same algorithm   
   though.   
      
      
   Well, apart from some weird thing, where it initially responded in   
   Arabic for some reason (seems odd, it has recently gotten smart enough   
   to almost start being useful; apart from when it is being stupid, or   
   just doing something weird like responding in the wrong language).   
      
   ...   
      
      
   Well, also was fiddling with code to try to improve "general   
   robustness", like making the compare operation still work if inputs were   
   not normalized; dealing with some related edge cases in the ADD/SUB   
   logic; ...   
      
      
      
   So, ATM, this means it now has:   
    ADD, SUB, MUL, DIV, SQRT   
    Compare;   
    Printing and Parsing numbers as strings;   
    ...   
      
   In theory, could expand it out with other math functions if needed.   
      
      
   Still unclear if there is a use-case.   
    Drawback is that it is very slow, even vs Binary128.   
      
   Well, except maybe that the Square-Root algorithm could be applicable to   
   Binary128, which has a similar issue of slow operations. Though, in this   
   case, could just copy/paste the existing double-precision code for   
   long-double in my C library, which uses an unrolled Taylor Series in   
   that case.   
      
   ...   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
|