
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.arch      Apparently more than just beeps & boops      131,241 messages   


   Message 130,227 of 131,241   
   MitchAlsup to All   
   Re: Tonights Tradeoff   
   10 Nov 25 02:12:53   
   
   From: user5857@newsgrouper.org.invalid   
      
   BGB  posted:   
      
   > On 11/8/2025 5:28 AM, Thomas Koenig wrote:   
   > > BGB  schrieb:   
   > >   
   > >> I don't know yet if my implementation of DPD is actually correct.   
   > >   
   > > The POWER ISA has a pretty good description, see the OpenPower   
   > > foundation.   
   >   
   > Luckily, I have since figured it out and confirmed it.   
   >   
   >   
   > Otherwise, fiddled with the division algorithm some more, and it is now   
   > "slightly less awful", and converges a bit faster...   
   >   
   > Relatedly, also added Square-Root...   
   >   
   >   
   >   
   > My previous strategies for square-root didn't really work as effectively   
   > in this case, so just sorta fiddled with stuff until I got something   
   > that worked...   
   >   
   > Algorithm I came up with (to find sqrt(S)):   
   >    Make an initial guess of the square root, calling it C;   
   >      Make an initial guess for the reciprocal of C, calling it H;   
   >    Take a few passes (threading the needle, *1):   
   >      C[n+1]=C+(S-(C*C))*(H*0.375)   
   >    Redo approximate reciprocal of C, as H (*2);   
   >      Refine H: H=H*(2-C*H)   
   >    Enter main iteration pass:   
   >      C[+1]=C+(S-(C*C))*(H*0.5)   
   >      H[+1]=H*(2-C*H)  //(*3)   
   >   
   >   
   > *1: Usual "try to keep stuff from flying off into space" step, using a   
   > scale of 0.375 to undershoot convergence and increase stability (lower   
   > means more stability but slower convergence; closer to 0.5 means faster,   
   > but more likely to "fly off into space" depending on the accuracy of the   
   > initial guesses).   
   >   
   > *2: Seemed better to start over from a slightly better guess of C, than   
   > to directly iterate from the initial (much less accurate) guess.   
   >   
   > *3: Noting that if H is also converged, the convergence rate for C is   
   > significantly improved (the gains from faster C convergence are enough   
   > to offset the added cost of also converging H).   
   >   
   > Seems to be effective, though still slower than divide (which is still   
   > 23x slower than an ADD or MUL).   
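   [The quoted scheme can be sketched in ordinary Python floats standing   
   in for Decimal128; the function name, initial-guess rule, and iteration   
   counts below are illustrative assumptions, not the poster's code:]   

```python
def sqrt_coupled(S, passes=3, refine=8):
    """Coupled square-root iteration as described above: C approximates
    sqrt(S), H approximates 1/C, and both are refined together."""
    # Crude initial guess (assumption); a real implementation would
    # seed C and H from the exponent of S.
    C = S if S > 1.0 else 1.0
    H = 1.0 / C                 # here a true reciprocal; in Decimal128
                                # this would itself be an approximation
    # "Threading the needle": damped updates (scale 0.375) for stability.
    for _ in range(passes):
        C = C + (S - C * C) * (H * 0.375)
    # Restart H from the improved C, then one Newton step on the reciprocal.
    H = 1.0 / C
    H = H * (2.0 - C * H)
    # Main iteration: full 0.5 scale, refining C and H in tandem.
    for _ in range(refine):
        C = C + (S - C * C) * (H * 0.5)
        H = H * (2.0 - C * H)
    return C
```

   [Once H is close to 1/C, the 0.5-scale update is effectively the Newton   
   step C' = (C + S/C)/2, computed without a divide.]   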
      
   SQRT should be 20%-30% slower than DIV.   
      
   >   
   > In this case, the more complex algorithm being (ironically) partly   
   > justified by the comparably higher relative cost per operation (and the   
   > issue that I can't resort to tricks like handling the floating-point   
   > values as integers; doesn't work so hot with Decimal128).   
      
   If you have binary SQRT and a quick way from DFP128 to BFP32, take SQRT   
   in binary, convert back, and do 2 iterations. Should be faster. {{I need   
   to remind some folks that {float; float; FDIV; fix} was faster than   
   IDIV on many 2nd-generation RISC machines.}}   
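   [A minimal sketch of that suggestion, using Python's decimal module at   
   34 digits as a stand-in for DFP128 and double-precision sqrt as the   
   binary seed; for clarity it polishes with the textbook division-based   
   Newton step, where a real implementation might prefer a division-free   
   variant like the one quoted above:]   

```python
from decimal import Decimal, getcontext

import math

getcontext().prec = 34             # IEEE decimal128 carries 34 digits

def dsqrt(S):
    """Seed from binary sqrt, then polish at decimal128-like precision."""
    C = Decimal(math.sqrt(float(S)))   # ~16 good digits from binary sqrt
    half = Decimal("0.5")
    for _ in range(2):                 # each Newton step ~doubles the digits
        C = half * (C + S / C)         # classic x' = (x + S/x)/2
    return C
```

   [Two steps suffice: the seed's ~1e-16 relative error squares to ~1e-32   
   after one step, which the second step pushes to the 34-digit rounding   
   floor.]   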
      
   > Felt curious, tried asking Grok about this, it identified this approach   
   > as the Goldschmidt Algorithm, OK. If so, kinda weird that I arrived at a   
   > well known (?) algorithm mostly by fiddling with it.   
      
   Feels like it is 1965--does it not ?!?   
      
   > Looking on Wikipedia though, this doesn't look like the same algorithm   
   > though.   
      
   Goldschmidt is just N-R where the arithmetic has been rearranged so   
   that the multiplies are not data-dependent on each other (as they are   
   in N-R). In exchange for this independence, GS lacks the automatic   
   self-correction that N-R has.   
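   [To illustrate that distinction for division (a hypothetical helper,   
   not code from the thread): Goldschmidt scales numerator and denominator   
   by the same factor, so the two multiplies per step can issue in   
   parallel on a pipelined multiplier, but because the factor is not   
   recomputed from a true remainder, rounding error in it goes   
   uncorrected:]   

```python
def goldschmidt_div(a, b, r0, steps=4):
    """Goldschmidt division a/b, given r0 ~ 1/b (assumed positive b)."""
    n, d = a * r0, b * r0          # prescale so d is near 1
    for _ in range(steps):
        f = 2.0 - d                # correction factor
        n, d = n * f, d * f        # two independent multiplies per step
    return n                       # d -> 1, so n -> a/b
```

   [Contrast with N-R, where each reciprocal refinement r' = r*(2 - b*r)   
   must finish before the next begins, serializing the multiplies.]   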
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca