From: already5chosen@yahoo.com   
      
   On Tue, 4 Nov 2025 22:52:46 +0100   
   Terje Mathisen wrote:   
   >   
   > For the Intel binary mantissa dfp128 normalization is the hard issue,   
   > Michael S has figured out some really nice tricks to speed it up,
      
   I remember playing with that, but I don't remember exactly what I
   did. I dimly recollect that the fastest solution was relatively
   straightforward: it tried to minimize the length of dependency
   chains rather than the total number of multiplications.
   An important point here is that I played on relatively old x86-64
   hardware, so my solution is not necessarily optimal for newer hardware.
   The differences between old and new are two-fold, and they push the
   optimal solution in different directions:
   1. Increase in throughput of integer multiplier   
   2. Decrease in latency of integer division   
      
   The first factor suggests an even stronger push toward "eager"
   solutions.
      
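   The second factor suggests, possibly, much simpler code, especially in
   the common case of dividing by 10**k for k from 1 to 27 decimal digits:
   10**k = 2**k * 5**k, the 2**k part is just a shift, and 5**27 < 2**64,
   so what remains is a division by a 64-bit divisor.
   As they say, sometimes a division is just a division.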
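   To make the "just a division" case concrete, here is a minimal sketch,
   not the code I actually used. It assumes GCC/Clang `unsigned __int128`,
   and the names `pow5`, `div_pow10` and `round_div_pow10` are hypothetical.
   It removes k decimal digits (k <= 27) from a 128-bit coefficient by
   shifting out 2**k and then dividing by the 64-bit value 5**k. The
   round-half-even step is an assumed rounding mode picked for illustration.
   Note that the compiler may lower the 128-by-64 division to a runtime
   helper (e.g. __udivti3) rather than a single DIV instruction.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* 5**k for 0 <= k <= 27; never overflows because 5**27 < 2**64. */
static uint64_t pow5(unsigned k) {
    uint64_t p = 1;
    while (k--) p *= 5;
    return p;
}

/* Divide x by 10**k (0 <= k <= 27) using 10**k = 2**k * 5**k:
   floor(floor(x / 2**k) / 5**k) == floor(x / 10**k).
   Returns the quotient and stores the remainder (x mod 10**k) in *rem. */
static u128 div_pow10(u128 x, unsigned k, u128 *rem) {
    uint64_t d5 = pow5(k);
    u128 q = (x >> k) / d5;          /* shift, then one 64-bit-divisor division */
    *rem = x - q * ((u128)d5 << k);  /* 10**k fits easily in 128 bits */
    return q;
}

/* Hypothetical helper: drop k digits with round-half-even. */
static u128 round_div_pow10(u128 x, unsigned k) {
    if (k == 0) return x;
    u128 rem;
    u128 q = div_pow10(x, k, &rem);
    u128 half = ((u128)pow5(k) << k) / 2;  /* exact, since 10**k is even */
    if (rem > half || (rem == half && (q & 1))) q++;
    return q;
}
```

   For example, round_div_pow10(123456789, 3) yields 123457, while the
   ties round_div_pow10(25, 1) and round_div_pow10(35, 1) go to the even
   neighbours 2 and 4.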
      
   > but when you have a (worst case) temporary 220+ bit product mantissa,   
   > scaling is not that easy.   
   >   
      