From: krishna.myneni@ccreweb.org   
      
   On 5/22/24 11:31, Anton Ertl wrote:   
   > Krishna Myneni writes:   
   >> On 5/21/24 04:03, mhx wrote:   
   >>> Anton Ertl wrote:   
   >>>   
   >>> [..]   
   >>>> It seems to me that this can be solved by sorting the three factors   
   >>>> into a>b>c. Then you can avoid the intermediate overflow by   
   >>>> performing the computation as (a*c)*b.   
   > ...   
   >> Remember that you will also have to deal with IEEE 754 special values   
   >> like Inf and NaN.   
   >   
   > Not a problem. If any operand is a NaN, the result will be NaN no   
   > matter how the operations are associated. For infinities (and 0 as   
   > divisor), I would analyse it by looking at all cases, but I don't see   
   > that it makes any difference:   
   >   
   > Variable names here represent finite non-zero values:   
   >   
   > (inf*y)/z=inf/z=inf   
   > inf*(y/z)=inf*finite=inf   
   > y*(inf/z)=y*inf=inf   
   >   
   > Likewise if x is finite and y is infinite   
   >   
   > (x*y)/inf=finite/inf=0   
   > x*(y/inf)=x*0=0   
   > y*(x/inf)=y*0=0   
   >   
   > (x*y)/0=finite/0=inf   
   > x*(y/0)=x*inf=inf   
   > y*(x/0)=y*inf=inf   
   >   
   > Signs in all these cases follow the same rules whether infinities are   
   > involved or not.   
   >   
   >> It will be interesting to compare the efficiency of   
   >> both my approach and your sorting approach. I'm skeptical that the   
   >> additional sorting will make the equivalent calculation faster.   
   >   
   > Actually sorting is overkill:   
   >   
   > : fsort2 ( r1 r2 -- r3 r4 )
   >   \ |r3|>=|r4|
   >   fover fabs fover fabs f< if
   >     fswap
   >   then ;
   >
   > : f*/ ( r1 r2 r3 -- r )
   >   fdup fabs 1e f> fswap frot fsort2 if
   >     fswap then
   >   frot f/ f* ;
   >   
   > I have tested this with your tests from   
   > , but needed to change rel-near (I   
   > changed it to 1e-16) for gforth to pass your tests. I leave   
   > performance testing to you. Here's what vfx64 produces for this F*/:   
   >   
   > see f*/   
   > F*/   
   > ( 0050A310 D9C0 ) FLD ST   
   > ( 0050A312 D9E1 ) FABS   
   > ( 0050A314 D9E8 ) FLD1   
   > ( 0050A316 E8F5BEFFFF ) CALL 00506210 F>   
   > ( 0050A31B D9C9 ) FXCH ST(1)   
   > ( 0050A31D D9C9 ) FXCH ST(1)   
   > ( 0050A31F D9CA ) FXCH ST(2)   
   > ( 0050A321 E88AFFFFFF ) CALL 0050A2B0 FSORT2   
   > ( 0050A326 4885DB ) TEST RBX, RBX   
   > ( 0050A329 488B5D00 ) MOV RBX, [RBP]   
   > ( 0050A32D 488D6D08 ) LEA RBP, [RBP+08]   
   > ( 0050A331 0F8402000000 ) JZ/E 0050A339   
   > ( 0050A337 D9C9 ) FXCH ST(1)   
   > ( 0050A339 D9C9 ) FXCH ST(1)   
   > ( 0050A33B D9CA ) FXCH ST(2)   
   > ( 0050A33D DEF9 ) FDIVP ST(1), ST   
   > ( 0050A33F DEC9 ) FMULP ST(1), ST   
   > ( 0050A341 C3 ) RET/NEXT   
   > ( 50 bytes, 18 instructions )   
   > ok   
   > see fsort2   
   > FSORT2   
   > ( 0050A2B0 D9C1 ) FLD ST(1)   
   > ( 0050A2B2 D9E1 ) FABS   
   > ( 0050A2B4 D9C1 ) FLD ST(1)   
   > ( 0050A2B6 D9E1 ) FABS   
   > ( 0050A2B8 E863BEFFFF ) CALL 00506120 F<   
   > ( 0050A2BD 4885DB ) TEST RBX, RBX   
   > ( 0050A2C0 488B5D00 ) MOV RBX, [RBP]   
   > ( 0050A2C4 488D6D08 ) LEA RBP, [RBP+08]   
   > ( 0050A2C8 0F8402000000 ) JZ/E 0050A2D0   
   > ( 0050A2CE D9C9 ) FXCH ST(1)   
   > ( 0050A2D0 C3 ) RET/NEXT   
   > ( 33 bytes, 11 instructions )   
   > ok   
   > see f<   
   > F<   
   > ( 00506120 E86BFEFFFF ) CALL 00505F90 FCMP2   
   > ( 00506125 4881FB00010000 ) CMP RBX, # 00000100   
   > ( 0050612C 0F94C3 ) SETZ/E BL   
   > ( 0050612F F6DB ) NEG BL   
   > ( 00506131 480FBEDB ) MOVSX RBX, BL   
   > ( 00506135 C3 ) RET/NEXT   
   > ( 22 bytes, 6 instructions )   
   > ok   
   > see fcmp2   
   > FCMP2   
   > ( 00505F90 4883ED08 ) SUB RBP, # 08   
   > ( 00505F94 48895D00 ) MOV [RBP], RBX   
   > ( 00505F98 D9C9 ) FXCH ST(1)   
   > ( 00505F9A DED9 ) FCOMPP   
   > ( 00505F9C 9B ) FWAIT   
   > ( 00505F9D DFE0 ) FSTSW AX   
   > ( 00505F9F 66250041 ) AND AX, # 4100   
   > ( 00505FA3 480FB7D8 ) MOVZX RBX, AX   
   > ( 00505FA7 C3 ) RET/NEXT   
   > ( 24 bytes, 9 instructions )   
   >   
      
      
   Nice work. I'll try some comparisons using your Forth implementation of   
   F*/ .   
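
   A minimal sketch of one such spot check, assuming a Forth system with
   the standard floating-point word set and 64-bit IEEE doubles on the FP
   stack. The name f*/-naive and the test magnitudes are made up for
   illustration and are not from the thread; the two definitions are
   repeated from the quoted post so the snippet loads on its own:

```forth
\ Definitions repeated from the quoted post.
: fsort2 ( r1 r2 -- r3 r4 )  \ |r3|>=|r4|
  fover fabs fover fabs f< if fswap then ;

: f*/ ( r1 r2 r3 -- r )
  fdup fabs 1e f> fswap frot fsort2 if fswap then
  frot f/ f* ;

\ Hypothetical reference word: forms the full product first, then
\ divides, so the intermediate r1*r2 can overflow or underflow.
: f*/-naive ( r1 r2 r3 -- r )
  frot frot f*      \ r3 r1*r2
  fswap f/ ;        \ (r1*r2)/r3

\ Large operands: r1*r2 is about 1e400, beyond the double range,
\ while the true result (about 1e250) is representable.
1e200 1e200 1e150 f*/       fs. cr
1e200 1e200 1e150 f*/-naive fs. cr

\ Small operands: r1*r2 is about 1e-400 and underflows to zero,
\ while the true result (about 1e-250) is representable.
1e-200 1e-200 1e-150 f*/       fs. cr
1e-200 1e-200 1e-150 f*/-naive fs. cr
```

   On a system that keeps 80-bit x87 values on the FP stack (as the vfx64
   listing above appears to), the naive word would not overflow at these
   magnitudes, so larger exponents would be needed to see a difference.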
      
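   Interesting that there is an FWAIT instruction assembled into FCMP2.
   IIRC, FWAIT was originally needed to synchronize the CPU with the 8087
   coprocessor. The 9B byte here looks like the wait prefix of the checked
   form of FSTSW (FSTSW is FWAIT followed by FNSTSW), which makes the FPU
   report pending unmasked exceptions before the status word is stored.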
      
   --   
   Krishna   
      