comp.lang.forth
Krishna Myneni to Anton Ertl
Re: F*/ (f-star-slash)
27 May 24 13:45:38
   From: krishna.myneni@ccreweb.org   
      
   On 5/22/24 11:31, Anton Ertl wrote:   
   > Krishna Myneni writes:   
   >> On 5/21/24 04:03, mhx wrote:   
   >>> Anton Ertl wrote:   
   >>>   
   >>> [..]   
   >>>> It seems to me that this can be solved by sorting the three factors   
   >>>> into a>b>c.  Then you can avoid the intermediate overflow by   
   >>>> performing the computation as (a*c)*b.   
   > ...   
   >> Remember that you will also have to deal with IEEE 754 special values   
   >> like Inf and NaN.   
   >   
   > Not a problem.  If any operand is a NaN, the result will be NaN no   
   > matter how the operations are associated.  For infinities (and 0 as   
   > divisor), I would analyse it by looking at all cases, but I don't see   
   > that it makes any difference:   
   >   
   > Variable names here represent finite non-zero values:   
   >   
   > (inf*y)/z=inf/z=inf   
   > inf*(y/z)=inf*finite=inf   
   > y*(inf/z)=y*inf=inf   
   >   
   > Likewise if x is finite and y is infinite   
   >   
   > (x*y)/inf=finite/inf=0   
   > x*(y/inf)=x*0=0   
   > y*(x/inf)=y*0=0   
   >   
   > (x*y)/0=finite/0=inf   
   > x*(y/0)=x*inf=inf   
   > y*(x/0)=y*inf=inf   
   >   
   > Signs in all these cases follow the same rules whether infinities are   
   > involved or not.   
   >   
   >> It will be interesting to compare the efficiency of   
   >> both my approach and your sorting approach. I'm skeptical that the   
   >> additional sorting will make the equivalent calculation faster.   
   >   
   > Actually sorting is overkill:   
   >   
   > : fsort2 ( r1 r2 -- r3 r4 )   
   >      \ |r3|>=|r4|   
   >      fover fabs fover fabs f< if   
   >          fswap   
   >      then ;   
   >   
   > : f*/ ( r1 r2 r3 -- r )   
   >      fdup fabs 1e f> fswap frot fsort2 if   
   >          fswap then   
   >      frot f/ f* ;   
   >   
   > I have tested this with your tests from   
   > , but needed to change rel-near (I   
   > changed it to 1e-16) for gforth to pass your tests.  I leave   
   > performance testing to you.  Here's what vfx64 produces for this F*/:   
   >   
   > see f*/   
   > F*/   
   > ( 0050A310    D9C0 )                  FLD     ST   
   > ( 0050A312    D9E1 )                  FABS   
   > ( 0050A314    D9E8 )                  FLD1   
   > ( 0050A316    E8F5BEFFFF )            CALL    00506210  F>   
   > ( 0050A31B    D9C9 )                  FXCH    ST(1)   
   > ( 0050A31D    D9C9 )                  FXCH    ST(1)   
   > ( 0050A31F    D9CA )                  FXCH    ST(2)   
   > ( 0050A321    E88AFFFFFF )            CALL    0050A2B0  FSORT2   
   > ( 0050A326    4885DB )                TEST    RBX, RBX   
   > ( 0050A329    488B5D00 )              MOV     RBX, [RBP]   
   > ( 0050A32D    488D6D08 )              LEA     RBP, [RBP+08]   
   > ( 0050A331    0F8402000000 )          JZ/E    0050A339   
   > ( 0050A337    D9C9 )                  FXCH    ST(1)   
   > ( 0050A339    D9C9 )                  FXCH    ST(1)   
   > ( 0050A33B    D9CA )                  FXCH    ST(2)   
   > ( 0050A33D    DEF9 )                  FDIVP   ST(1), ST   
   > ( 0050A33F    DEC9 )                  FMULP   ST(1), ST   
   > ( 0050A341    C3 )                    RET/NEXT   
   > ( 50 bytes, 18 instructions )   
   >   ok   
   > see fsort2   
   > FSORT2   
   > ( 0050A2B0    D9C1 )                  FLD     ST(1)   
   > ( 0050A2B2    D9E1 )                  FABS   
   > ( 0050A2B4    D9C1 )                  FLD     ST(1)   
   > ( 0050A2B6    D9E1 )                  FABS   
   > ( 0050A2B8    E863BEFFFF )            CALL    00506120  F<   
   > ( 0050A2BD    4885DB )                TEST    RBX, RBX   
   > ( 0050A2C0    488B5D00 )              MOV     RBX, [RBP]   
   > ( 0050A2C4    488D6D08 )              LEA     RBP, [RBP+08]   
   > ( 0050A2C8    0F8402000000 )          JZ/E    0050A2D0   
   > ( 0050A2CE    D9C9 )                  FXCH    ST(1)   
   > ( 0050A2D0    C3 )                    RET/NEXT   
   > ( 33 bytes, 11 instructions )   
   >   ok   
   > see f<   
   > F<   
   > ( 00506120    E86BFEFFFF )            CALL    00505F90  FCMP2   
   > ( 00506125    4881FB00010000 )        CMP     RBX, # 00000100   
   > ( 0050612C    0F94C3 )                SETZ/E   BL   
   > ( 0050612F    F6DB )                  NEG     BL   
   > ( 00506131    480FBEDB )              MOVSX   RBX, BL   
   > ( 00506135    C3 )                    RET/NEXT   
   > ( 22 bytes, 6 instructions )   
   >   ok   
   > see fcmp2   
   > FCMP2   
   > ( 00505F90    4883ED08 )              SUB     RBP, # 08   
   > ( 00505F94    48895D00 )              MOV     [RBP], RBX   
   > ( 00505F98    D9C9 )                  FXCH    ST(1)   
   > ( 00505F9A    DED9 )                  FCOMPP   
   > ( 00505F9C    9B )                    FWAIT   
   > ( 00505F9D    DFE0 )                  FSTSW   AX   
   > ( 00505F9F    66250041 )              AND     AX, # 4100   
   > ( 00505FA3    480FB7D8 )              MOVZX   RBX, AX   
   > ( 00505FA7    C3 )                    RET/NEXT   
   > ( 24 bytes, 9 instructions )   
   >   
      
      
   Nice work. I'll try some comparisons using your Forth implementation of   
   F*/ .   
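
As a starting point for such comparisons, here is a minimal sketch that puts the quoted F*/ next to a straightforward multiply-then-divide version. FSORT2 and F*/ are copied from the quoted post; NAIVE-F*/ and the operand values are assumptions chosen only for illustration, and the sketch assumes a system with the Floating-Point Extension word FS. and IEEE double-precision floats.

```forth
\ Sketch only. FSORT2 and F*/ are copied from the quoted post;
\ NAIVE-F*/ and the operand values are illustrative assumptions.
decimal

: fsort2 ( r1 r2 -- r3 r4 )      \ afterwards |r3| >= |r4|
    fover fabs fover fabs f< if fswap then ;

: f*/ ( r1 r2 r3 -- r )          \ r = r1*r2/r3
    fdup fabs 1e f> fswap frot fsort2 if fswap then
    frot f/ f* ;

: naive-f*/ ( r1 r2 r3 -- r )    \ (r1*r2)/r3, product formed first
    frot frot f* fswap f/ ;

\ Large operands: r1*r2 exceeds the double range, the quotient does not.
1e300 1e300 1e300 naive-f*/ fs. cr
1e300 1e300 1e300 f*/       fs. cr

\ Small operands: r1*r2 underflows to zero in double precision.
1e-300 1e-300 1e-300 naive-f*/ fs. cr
1e-300 1e-300 1e-300 f*/       fs. cr
```

With 64-bit floats the naive version should lose the result in both cases (overflow for the large operands, zero for the small ones), while F*/ divides first and should stay in range. On a system that keeps intermediates in 80-bit x87 registers, the naive product may not overflow for these particular values, so larger exponents would be needed there. Whether an overflow prints as infinity or raises an exception also depends on the system.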
      
   Interesting that there is an FWAIT instruction assembled into FCMP2.   
   IIRC, FWAIT was needed to fix a bug in early FPUs.   
      
   --   
   Krishna   
      