From: krishna.myneni@ccreweb.org   
      
   On 5/22/24 11:31, Anton Ertl wrote:   
   > Krishna Myneni writes:   
   >> On 5/21/24 04:03, mhx wrote:   
   >>> Anton Ertl wrote:   
   >>>   
   >>> [..]   
   >>>> It seems to me that this can be solved by sorting the three factors   
   >>>> into a>b>c. Then you can avoid the intermediate overflow by   
   >>>> performing the computation as (a*c)*b.   
   > ...   
   >> Remember that you will also have to deal with IEEE 754 special values   
   >> like Inf and NaN.   
   >   
   > Not a problem. If any operand is a NaN, the result will be NaN no   
   > matter how the operations are associated. For infinities (and 0 as   
   > divisor), I would analyse it by looking at all cases, but I don't see   
   > that it makes any difference:   
   >   
   > Variable names here represent finite non-zero values:   
   >   
   > (inf*y)/z=inf/z=inf   
   > inf*(y/z)=inf*finite=inf   
   > y*(inf/z)=y*inf=inf   
   >   
   > Likewise if x is finite and y is infinite   
   >   
   > (x*y)/inf=finite/inf=0   
   > x*(y/inf)=x*0=0   
   > y*(x/inf)=y*0=0   
   >   
   > (x*y)/0=finite/0=inf   
   > x*(y/0)=x*inf=inf   
   > y*(x/0)=y*inf=inf   
   >   
   > Signs in all these cases follow the same rules whether infinities are   
   > involved or not.   
   >   
   >> It will be interesting to compare the efficiency of   
   >> both my approach and your sorting approach. I'm skeptical that the   
   >> additional sorting will make the equivalent calculation faster.   
   >   
   > Actually sorting is overkill:   
   >   
   > : fsort2 ( r1 r2 -- r3 r4 )   
   > \ |r3|>=|r4|   
   > fover fabs fover fabs f< if   
   > fswap   
   > then ;   
   >   
   > : f*/ ( r1 r2 r3 -- r )   
   > fdup fabs 1e f> fswap frot fsort2 if   
   > fswap then   
   > frot f/ f* ;   
   >   
   > I have tested this with your tests from   
   > , but needed to change rel-near (I   
   > changed it to 1e-16) for gforth to pass your tests. I leave   
   > performance testing to you. ...   
      
   I put together a benchmark test. My implementation of F*/ hangs during   
   the test, so I have some debugging to do on that. Your implementation   
   above does execute.   
      
   --   
   Krishna   
      
      
   === begin bench-fstarslash.4th ===   
   \ bench-fstarslash.4th   
   \   
   \ Benchmark implementation of double-precision F*/ using   
   \ a large sample of triplet dp floating-point numbers,   
   \ which would overflow/underflow with F* F/ sequence.   
   \   
      
   include ans-words \ only for kForth   
   include random   
   include fsl/fsl-util   
   include fsl/dynmem   
      
   1 [IF]   
      
   \ Anton Ertl's implementation of F*/ per discussion on   
   \ comp.lang.forth: 5/22/2024, "Re: F*/ (f-star-slash)"   
      
   : fsort2 ( F: r1 r2 -- r3 r4 )   
    \ |r3|>=|r4|   
    fover fabs fover fabs f< if   
    fswap   
    then ;   
      
   : f*/ ( F: r1 r2 r3 -- r )   
    fdup fabs 1e f> fswap frot fsort2 if   
    fswap then   
    frot f/ f* ;   
      
   [ELSE]   
   include fstarslash \ other implementation   
   [THEN]   
      
   \ Generate triplet:   
   \   
   \ 1. Use a integer random number generator to generate   
   \ binary signed exponents, e1, e2, e3, in the interval   
   \   
   \ [-DP_EXPONENT_BIAS, MAX_DP_EXPONENT-DP_EXPONENT_BIAS)   
   \   
   \ 2. Obtain the triplet, 2^{e1--e3}, such that it is in   
   \ the normal range for double-precision fp.   
   \   
   \ 3. Store in matrix and repeat from 1 for N_SAMPLES.   
   \   
   \ 4. Benchmark results of F*/ on triplets; store in   
   \ results array.   
   \   
      
   1014678321 seed !   
      
   [UNDEFINED] DP_EXPONENT_MAX [IF]   
   2047 constant DP_EXPONENT_MAX   
   [THEN]   
      
   DP_EXPONENT_MAX constant RAND_EXP_MASK   
      
   [UNDEFINED] DP_EXPONENT_BIAS [IF]   
   1023 constant DP_EXPONENT_BIAS   
   [THEN]   
      
   1000000 constant N_SAMPLES   
      
   FLOAT DMATRIX triplets{{   
   FLOAT DARRAY results{   
      
   : alloc-mem ( u -- bFail )   
    & triplets{{ over 3 }}malloc   
    & results{ swap }malloc   
    malloc-fail? ;   
      
   : free-mem ( -- )   
    & results{ }free   
    & triplets{{ }}free   
      
      
   \ Return signed binary exponent   
   : rand-exp ( -- e )   
    random2 RAND_EXP_MASK and \ ensure e stays within range   
    DP_EXPONENT_BIAS - ;   
      
   : normal-exponent? ( e -- bNormal )   
    DP_EXPONENT_BIAS negate DP_EXPONENT_BIAS within ;   
      
   : random-exponents ( -- e1 e2 e3 )   
    BEGIN   
    rand-exp rand-exp   
    2dup + normal-exponent?   
    WHILE   
    2drop   
    REPEAT   
    \ -- e1 e2   
    \ (2^e1)*(2^e2) will underflow or overflow double fp under F*   
    BEGIN   
    2dup +   
    rand-exp 2dup -   
    normal-exponent? 0=   
    WHILE   
    2drop   
    REPEAT   
    nip   
      
      
   : randomize-sign ( F: r -- +/-r )   
    random2 1 and IF fnegate THEN ;   
      
   : gen-triplet ( F: -- r1 r2 r3 )   
    random-exponents   
    2.0e0 s>f f** randomize-sign   
    2.0e0 s>f f** randomize-sign   
    2.0e0 s>f f** randomize-sign ;   
      
   : gen-samples ( -- )   
    N_SAMPLES alloc-mem ABORT" Unable to allocate memory!"   
    N_SAMPLES 0 DO   
    gen-triplet   
    triplets{{ I 2 }} f! triplets{{ I 1 }} f! triplets{{ I 0 }} f!   
    key? IF key drop I . cr unloop EXIT THEN   
    LOOP ;   
      
   \ For Gforth, include kforth-compat.fs, or replace MS@ with UTIME   
   \ and double arithmetic to obtain elapsed time.   
      
   : bench-f*/ ( -- )   
    ms@   
    results{ 0 }   
    N_SAMPLES 0 DO   
    triplets{{ I 0 }}   
    dup f@   
    float+ dup f@   
    float+ f@   
    f*/   
    dup f! float+   
    LOOP   
    drop   
    ms@ swap - . ." ms" cr ;   
      
   cr   
   cr .( Type 'gen-samples' to obtain arithmetic cases for F*/' )   
   cr .( Type bench-f*/ to execute the benchmark. )   
   cr   
   cr   
   === end bench-fstarslash.4th ===   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   
|