comp.lang.forth
Krishna Myneni to Anton Ertl
Re: Floating point implementations on AM
14 Apr 24 18:32:11
   From: krishna.myneni@ccreweb.org   
      
   On 4/14/24 10:19, Anton Ertl wrote:   
   > Krishna Myneni  writes:   
   >> dx/dt = sigma*(y - x)   
   >> dy/dt = x*(rho - z) - y
   >> dz/dt = x*y - beta*z   
   >>   
   >> where sigma, rho, and beta are constant parameters.   
   >>   
   >> Let's say we want to write a word DERIVS which computes and stores the   
   >> derivatives, given the instantaneous values of x, y, z. This is the   
   >> basis for any numerical code which solves the trajectory in time,   
   >> starting from an initial condition.   
   >>   
   >> DERIVS ( F: x y z -- )   
   >>   
   >> Hence, we want to place some values x, y, and z onto the fp stack and   
   >> compute the three derivatives. Ideally these three values remain on the   
   >> fp stack and don't need to be fetched from memory constantly until the   
   >> three derivatives are computed, especially if one is using the hardware   
   >> fp stack. We allow the constant parameters to be fetched from memory and   
   >> the results of the derivative computation to be stored to memory so they   
   >> don't overflow the stack. This should be doable with the 8-element   
   >> hardware fp stack.   
   >   
   > I have adapted your Forth code:   
   >   
   > [UNDEFINED] F2OVER [IF]   
   > : f2over ( F: r1 r2 r3 r4 -- r1 r2 r3 r4 r1 r2 )   
   >       3 fpick 3 fpick ;   
   > [THEN]   
   >   
   > 16.0e0  fconstant sigma   
   > 45.92e0 fconstant rho   
   > 4.0e0   fconstant beta   
   >   
   > fvariable dx/dt   
   > fvariable dy/dt   
   > fvariable dz/dt   
   >   
   > : derivs ( F: x y z -- )   
   >       fdup f2over   \ F: x y z z x y   
   >       f- sigma f* fnegate   
   >       dx/dt f!  \ F: x y z z   
   >       rho fover f-  \ F: x y z z rho-z   
   >       4 fpick f*    \ F: x y z z x*(rho - z)   
   >       3 fpick f-   
   >       dy/dt f!  \ F: x y z z   
   >       fdrop   
   >       beta f* fnegate   
   >       frot frot f* f+ dz/dt f!   
   > ;   
   >   
   > 0.1e 0.6e 4.0e derivs   
   > dx/dt f@ f. cr \ 8.   
   > dy/dt f@ f. cr \ 3.592   
   > dz/dt f@ f. cr \ -15.94   
   >   
   > In particular, I eliminated the additional memory accesses to DZ/DT.   
   >   
      
   Nice. FROT FROT is expensive on a memory-based FP stack unless the
   compiler optimizes it, but on the hardware FPU stack it's probably very
   fast. I see that VFX Forth and iForth implement FROT FROT as a series
   of FXCH instructions.
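
   For a memory-based FP stack, one way to sidestep the rotation is to
   reorder the computation so that dz/dt is stored first and x, y, z are
   consumed as the word proceeds. The following is only a sketch, not
   benchmarked on any system. The name DERIVS-NOROT is made up for
   illustration, and it assumes the same non-standard FPICK used above
   (0 FPICK behaving like FDUP).

```forth
\ Sketch only: a FROT-free ordering of DERIVS (hypothetical name DERIVS-NOROT).
\ Assumes a non-standard FPICK where 0 FPICK is equivalent to FDUP.
16.0e0  fconstant sigma
45.92e0 fconstant rho
4.0e0   fconstant beta

fvariable dx/dt
fvariable dy/dt
fvariable dz/dt

: derivs-norot ( F: x y z -- )
    fover 3 fpick f*     \ F: x y z x*y
    fover beta f* f-     \ F: x y z x*y-beta*z
    dz/dt f!             \ F: x y z
    rho fswap f-         \ F: x y rho-z
    2 fpick f*           \ F: x y x*(rho-z)
    fover f-             \ F: x y x*(rho-z)-y
    dy/dt f!             \ F: x y
    fswap f- sigma f*    \ F: sigma*(y-x)
    dx/dt f! ;

\ Same test case as above
0.1e 0.6e 4.0e derivs-norot
dx/dt f@ f. cr \ 8.
dy/dt f@ f. cr \ 3.592
dz/dt f@ f. cr \ -15.94
```

   In this ordering the FP stack never holds more than five items, and
   the only stack shuffles are FOVER, FSWAP and FPICK. Whether that is
   actually faster than FROT FROT depends on the system.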
      
   > SwiftForth, VFX and iforth produce the expected results for your test   
   > case.  The code is:   
   >   
   > SwiftForth 4.0.0-RC87  VFX Forth 64 5.43          iforth-5.1-mini   
   > ST(0) FLD              FLD   ST                   fld   ST(0)   
   > 44E8BC ( f2over ) CALL CALL  0050A080  F2OVER     fld   [r13 0 +] tbyte   
   > ST(0) ST(1) FSUBP      FSUBP ST(1), ST            fxch  ST(1)   
   > 44E8FB ( sigma ) CALL  CALL  0050A2BB  SIGMA      fld   [r13 #16 +] tby   
   > ST(0) ST(1) FMULP      FMULP ST(1), ST            lea  r13, [r13 #32 +]   
   > FCHS                   FCHS                       fxch  ST(3)   
   > -8 [RBP] RBP LEA       FSTP  TBYTE FFF9CFE8 [RIP] fxch  ST(1)   
   > RBX 0 [RBP] MOV        CALL  0050A2FB  RHO        fld   ST(3)   
   > 4C508 [RDI] RBX LEA    FLD   ST(1)                fld   ST(3)   
   > 0 [RBX] TBYTE FSTP     FSUBP ST(1), ST            fsubp ST(1), ST   
   > 0 [RBP] RBX MOV        LEA   RBP, [RBP+-08]       fld   $101BC720 tbyte   
   > 8 [RBP] RBP LEA        MOV   [RBP], RBX           fmulp ST(1), ST   
   > 44E923 ( rho ) CALL    MOV   EBX, # 00000004      fchs   
   > ST(1) FLD              CALL  005030C0  FPICK      fstp  $10226470 tbyte   
   > ST(0) ST(1) FSUBP      FMULP ST(1), ST            fld   $101BC710 tbyte   
   > -8 [RBP] RBP LEA       LEA   RBP, [RBP+-08]       fld   ST(1)   
   > RBX 0 [RBP] MOV        MOV   [RBP], RBX           fsubp ST(1), ST   
   > 4 # EBX MOV            MOV   EBX, # 00000003      fld   ST(4)   
   > 43C901 ( FPICK ) CALL  CALL  005030C0  FPICK      fmulp ST(1), ST   
   > ST(0) ST(1) FMULP      FSUBP ST(1), ST            fld   ST(3)   
   > -8 [RBP] RBP LEA       FSTP  TBYTE FFF9CFC1 [RIP] fsubp ST(1), ST   
   > RBX 0 [RBP] MOV        FSTP  ST                   fstp  $10226490 tbyte   
   > 3 # EBX MOV            CALL  0050A33B  BETA       ffreep ST(0)   
   > 43C901 ( FPICK ) CALL  FMULP ST(1), ST            fld   $101BC700 tbyte   
   > ST(0) ST(1) FSUBP      FCHS                       fmulp ST(1), ST   
   > -8 [RBP] RBP LEA       FXCH  ST(1)                fchs   
   > RBX 0 [RBP] MOV        FXCH  ST(2)                fxch  ST(1)   
   > 4C530 [RDI] RBX LEA    FXCH  ST(1)                fxch  ST(2)   
   > 0 [RBX] TBYTE FSTP     FXCH  ST(2)                fxch  ST(1)   
   > 0 [RBP] RBX MOV        FMULP ST(1), ST            fxch  ST(2)   
   > 8 [RBP] RBP LEA        FADDP ST(1), ST            fmulp ST(1), ST   
   > ST(0) FSTP             FSTP  TBYTE FFF9CFB4 [RIP] fxch  ST(1)   
   > 44E94B ( beta ) CALL   RET/NEXT                   fpopswap,   
   > ST(0) ST(1) FMULP                                 faddp ST(1), ST   
   > FCHS                                              fstp  $102264B0 tbyte   
   > 43C807 ( FROT ) CALL                              ;   
   > 43C807 ( FROT ) CALL   
   > ST(0) ST(1) FMULP   
   > ST(0) ST(1) FADDP   
   > -8 [RBP] RBP LEA   
   > RBX 0 [RBP] MOV   
   > 4C558 [RDI] RBX LEA   
   > 0 [RBX] TBYTE FSTP   
   > 0 [RBP] RBX MOV   
   > 8 [RBP] RBP LEA   
   > RET   
   >   
   > FPICK is apparently implemented on SwiftForth and VFX through an   
   > indirect branch that branches to one of 8 variants of "FLD ST(...)",   
   > while iForth manages to resolve this during compilation.   
   >   
      
   Good to see that x, y, z are not repeatedly fetched from memory.   
      
   For this example, the hardware FPU stack is sufficient. But the
   benefits of a hardware-only stack would diminish quickly if the problem
   grew even slightly: with only eight stack elements, the programmer (or
   compiler) would have to keep careful track of how many FPU registers
   are in use.
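
   As a rough illustration of that bookkeeping, the standard word FDEPTH
   can be used to record the deepest FP stack level a word reaches during
   testing. This is a minimal sketch: FP-HIGH-WATER, FMARK and DY-TERM
   are hypothetical names, and it assumes FDEPTH reports the true depth
   on the system at hand.

```forth
\ Sketch only: record the deepest FP stack level seen while testing a word.
\ FDEPTH is standard; FP-HIGH-WATER, FMARK and DY-TERM are hypothetical names.
variable fp-high-water   0 fp-high-water !

: fmark ( -- )   \ update the high-water mark with the current FP depth
    fdepth fp-high-water @ max fp-high-water ! ;

: dy-term ( F: x y z -- r )   \ r = x*(rho - z) - y, with rho = 45.92
    45.92e fmark     \ F: x y z rho    deepest point of this word
    fswap f-         \ F: x y rho-z
    frot f*          \ F: y x*(rho-z)
    fswap f- ;       \ F: x*(rho-z)-y

0 fp-high-water !
0.1e 0.6e 4.0e dy-term f. cr   \ 3.592
fp-high-water @ . cr           \ 4 if the FP stack was empty before the call
```

   A mark has to be placed right after each push to catch the true
   maximum, so this is a testing aid rather than something to leave in
   production code.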
      
   > I have also looked at VFX 5.11 which uses XMM registers instead of the   
   > FP stack, but it does not inline FP operations, so you mostly see a long   
   > sequence of calls.   
   >   
      
   --   
   Krishna   
      