   comp.lang.forth

   Anton Ertl to minforth   
   Re: 3dup again (1/2)   
   04 Oct 25 08:04:09   
   
   From: anton@mips.complang.tuwien.ac.at   
      
   minforth writes:
   >On 03.10.2025 at 11:02, albert@spenarnc.xs4all.nl wrote:
   >> The problem with 3DUP is that it is actually used in context.   
   >> What is the data that is going to 3DUP ped? In my view this   
   >> amounts to double use of data that is in registers (32 in the riscv)   
   >> anyway, after an optimiser does his thing.   
   >>   
   >   
   >Code inlining will mend it.   
      
   Inlining is important for Forth, but it does not make what has been   
   called an "analytical optimizer" unnecessary; on the contrary,
   inlining increases the benefit we get from the analytical optimizer.   
   E.g., let's consider   
      
   : 3dup.1 ( a b c -- a b c a b c ) >r 2dup r@ -rot r> ;   
   : 3dup.2 ( a b c -- a b c a b c ) 2 pick 2 pick 2 pick ;   
   : 3dup.3 {: a b c :} a b c a b c ;   
   : 3dup.4 ( a b c -- a b c a b c ) dup 2over rot ;   
      
   : foo.1 3dup.1 + ! ;   
   : foo.2 3dup.2 + ! ;   
   : foo.3 3dup.3 + ! ;   
   : foo.4 3dup.4 + ! ;   
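
   Before comparing the generated code, it may be worth confirming that
   the four variants really have the same stack effect.  The following
   is only a sketch: it assumes a Forth-2012 system with the locals word
   set (needed for 3dup.3), CHECK-3DUP is a name made up here, and the
   four definitions are repeated so the block loads on its own.

```forth
\ Definitions repeated from the post so this block is self-contained.
: 3dup.1 ( a b c -- a b c a b c ) >r 2dup r@ -rot r> ;
: 3dup.2 ( a b c -- a b c a b c ) 2 pick 2 pick 2 pick ;
: 3dup.3 {: a b c :} a b c a b c ;
: 3dup.4 ( a b c -- a b c a b c ) dup 2over rot ;

\ CHECK-3DUP is a hypothetical helper, not part of the original post.
\ It runs the word given by xt on 1 2 3 and returns true if the six
\ topmost stack items are then 1 2 3 1 2 3.
: check-3dup ( xt -- flag )
   >r 1 2 3 r> execute
   3 = swap 2 = and swap 1 = and
   swap 3 = and swap 2 = and swap 1 = and ;
```

   For example, ' 3dup.1 check-3dup . should print -1, and likewise for
   the other three variants.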
      
   The code produced by VFX64 for the four FOO variants is:
      
   foo.1              foo.2              foo.3              foo.4   
   PUSH EBX           MOV  EDX, EBX      CALL 3DUP.3        MOV  EDX, EBX   
   MOV  EBX, [ESP]    ADD  EBX, [EBP]    ADD  EBX, [EBP]    ADD  EBX, [EBP]   
   POP  EDX           MOV  ECX, [EBP+04] MOV  EDX, [EBP+04] MOV  ECX, [EBP+04]   
   ADD  EDX, [EBP]    MOV  0 [EBX], ECX  MOV  0 [EBX], EDX  MOV  0 [EBX], ECX   
   MOV  ECX, [EBP+04] MOV  EBX, EDX      MOV  EBX, [EBP+08] MOV  EBX, EDX   
   MOV  0 [EDX], ECX  NEXT,              LEA  EBP, [EBP+0C] NEXT,   
   NEXT,                                 NEXT,   
      
   VFX is only analytical about the data stack, and as a consequence, the   
   implementations of 3dup that only use the data stack work best.  When   
   the return stack is used, as in 3dup.1/foo.1, VFX produces   
   instructions (PUSH for >R, MOV ..., [ESP] for R@ and POP for R>) for   
   the return-stack operations.  When locals are used, VFX actually   
   disables inlining and just calls 3DUP.3.   
      
   Other Forth systems make too little use of inlining, and I have to   
   resort to macros to simulate it.  We cannot use proper macros for   
   3dup.3 (the locals-using variant), so I used EVALUATE-based macros;   
   this is just for experimental use, not for production; don't do this
   at home :-)
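
   For readers who have not seen the technique: an EVALUATE-based macro
   is an immediate word that hands a source string to the text
   interpreter while the calling definition is being compiled, so the
   text is compiled as if it had been typed in place.  A minimal sketch,
   assuming a Forth-2012 system with the locals word set; the names
   3DUP.3M and FOO.3M are made up here, and the macros actually used for
   the measurements may look different:

```forth
\ Hypothetical EVALUATE-based macro for the locals-using variant.
\ Being immediate, it runs while the caller is compiled and compiles
\ the locals declaration and the six local references into the caller.
: 3dup.3m ( compilation: -- ; run-time: a b c -- a b c a b c )
   s" {: a b c :} a b c a b c" evaluate ; immediate

\ The caller then contains the locals code inline, with no call.
: foo.3m ( a b c -- a b c ) 3dup.3m + ! ;
```

   The drawbacks are the usual ones for this kind of macro: the locals
   a, b and c stay visible in the rest of the calling definition, the
   standard allows only one {: ... :} per definition, so the macro can
   be used at most once per caller, and the word is only meaningful in
   compilation state.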
      
   Let's see what VFX64 produces for FOO.3 with this:   
      
   FOO.3   
   ( 080C0C50    8BD4 )                  MOV       EDX, ESP   
   ( 080C0C52    FF7504 )                PUSH      [EBP+04]   
   ( 080C0C55    FF7500 )                PUSH      [EBP]   
   ( 080C0C58    53 )                    PUSH      EBX   
   ( 080C0C59    52 )                    PUSH      EDX   
   ( 080C0C5A    57 )                    PUSH      EDI   
   ( 080C0C5B    8BFC )                  MOV       EDI, ESP   
   ( 080C0C5D    81EC00000000 )          SUB       ESP, 00000000   
   ( 080C0C63    8B5D08 )                MOV       EBX, [EBP+08]   
   ( 080C0C66    8D6D0C )                LEA       EBP, [EBP+0C]   
   ( 080C0C69    8B5708 )                MOV       EDX, [EDI+08]   
   ( 080C0C6C    03570C )                ADD       EDX, [EDI+0C]   
   ( 080C0C6F    8B4F08 )                MOV       ECX, [EDI+08]   
   ( 080C0C72    8B470C )                MOV       EAX, [EDI+0C]   
   ( 080C0C75    8D6DEC )                LEA       EBP, [EBP+-14]   
   ( 080C0C78    894D04 )                MOV       [EBP+04], ECX   
   ( 080C0C7B    894508 )                MOV       [EBP+08], EAX   
   ( 080C0C7E    8B4F10 )                MOV       ECX, [EDI+10]   
   ( 080C0C81    894D0C )                MOV       [EBP+0C], ECX   
   ( 080C0C84    895D10 )                MOV       [EBP+10], EBX   
   ( 080C0C87    8BDA )                  MOV       EBX, EDX   
   ( 080C0C89    8B5710 )                MOV       EDX, [EDI+10]   
   ( 080C0C8C    895500 )                MOV       [EBP], EDX   
   ( 080C0C8F    8B5500 )                MOV       EDX, [EBP]   
   ( 080C0C92    8913 )                  MOV       0 [EBX], EDX   
   ( 080C0C94    8B5D04 )                MOV       EBX, [EBP+04]   
   ( 080C0C97    8D6D08 )                LEA       EBP, [EBP+08]   
   ( 080C0C9A    8B6704 )                MOV       ESP, [EDI+04]   
   ( 080C0C9D    8B3F )                  MOV       EDI, 0 [EDI]   
   ( 080C0C9F    C3 )                    NEXT,   
      
   So inlining did not mend that.   
      
   Here's what lxf produces:   
      
   foo.1              foo.2              foo.3              foo.4   
   mov eax , ebx      mov eax , ebx      mov eax , ebx      mov eax , ebx   
   add eax , [ebp]    add eax , [ebp]    add eax , [ebp]    add eax , [ebp]   
   mov ecx , [ebp+4h] mov ecx , [ebp+4h] mov ecx , [ebp+4h] mov ecx , [ebp+4h]   
   mov [eax] , ecx    mov [eax] , ecx    mov [eax] , ecx    mov [eax] , ecx   
   ret near           ret near           ret near           ret near   
      
   So, because lxf is analytical about the return stack (and, through   
   that, about locals), inlining produces the same very good code in all   
   these cases.   
      
   You may notice that lxf produces one register-register move fewer than
   VFX does for FOO.2/FOO.4.  That's because VFX decided to modify the
   TOS register (and has to restore it later), whereas lxf decided to   
   modify a copy of that register.  One would have to make additional   
   observations to determine if lxf was just lucky here or if it   
   consistently makes the right decision in such cases.   
      
   And here's the code that gforth-fast (which does not have an   
   analytical optimizer) produces:   
      
   foo.1              foo.2              foo.3                foo.4   
   >r    1->0         third    1->1      >l >l 1->1           dup    1->1   
     mov -8[r14],r13    mov [r10],r13    >l    1->1             mov [r10],r13   
     sub r14,$08        sub r10,$08        mov -$08[rbp],r13    sub r10,$08   
   2dup    0->2         mov r13,$18[r10]   mov rdx,$08[r10]   2over    1->3   
     mov r13,$10[r10] third    1->2        mov rax,rbp          mov r15,$18[r10]   
     mov r15,$08[r10]   mov r15,$10[r10]   add r10,$10          mov r9,$10[r10]   
   i    2->3          third    2->3        lea rbp,-$10[rbp]  rot    3->3   
     mov r9,[r14]       mov r9,$08[r10]    mov -$10[rax],rdx    mov rax,r13   
   -rot    3->2       +    3->2            mov r13,[r10]        mov r13,r15   
     mov [r10],r9       add r15,r9       >l @local0 1->1        mov r15,r9   
     sub r10,$08      !    2->0          @local0    1->1        mov r9,rax   
   r>    2->3           mov [r15],r13      mov rax,rbp        +    3->2   
     mov r9,[r14]     ;s    0->1           lea rbp,-$08[rbp]    add r15,r9   
     add r14,$08        mov r13,$08[r10]   mov -$08[rax],r13  !    2->0   
   +    3->2            add r10,$08      @local1    1->2        mov [r15],r13   
     add r15,r9         mov rbx,[r14]      mov r15,$08[rbp]   ;s    0->1   
   !    2->0            add r14,$08      @local2    2->3        mov r13,$08[r10]   
     mov [r15],r13      mov rax,[rbx]      mov r9,$10[rbp]      add r10,$08   
   ;s    0->1          jmp eax          @local0    3->1        mov rbx,[r14]
     mov r13,$08[r10]                      mov -$10[r10],r9     add r14,$08   
     add r10,$08                           sub r10,$18          mov rax,[rbx]   
     mov rbx,[r14]                         mov $10[r10],r15     jmp eax   
      
   [continued in next message]   
      