From: anton@mips.complang.tuwien.ac.at   
      
   dxf writes:   
   >For 3DUP I believe this is the one to beat:   
   >   
   >: 3DUP ( a b c -- a b c a b c ) dup 2over rot ;   
   >   
   >With NTF/LXF the locals version will break even.
      
   As we already discussed in the thread including   
   <2021Sep11.083507@mips.complang.tuwien.ac.at>, NTF/LXF produces the   
   same (optimal for the calling convention used by NTF/LXF) code for   
   3DUP versions using the data stack, return stack, or locals. That's   
   because the actual data flow is always the same, and NTF/LXF can see   
   this data flow in all three cases.   
      
   >For others, well, it may   
   >be better not to look. For a straight-forward example of 'stack juggling',   
   >locals handle it rather poorly.   
      
   Other Forth systems implement locals poorly. LXF/NTF demonstrates   
   that this is not due to some natural law, however.   
      
   There have been some improvements in Gforth since that time. Let's   
   see how the versions used in that thread look on today's gforth-fast.   
   Here are the versions of 3DUP:   
      
   : 3dup.1 ( a b c -- a b c a b c ) >r 2dup r@ -rot r> ;   
   : 3dup.2 ( a b c -- a b c a b c ) 2 pick 2 pick 2 pick ;   
   : 3dup.3 {: a b c :} a b c a b c ;   
   : 3dup.4 ( a b c -- a b c a b c ) dup 2over rot ;   
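
   The four definitions can be checked against each other before their
   code is compared. The following is a minimal sketch, assuming a
   Forth-2012 system that also provides -ROT (not a standard word) and
   the {: ... :} locals syntax, as Gforth does. CHECK-3DUP is a
   hypothetical helper introduced here for illustration; it is not part
   of Gforth or NTF/LXF.

```forth
\ The four definitions from the post, repeated so the block loads on its own.
: 3dup.1 ( a b c -- a b c a b c ) >r 2dup r@ -rot r> ;
: 3dup.2 ( a b c -- a b c a b c ) 2 pick 2 pick 2 pick ;
: 3dup.3 {: a b c :} a b c a b c ;
: 3dup.4 ( a b c -- a b c a b c ) dup 2over rot ;

\ CHECK-3DUP (hypothetical helper): run XT on 1 2 3 and THROW -1
\ unless the result is exactly 1 2 3 1 2 3 with nothing extra left behind.
: check-3dup ( xt -- )
  depth >r                 \ stack depth, counting the xt itself
  >r 1 2 3 r> execute      \ expected: 1 2 3 1 2 3
  3 <> throw  2 <> throw  1 <> throw
  3 <> throw  2 <> throw  1 <> throw
  depth r> 1- <> throw ;   \ only the xt may have been consumed

' 3dup.1 check-3dup
' 3dup.2 check-3dup
' 3dup.3 check-3dup
' 3dup.4 check-3dup
```

   If all four lines load silently, the versions agree on this input.
   That says nothing about speed; it only confirms that they compute
   the same stack effect.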
      
   And here's the gforth-fast code on AMD64:   
      
   3dup.1              3dup.2              3dup.3              3dup.4
   >r 1->0             third 1->2          >l >l 1->1          dup 1->1
    mov -$08[r14],r13   mov r15,$10[r10]   >l 1->1              mov [r10],r13
    sub r14,$08        third 2->3           mov -$08[rbp],r13   sub r10,$08
   2dup 0->2            mov r9,$08[r10]     mov rdx,$08[r10]   2over 1->3
    mov r13,$10[r10]   third 3->1           mov rax,rbp         mov r15,$18[r10]
    mov r15,$08[r10]    mov [r10],r13       add r10,$10         mov r9,$10[r10]
   i 2->3               sub r10,$18         lea rbp,-$10[rbp]  rot 3->1
    mov r9,[r14]        mov $10[r10],r15    mov -$10[rax],rdx   mov [r10],r15
   -rot 3->2            mov $08[r10],r9     mov r13,[r10]       sub r10,$10
    mov [r10],r9       ;s 1->1             >l @local0 1->1      mov $08[r10],r9
    sub r10,$08         mov rbx,[r14]      @local0 1->1        ;s 1->1
   r> 2->1              add r14,$08         mov rax,rbp         mov rbx,[r14]
    mov -$08[r10],r15   mov rax,[rbx]       lea rbp,-$08[rbp]   add r14,$08
    sub r10,$10         jmp eax             mov -$08[rax],r13   mov rax,[rbx]
    mov $10[r10],r13                       @local1 1->2         jmp eax
    mov r13,[r14]                           mov r15,$08[rbp]
    add r14,$08                            @local2 2->1
   ;s 1->1                                  mov -$08[r10],r15
    mov rbx,[r14]                           sub r10,$10
    add r14,$08                             mov $10[r10],r13
    mov rax,[rbx]                           mov r13,$10[rbp]
    jmp eax                                @local0 1->2
                                            mov r15,$00[rbp]
                                           @local1 2->3
                                            mov r9,$08[rbp]
                                           @local2 3->1
                                            mov -$10[r10],r9
                                            sub r10,$18
                                            mov $10[r10],r15
                                            mov $18[r10],r13
                                            mov r13,$10[rbp]
                                           lit 1->2
                                           #24
                                            mov r15,$50[rbx]
                                           lp+! 2->1
                                            add rbp,r15
                                           ;s 1->1
                                            mov rbx,[r14]
                                            add r14,$08
                                            mov rax,[rbx]
                                            jmp eax
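
   A listing of this kind can probably be regenerated along the
   following lines. This is a sketch, assuming a recent Gforth
   development snapshot run as gforth-fast, in which SEE-CODE is
   available; the instructions and register assignment shown will
   depend on the Gforth version, build, and CPU.

```forth
\ Load under gforth-fast. SEE-CODE is assumed to be present
\ (it is a Gforth word, not a standard one).
: 3dup.2 ( a b c -- a b c a b c ) 2 pick 2 pick 2 pick ;

\ Show each primitive of 3DUP.2 together with its native code.
see-code 3dup.2
```

   Repeating SEE-CODE for each of the other definitions gives the
   remaining columns.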
      
   Locals-haters, come to Gforth, where locals are implemented   
   inefficiently:-). The code for 3DUP.2 is actually optimal for   
   Gforth's calling convention.   
      
   - anton   
   --   
   M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html   
   comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html   
    New standard: https://forth-standard.org/   
   EuroForth 2025 CFP: http://www.euroforth.org/ef25/cfp.html   
   EuroForth 2025 registration: https://euro.theforth.net/   
      