Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.lang.forth    |    Forth programmers eat a lot of Bratwurst    |    117,927 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 117,105 of 117,927    |
|    Anton Ertl to Anton Ertl    |
|    Stack caching (: Stack vs stackless oper    |
|    01 Mar 25 08:18:06    |
      From: anton@mips.complang.tuwien.ac.at              anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:       >see-code exchange see-code exchange4 see-code exchange2       >over 1->2 dup 1->2 dup >r 1->1       > mov r15,$08[r12] mov r15,r8 >r 1->1       >@ 2->2 @ 2->2 mov -$08[r13],r8       > mov r15,[r15] mov r15,[r15] sub r13,$08       >swap 2->3 rot 2->3 @ 1->1       > add r12,$08 mov r9,$08[r12] mov r8,[r8]       > mov r9,r8 add r12,$08 over 1->2       > mov r8,[r12] !@ 3->2 mov r15,$08[r12]       >!@ 3->2 mov rax,r15 @ 2->2       > mov rax,r15 mov r15,[r9] mov r15,[r15]       > mov r15,[r9] mov [r9],rax r> 2->3       > mov [r9],rax swap 2->3 mov r9,$00[r13]       >swap 2->3 add r12,$08 add r13,$08       > add r12,$08 mov r9,r8 ! 3->1       > mov r9,r8 mov r8,[r12] mov [r9],r15       > mov r8,[r12] ! 3->1 swap 1->2       >! 3->1 mov [r9],r15 mov r15,$08[r12]       > mov [r9],r15 ;s 1->1 add r12,$08       >;s 1->1 mov rbx,$00[r13] ! 2->0       > mov rbx,$00[r13] add r13,$08 mov [r15],r8       > add r13,$08 mov rax,[rbx] ;s 0->1       > mov rax,[rbx] jmp eax mov r8,$08[r12]       > jmp eax add r12,$08       > mov rbx,$00[r13]       > add r13,$08       > mov rax,[rbx]       > jmp eax              The difference between exchange and exchange4 shows how stack caching       can have a hard-to-predict effect. Gforth searches for the shortest       path through the available stack-cache states, where shortness is       defined by the native-code length. E.g., it starts with state 1, and       from there it can use any of the dup variants starting in state 1, or       first transition to another state and use a dup variant starting from       there.              For SWAP and ROT gforth-fast has the following variants:              primitive in-out # code bytes       swap 1-1 132 len= 4+ 13+ 3       swap 2-2 37 len= 4+ 9+ 3       swap 3-3 4 len= 4+ 9+ 3       swap 0-2 8 len= 4+ 14+ 3       swap 1-2 82 len= 4+ 9+ 3       swap 2-1 74 len= 4+ 8+ 3       swap 2-3 30 len= 4+ 11+ 3       swap 3-2 3 len= 4+ 11+ 3       swap 2-0 20 len= 4+ 13+ 3       rot 1-1 46 len= 4+ 23+ 3       rot 3-3 6 len= 4+ 12+ 3       rot 3-1 24 len= 4+ 13+ 3       rot 2-3 15 len= 4+ 9+ 3       rot 1-3 17 len= 4+ 17+ 3       rot 0-3 1 len= 4+ 19+ 3              You get these data (in a rawer form) with              gforth-fast --print-prim -e bye |& grep ^swap       gforth-fast --print-prim -e bye |& grep ^rot              The in column is the stack-cache state on entering the word, the out       column is the stack-cache state on leaving the word. The # column       shows how many times this variant of the primitive is used (static       counts). The code length colum shows three parts, the middle of which       is the part that's copied to dynamic superinstructions like the ones       shown above, and this length is what is used in the search for the       shortest path.              In EXCHANGE4, the shortest variant of ROT is used: ROT 2->3; and the       primitives of the other variants are selected to also result in the       shortest overall code.              In EXCHANGE and EXCHANGE4, SWAP 2->3 is not the shortest variant of       SWAP, not even the shortest variant starting from state 2, but ending       in state 3 allows to use cheap variants of further primitives such as       !@ and !, resulting in the overall shortest code for this sequence.       In EXCHANGE2, we see the selection of a shorter version of SWAP, but       one of the costs is that ;s becomes longer (but in this case the       overall savings from using a shorter version of SWAP and shorter       versions of earlier instructions make up for that).              Why am I looking at this? For stack-shuffling primitives like SWAP       and ROT, it's not obvious which variant is how long and which variant       should be selected.              These stack-shiffling words therefore are also good candidates for       performing stack-cache state transitions that would otherwise require       inserting extra transition code:              E.g., EXCHANGE and its variants consume two stack items, but need to       start in stack-cache state 1 and end in stack-cache state 1 (gforth is       currently not smart enough to deal with other states at basic-block       boundaries), so not everything can be done in the stack cache; the       stack pointer needs to be increased by two cells, and there need to be       accesses to the memory part of the stack for two stack items.              In EXCHANGE, the adjustment by one cell and memory access for one       stack item is done in the first SWAP 2->3, and another one in the       second one. In EXCHANGE4, ROT 2->3 and SWAP 2->3 perform these tasks.       In EXCHANGE2, the SWAP 1->3 does it for one cell, and the stack-cache       state transition 0->1 in the first two instructions of ;s do it for       the other cell (gforth-fast actually does not have a built-in variant       S 0->1 and the code shown as ;S 0->1 by SEE-CODE is actually composed       of a transition 0->1 and the ;S 1->1 variant).              I wanted to know how often which variant of these stack-shuffling       primitives is used, and how this relates to their length. One       interesting result is that ROT 1->3 is used relatively frequently       despite having relatively long code. Apparently the code that comes       before these 17 instances of ROT benefits a lot from being in       stack-cache state 1 and this amortizes the longer code for the ROT       1->3 compared to ROT 2->3.              Another interesting result is the low usage of SWAP 3->2 compared to       SWAP 2->3. This may say something about how SWAP is used in Forth       programs. Or it may be an artifact of tie-breaking: If two paths have       the same length, one is chosen rather arbitrarily, but consistently,       and this may make one variant appear more useful than merited by the       benefit that the existence of the variant has on code length.              For those interested, here's the code for the various variants shown       above:              r12: stack pointer        r8: stack cache register a (tos in state 1, 2nd in state 2, 3rd in state 3)       r15: stack-cache register b (tos in state 2, 2nd in state 3)        r9: stack-cache register c (tos in state 3)              swap 1-1       559E7F769425: mov rax,$08[r12]       559E7F76942A: mov $08[r12],r8              [continued in next message]              --- SoupGate-DOS v1.05        * Origin: you cannot sedate... all the things you hate (1:229/2)    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
(c) 1994, bbs@darkrealms.ca