comp.lang.forth
Message 116,626 of 117,927
Krishna Myneni to Krishna Myneni
Re: Implementing DOES>: How not to do it
14 Jul 24 13:32:19
   From: krishna.myneni@ccreweb.org   
      
   On 7/14/24 07:20, Krishna Myneni wrote:   
   > On 7/14/24 04:02, albert@spenarnc.xs4all.nl wrote:   
   >> In article <2024Jul13.173138@mips.complang.tuwien.ac.at>,   
   >> Anton Ertl  wrote:   
   >>    
   >>>   
   >>> In any case, if you are a system implementor, you may want to check   
   >>> your DOES> implementation with a microbenchmark that stores into the   
   >>> does-defined word in a case where that word is not inlined.   
   >>   
   >> Is that equally valid for indirect threaded code?   
   >> In indirect threaded code the instruction and data cache   
   >> are more separated, e.g. in a simple Forth all the low level   
   >> code could fit in the I-cache, if I'm not mistaken.   
   >>   
   >   
   >   
   > Let's check. In kForth-64, an indirect threaded code system,   
   >   
   > .s   
   >    
   >   ok   
   > f.s   
   > fs:    
   >   ok   
   > ms@ b4 ms@ swap - .   
   > 4274  ok   
   > ms@ b5 ms@ swap - .   
   > 3648  ok   
   >   
   > So b5 appears to be more efficient than b4 (the version with DOES>).
   >   
   > --   
   > Krishna   
   >   
   > === begin code ===   
   > 50000000 constant iterations   
   >   
   > : faccum  create 1 floats allot? 0.0e f!   
   >      does> dup f@ f+ fdup f! ;   
   >   
   > : faccum-part2 ( F: r1 -- r2 ) ( a -- )   
   >      dup f@ f+ fdup f! ;   
   >   
   > faccum x4  2.0e x4 fdrop   
   > faccum y4 -4.0e y4 fdrop   
   >   
   > : b4 0.0e iterations 0 do x4 y4 loop ;   
   > : b5 0.0e iterations 0 do   
   >      [ ' x4 >body ] literal faccum-part2   
   >      [ ' y4 >body ] literal faccum-part2   
   >    loop ;   
   > === end code ===   
   >   
   >   
   >   
   >   
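   Normalizing the ms@ timings quoted above gives a rough per-call cost for
   the DOES> path. This is only a back-of-the-envelope sketch: it assumes ms@
   returns elapsed milliseconds and that the DO LOOP overhead is identical in
   b4 and b5, so it cancels in the difference (each iteration calls the two
   does-defined words x4 and y4).

   ```latex
   \begin{aligned}
   \Delta t &= 4274\,\text{ms} - 3648\,\text{ms} = 626\,\text{ms}\\
   \frac{\Delta t}{5\times10^{7}\ \text{iterations}} &\approx 12.5\,\text{ns per iteration}
     \approx 6.3\,\text{ns per does-word call}\\
   \frac{t_{b4}}{t_{b5}} &= \frac{4274}{3648} \approx 1.17
   \end{aligned}
   ```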
      
   Using perf to obtain performance-counter statistics for the b4 and b5
   microbenchmarks:
      
   B4   
      
   $ LC_NUMERIC=prog perf stat -e cycles:u -e instructions:u \
       -e L1-dcache-load-misses -e L1-icache-load-misses -e branch-misses \
       kforth64 -e "include does-microbench.4th b4 f. cr bye"
   -inf   
   Goodbye.   
      
     Performance counter stats for 'kforth64 -e include does-microbench.4th b4 f. cr bye':
      
           14_381_951_937      cycles:u   
      
           26_206_810_946      instructions:u     #    1.82  insn per cycle   
      
                 58_563        L1-dcache-load-misses:u   
      
                 14_742        L1-icache-load-misses:u   
      
             100_122_231       branch-misses:u   
      
      
           4.501011307 seconds time elapsed   
      
           4.477172000 seconds user   
           0.003967000 seconds sys   
      
      
   B5   
      
   $ LC_NUMERIC=prog perf stat -e cycles:u -e instructions:u \
       -e L1-dcache-load-misses -e L1-icache-load-misses -e branch-misses \
       kforth64 -e "include does-microbench.4th b5 f. cr bye"
   -inf   
   Goodbye.   
      
     Performance counter stats for 'kforth64 -e include does-microbench.4th b5 f. cr bye':
      
           11_529_644_734      cycles:u   
      
           18_906_809_683      instructions:u     #    1.64  insn per cycle
   
                 59_605        L1-dcache-load-misses:u   
      
                 21_531        L1-icache-load-misses:u   
      
             100_109_360       branch-misses:u   
      
      
           3.616353010 seconds time elapsed   
      
           3.600206000 seconds user   
           0.004639000 seconds sys   
      
      
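   The cache misses are small for both b4 and b5 (about 59,000 L1 data-cache
   misses and 15,000 to 22,000 L1 instruction-cache misses per run), but the
   branch misses are very high on my system (about 100 million per run, for
   both).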
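   
   Dividing the perf totals by the number of loop iterations N puts the two
   runs on a per-iteration footing. This sketch assumes the kForth startup and
   the compilation of does-microbench.4th add the same small overhead to both
   runs, so that overhead largely cancels in the differences:

   ```latex
   \begin{aligned}
   N &= 5\times10^{7} \quad \text{(two does-word calls per iteration)}\\
   \frac{\text{instructions}_{b4} - \text{instructions}_{b5}}{N}
     &= \frac{26\,206\,810\,946 - 18\,906\,809\,683}{5\times10^{7}} \approx 146
     \;\;(\approx 73 \text{ per call})\\
   \frac{\text{cycles}_{b4} - \text{cycles}_{b5}}{N}
     &= \frac{14\,381\,951\,937 - 11\,529\,644\,734}{5\times10^{7}} \approx 57
     \;\;(\approx 28.5 \text{ per call})\\
   \frac{\text{branch misses}}{N} &\approx \frac{1.001\times10^{8}}{5\times10^{7}} \approx 2.0
     \quad \text{(b4 and b5 alike)}
   \end{aligned}
   ```

   On these figures the DOES> version executes roughly 73 extra instructions
   per call, while the branch-miss count is essentially identical for b4 and
   b5, so the branch misses would not appear to account for the difference
   between them.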
      
   --   
   Krishna   
      