comp.lang.forth
Hans Bezemer to All
Re: EuroForth 2025 preliminary proceedin
22 Jan 26 16:51:13
   From: the.beez.speaks@gmail.com   
      
   On 16-01-2026 18:38, Anton Ertl wrote:   
      
   On 17-01-2026 16:58, Hans Bezemer wrote:   
      
   I've done my thing: compiled 4tH at every optimization level from -O3
   down to -O0. I thought, let's keep this simple and run ALL the
   benchmarks I've got. Some of them have become useless, though, for the
   simple reason that hardware has become that much better.

   But still, here it is. Overall, performance deteriorates consistently
   as the optimization level drops, i.e. -O3 gives the best performance.
   There are a few minor glitches, some due to random benchmark data.
      
   For those curious, here is a European-style (semicolon-separated) CSV
   with all the data. BTW, you can find all the benchmarks here:
   https://sourceforge.net/p/forth-4th/code/HEAD/tree/trunk/4th.src/bench/   
      
   Hans Bezemer   
      
   ---8<---   
   Benchmark;-O3;-O2;-O1;-O0   
   bench.4th;6.79;6.36;6.68;6.33   
   benchm.4th;1.21;1.66;1.86;2.8   
   benchxls.4th;0.06;0.08;0.08;0.12   
   bubble.4th;0.69;0.95;0.96;1.72   
   bytesiev.4th;0.01;0.01;0.01;0.02   
   countbit.4th;3.52;4.76;5.02;8.01   
   cowell.4th;15.15;20.2;18.91;31.29   
   fib.4th;0.79;1.02;1.02;1.72   
   isortest.4th;0.23;0.33;0.31;0.56   
   matrix.4th;0.22;0.31;0.3;0.51   
   misty.4th;0.58;0.84;1.01;1.59   
   pforth.4th;10.47;13.55;14.42;22.68   
   prims.4th;5.96;8;8.59;14.28   
   simple.4th;0.5;0.7;0.82;1.21   
   sortest.4th;140.96;163.68;150.17;270.87   
   thread.4th;0.35;0.41;0.49;0.7   
   ---8<---   
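
For anyone who wants to crunch these numbers, here is a minimal Python sketch that reads the semicolon-separated table and prints, per benchmark, how much slower -O0 is than -O3. It assumes the figures are run times (lower is better), which fits the remark that -O3 performs best; the names `DATA` and `slowdowns` are made up for this illustration.

```python
import csv
import io

# The table as posted above; the values are assumed to be run times.
DATA = """Benchmark;-O3;-O2;-O1;-O0
bench.4th;6.79;6.36;6.68;6.33
benchm.4th;1.21;1.66;1.86;2.8
benchxls.4th;0.06;0.08;0.08;0.12
bubble.4th;0.69;0.95;0.96;1.72
bytesiev.4th;0.01;0.01;0.01;0.02
countbit.4th;3.52;4.76;5.02;8.01
cowell.4th;15.15;20.2;18.91;31.29
fib.4th;0.79;1.02;1.02;1.72
isortest.4th;0.23;0.33;0.31;0.56
matrix.4th;0.22;0.31;0.3;0.51
misty.4th;0.58;0.84;1.01;1.59
pforth.4th;10.47;13.55;14.42;22.68
prims.4th;5.96;8;8.59;14.28
simple.4th;0.5;0.7;0.82;1.21
sortest.4th;140.96;163.68;150.17;270.87
thread.4th;0.35;0.41;0.49;0.7
"""


def slowdowns(text):
    """Map each benchmark name to (value at -O0) / (value at -O3)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return {
        row["Benchmark"]: float(row["-O0"]) / float(row["-O3"])
        for row in reader
    }


if __name__ == "__main__":
    # Smallest ratio first, so any benchmark where -O0 wins shows up on top.
    for name, ratio in sorted(slowdowns(DATA).items(), key=lambda kv: kv[1]):
        print(f"{name:14s} {ratio:5.2f}x")
```

A ratio below 1 marks a benchmark where -O0 came out ahead of -O3; very short runs such as bytesiev.4th are too coarse at two decimals to say much.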
      
   > Hans Bezemer  writes:   
   >> On 15-01-2026 13:04, Anton Ertl wrote:   
   >>   
   >> A few observations concerning the IMHO most interesting paper,   
   >> "Code-Copying Compilation in Production":   
   > ...   
   >> 3. Commercial compilers (partly) using conventional compilers (see TF,   
   >> fig. 4.7) - that was new to me;   
   >   
   > All Forth compilers I know work at the text interpretation level as   
   > the "Forth compiler" of Thinking Forth, Figure 4.7.   
   >   
   >> 4. GCC -O1 outperforming GCC -O3 on some benchmarks. That's new to me   
   >> too. I might experiment with that one;   
   >   
   > I have analyzed it for bubblesort.  There the problem is that gcc -O3   
   > auto-vectorizes the pair of loads and the pair of stores (when the two   
   > elements are swapped).  As a result, if a pair is stored in one   
   > iteration, the next iteration loads a pair that overlaps the   
   > previously stored pair.  This means that the hardware cannot use its   
   > fast path in store-to-load forwarding, and leads to a huge slowdown.   
   > For a benchmark that has been around for over 40 years.   
   >   
   > In addition, the code generated by gcc -O3 also executes several
   > additional instructions per iteration, so I doubt that it would be
   > faster even if the store-to-load forwarding problem did not exist.
   >   
   > For fib, I have also looked at the generated code, but have not   
   > understood it well enough to see why the code generated by gcc -O3 is   
   > slower.   
   >   
   > - anton   
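
The overlap Anton describes for bubblesort can be made visible with a small Python sketch. It only models the index pattern of one bubble-sort pass (which pair of elements each iteration loads, and which pair it stores when it swaps); it says nothing about what gcc actually emits or how the hardware forwards stores. The function names are made up for this illustration.

```python
def bubble_pass_accesses(a):
    """Run one bubble-sort pass over list a (in place).

    Returns one (loaded, stored) entry per iteration, where loaded is the
    index pair that was compared and stored is the index pair written back
    by a swap, or None when no swap happened.
    """
    trace = []
    for i in range(len(a) - 1):
        loaded = (i, i + 1)
        stored = None
        if a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]
            stored = (i, i + 1)
        trace.append((loaded, stored))
    return trace


def overlapping_reloads(trace):
    """Count iterations whose loaded pair overlaps the previous stored pair."""
    count = 0
    for (_, prev_stored), (loaded, _) in zip(trace, trace[1:]):
        if prev_stored is not None and set(prev_stored) & set(loaded):
            count += 1
    return count


if __name__ == "__main__":
    data = [5, 4, 3, 2, 1]
    trace = bubble_pass_accesses(data)
    print(data, overlapping_reloads(trace))
```

Whenever an iteration swaps, the next iteration's pair (i+1, i+2) shares element i+1 with the pair (i, i+1) that was just stored, so a paired load only partly covers the preceding paired store.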
      