   comp.lang.forth      Forth programmers eat a lot of Bratwurst      117,927 messages   

   Message 117,853 of 117,927   
   Anton Ertl to peter   
   Re: C compiler optimization and Forth en   
   26 Jan 26 19:24:43   
   
   From: anton@mips.complang.tuwien.ac.at   
      
   peter  writes:   
   >On Sat, 24 Jan 2026 16:47:16 GMT   
   >anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:   
   >> I have now also tried it with gcc-14.2, and that produces better code.   
   >> Results from a Xeon E-2388G (Rocket Lake):   
   >>   
   >>  sieve bubble matrix   fib   fft gcc options   
   >>  0.032  0.032  0.015 0.037 0.014 -O2   
   >>  0.035  0.032  0.015 0.037 0.014 -O3 -fno-tree-vectorize (gforth default)   
   >>  0.033  0.034  0.016 0.032 0.014 -O3 (with auto vectorization)   
   >>   
   >> The code for ROT and 2SWAP does not use auto-vectorization, and the   
   >> code for 2! uses auto-vectorization in a way that reduces the   
   >> instruction count:   
   >>   
   >> -O3 (auto-vectorized)     -O3 -fno-tree-vectorize   
   >> add    $0x8,%rbx          add $0x8,%rbx   
   >> movq   0x8(%r13),%xmm0    mov 0x10(%r13),%rax   
   >> add    $0x18,%r13         mov 0x8(%r13),%rdx   
   >> movhps -0x8(%r13),%xmm0   add $0x18,%r13   
   >> movups %xmm0,(%r8)        mov %rdx,(%r8)   
   >> mov    0x0(%r13),%r8      mov %rax,0x8(%r8)   
   >> mov    (%rbx),%rax        mov 0x0(%r13),%r8   
   >> jmp    *%rax              mov (%rbx),%rax   
   >>                           jmp *%rax   
   >>   
   >> And the common tail with all these move instructions is gone.   
   >>   
   >> - anton   
   >   
   >What does your C code look like? I could not get clang or gcc to   
   >auto-vectorize with my existing code   
   >   
   >        UNS64 *tmp64 = (UNS64*)TOP;   
   >        tmp64[0] = sp[0];   
   >        tmp64[1] = sp[1];   
   >        TOP = sp[2];   
   >        sp += 3;   
      
   Gforth's source code for 2! is:   
      
   2!	( w1 w2 a_addr -- )		core	two_store   
   ""Store @i{w2} into the cell at @i{c-addr} and @i{w1} into the next cell.""   
   a_addr[0] = w2;   
   a_addr[1] = w1;   
      
   A generator produces the following from that, which is passed to gcc:   
      
   LABEL(two_store) /* 2! ( w1 w2 a_addr -- ) S1 -- S1  */   
   /* Store @i{w2} into the cell at @i{c-addr} and @i{w1} into the next cell. */   
   NAME("2!")   
   ip += 1;   
   LABEL1(two_store)   
   {   
   DEF_CA   
   MAYBE_UNUSED Cell w1;   
   MAYBE_UNUSED Cell w2;   
   MAYBE_UNUSED Cell * a_addr;   
   NEXT_P0;   
   vm_Cell2w(sp[2],w1);   
   vm_Cell2w(sp[1],w2);   
   vm_Cell2a_(spTOS,a_addr);   
   #ifdef VM_DEBUG   
   if (vm_debug) {   
   fputs(" w1=", vm_out); printarg_w(w1);   
   fputs(" w2=", vm_out); printarg_w(w2);   
   fputs(" a_addr=", vm_out); printarg_a_(a_addr);   
   }   
   #endif   
   sp += 3;   
   {   
   #line 1815 "prim"   
   a_addr[0] = w2;   
   a_addr[1] = w1;   
   #line 10136 "prim-fast.i"   
   }   
      
   #ifdef VM_DEBUG   
   if (vm_debug) {   
   fputs(" -- ", vm_out); fputc('\n', vm_out);   
   }   
   #endif   
   NEXT_P1;   
   spTOS = sp[0];   
   LABEL2(two_store)   
   NAME1("l2-two_store")   
   NEXT_P1_5;   
   LABEL3(two_store)   
   NAME1("l3-two_store")   
   DO_GOTO;   
   }   
      
   There are a lot of macros in this code, and I fear that expanding them   
   makes the code even less readable, but the essence for the   
   auto-vectorized part is something like:   
      
   w1 = sp[2];   
   w2 = sp[1];   
   a_addr = spTOS;   
   sp += 3;   
   a_addr[0] = w2;   
   a_addr[1] = w1;   
   spTOS = sp[0];   
      
   My guess is that in your code the compiler had to assume that the   
   store to tmp64[0] might alias sp[1], so it could not move the load of   
   sp[1] ahead of that store and therefore did not vectorize the loads   
   and the stores.  In the Gforth code, both loads happen first, followed   
   by the two stores, and gcc can vectorize that.  I doubt that there is   
   a big benefit from that, though.   
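To make the ordering argument concrete, here is a minimal sketch of the two shapes in plain C. The names Cell, two_store_interleaved and two_store_loads_first are hypothetical and come from neither code base, and the stack layout follows the quoted snippet (sp[0] and sp[1] are stored, sp[2] becomes the new top of stack). Whether a given compiler actually vectorizes the second variant depends on its version and options.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Cell;   /* hypothetical cell type: one machine word */

/* Variant A: each stack load is immediately followed by its store.
   If a_addr[0] might be the same location as sp[1], the compiler must
   keep the load of sp[1] after the first store. */
Cell *two_store_interleaved(Cell *sp, Cell *tos)
{
    Cell *a_addr = (Cell *)*tos;
    a_addr[0] = sp[0];
    a_addr[1] = sp[1];
    *tos = sp[2];          /* refill the cached top of stack */
    return sp + 3;         /* drop three cells */
}

/* Variant B: both loads happen before either store, so the two loads
   are adjacent and the two stores are adjacent regardless of aliasing. */
Cell *two_store_loads_first(Cell *sp, Cell *tos)
{
    Cell *a_addr = (Cell *)*tos;
    Cell w2 = sp[0];
    Cell w1 = sp[1];
    a_addr[0] = w2;
    a_addr[1] = w1;
    *tos = sp[2];
    return sp + 3;
}
```

When the target address does not overlap the stack, both variants store the same values. When a_addr points at sp[1], they differ: variant A reads back the value it has just written, variant B does not. That difference is why the compiler may not reorder variant A on its own. Compiling both with something like gcc -O3 -S and comparing the output shows what a particular compiler makes of each.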
      
   >typedef UNS64 v2u64 __attribute__((vector_size(16))) __attribute__((aligned(8)));   
      
   I'll have to remember the aligned attribute for future games with gcc   
   explicit vectorization.   
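For illustration, a small sketch of how such a typedef could be used with the GCC/Clang vector extensions to copy two cells in one vector-typed access. The function name copy_two_cells is hypothetical, uint64_t stands in for UNS64, and the may_alias attribute is my addition (an assumption, so that accessing uint64_t storage through the vector type does not run afoul of type-based alias analysis). This is compiler-specific and not portable C.

```c
#include <assert.h>
#include <stdint.h>

/* Two 64-bit lanes; aligned(8) lowers the required alignment from 16
   to 8 bytes, so any cell-aligned address is acceptable.
   may_alias is an addition that is not in the quoted typedef. */
typedef uint64_t v2u64
    __attribute__((vector_size(16), aligned(8), may_alias));

/* Copy src[0] and src[1] to dst[0] and dst[1] as one vector load and
   one vector store. */
void copy_two_cells(uint64_t *dst, const uint64_t *src)
{
    v2u64 v = *(const v2u64 *)src;
    *(v2u64 *)dst = v;
}
```

Because the whole vector is loaded before it is stored, this has the same loads-first ordering as the Gforth code, made explicit instead of being left to the auto-vectorizer.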
      
   - anton   
   --   
   M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html   
   comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html   
        New standard: https://forth-standard.org/   
   EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/   
      