From: anton@mips.complang.tuwien.ac.at   
      
   EricP writes:   
   >Anton Ertl wrote:   
   >> When I compile   
   >>   
   >> long foo(long a, long b)   
   >> {   
   >> if (a+b<0)   
   >> return a-b;   
   >> else   
   >> return a*b;   
   >> }   
   >>   
   >> with gcc-12.2.0 -O -c on AMD64, I get   
   >>   
   >> 0000000000000000 :   
   >> 0: 48 89 f8 mov %rdi,%rax   
   >> 3: 48 89 fa mov %rdi,%rdx   
   >> 6: 48 01 f2 add %rsi,%rdx   
9: 78 05 js 10 <foo+0x10>
   >> b: 48 0f af c6 imul %rsi,%rax   
   >> f: c3 ret   
   >> 10: 48 29 f0 sub %rsi,%rax   
   >> 13: c3 ret   
   ...   
   >This could be 1 MOV shorter.   
   >It didn't need to MOV %rdi, %rdx as it already copied rdi to rax.   
   >Just ADD %rsi,%rdi and after that use the %rax copy.   
      
   Yes, I often see more register-register moves in gcc-generated code   
   than necessary.   
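
The one-MOV-shorter sequence EricP describes would look something like this (a hand-written sketch, not compiler output; %rdi is dead after the branch, so it can absorb the add):

```asm
foo:
	mov  %rdi,%rax        # rax = a, kept live for both arms
	add  %rsi,%rdi        # rdi = a+b, sets SF; rdi not needed afterwards
	js   .Lneg
	imul %rsi,%rax        # rax = a*b
	ret
.Lneg:
	sub  %rsi,%rax        # rax = a-b
	ret
```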
      
   >For that optimization { ADD CMP Bcc } => { ADD Bcc }   
   >to work those three instructions must be adjacent.   
   >In this case it wouldn't make a difference but in general   
   >I think they would want the freedom to move code about and not have   
   >the ADD bound to the Bcc too early so this would have to be about   
   >the very last optimization so it didn't interfere with code motion.   
      
   Yes, possible. When I look at what clang-14.0.6 -O -c produces, it's   
   this:   
      
0000000000000000 <foo>:
    0: 48 89 f9 mov %rdi,%rcx   
    3: 48 29 f1 sub %rsi,%rcx   
    6: 48 89 f0 mov %rsi,%rax   
    9: 48 0f af c7 imul %rdi,%rax   
    d: 48 01 fe add %rdi,%rsi   
    10: 48 0f 48 c1 cmovs %rcx,%rax   
    14: c3 ret   
      
   clang seems to prefer using cmov. The interesting thing here is that   
   it puts the add right in front of the cmovs, after the code for "a-b"   
   and "a*b". When I do   
      
   long foo(long a, long b)   
   {   
    if (a+b*111<0)   
    return a-b;   
    else   
    return a*b;   
   }   
      
   clang produces this code:   
      
0000000000000000 <foo>:
    0: 48 6b ce 6f imul $0x6f,%rsi,%rcx   
    4: 48 89 f8 mov %rdi,%rax   
    7: 48 29 f0 sub %rsi,%rax   
    a: 48 0f af f7 imul %rdi,%rsi   
    e: 48 01 f9 add %rdi,%rcx   
    11: 48 0f 49 c6 cmovns %rsi,%rax   
    15: c3 ret   
      
I.e., rcx=b*111 is computed first, but a+rcx comes late, right before the
cmovns. So clang seems to have some mechanism for keeping the add and
the cmov(n)s together as one unit. That makes sense: the cmov consumes
the flags the add produces, so no flag-clobbering instruction (such as
the imul) may come between them.
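
A C-level paraphrase of what the cmov-style code computes (my sketch, not clang's actual IR): both arms are evaluated unconditionally, and the add contributes only the sign that drives the final select.

```c
#include <assert.h>

/* The original branchy function from the post. */
long foo(long a, long b)
{
    if (a + b < 0)
        return a - b;
    else
        return a * b;
}

/* Branch-free paraphrase of clang's cmov sequence (a sketch):
   compute both arms up front, then select on the sign of a+b,
   as the cmovs does on the sign flag set by the add. */
long foo_cmov(long a, long b)
{
    long diff = a - b;             /* sub:  the "a+b<0" arm      */
    long prod = a * b;             /* imul: the other arm        */
    long sum  = a + b;             /* add:  only its sign is used */
    return sum < 0 ? diff : prod;  /* cmovs picks diff when SF=1 */
}
```

Both versions return the same value for every input; the cmov form just trades a branch for computing both arms.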
      
   - anton   
   --   
   'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'   
    Mitch Alsup,    
      