From: user5857@newsgrouper.org.invalid   
      
   EricP posted:   
      
   > Anton Ertl wrote:   
   > > EricP writes:   
   > >> That shows about 12% instructions are conditional branch and 9% CMP.   
   > >> That says to me that almost all Bcc are paired with a CMP,   
   > >> and very few use the flags set as a side effect of ALU ops.   
   > >>   
   > >> I would expect those two numbers to be closer as even today compilers
   > >> don't know about those side effect flags and will always emit a CMP
   > >> or TST first.
   > >   
   > > Compilers certainly have problems with single flag registers, as they   
   > > run contrary to the base assumption of register allocation. But you   
   > > don't need full-blown tracking of flags in order to make use of flags   
   > > side effects in compilers. Plain peephole optimization can be good   
   > > enough. E.g., if you have   
   > >   
   > > if (a+b<0) ...   
   > >   
   > > the compiler may naively translate this to   
   > >   
   > > add tmp = a, b   
   > > tst tmp   
   > > bge cont   
   > >   
   > > The peephole optimizer can have a rule that says that this is   
   > > equivalent to   
   > >   
   > > add tmp = a, b   
   > > bge cont   
   > >   
   > > When I compile   
   > >   
   > > long foo(long a, long b)   
   > > {   
   > > if (a+b<0)   
   > > return a-b;   
   > > else   
   > > return a*b;   
   > > }   
   > >   
   > > with gcc-12.2.0 -O -c on AMD64, I get   
   > >   
   > > 0000000000000000 <foo>:
   > > 0: 48 89 f8 mov %rdi,%rax   
   > > 3: 48 89 fa mov %rdi,%rdx   
   > > 6: 48 01 f2 add %rsi,%rdx   
   > > 9: 78 05 js 10 <foo+0x10>
   > > b: 48 0f af c6 imul %rsi,%rax   
   > > f: c3 ret   
   > > 10: 48 29 f0 sub %rsi,%rax   
   > > 13: c3 ret   
   > >   
   > > Look, Ma, no tst.   
   > >   
   > > - anton   
   >   
   > This could be 1 MOV shorter.   
   > It didn't need to MOV %rdi, %rdx as it already copied rdi to rax.   
   > Just ADD %rsi,%rdi and after that use the %rax copy.   
      
   foo:   
    ADD R3,R1,R2   
    PLT0 R3,TF   
    ADD R1,R1,-R2   
    MUL R1,R1,R2   
    RET   
      
   5 instructions versus 8.
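For reference, the C function everyone is compiling (quoted from Anton's post above) can be sanity-checked directly; the behavior below is exactly what both the gcc and MSVC listings implement:

```c
/* The function under discussion: a-b when the sum is negative,
   a*b otherwise. Taken verbatim from the quoted post. */
long foo(long a, long b)
{
    if (a + b < 0)
        return a - b;
    else
        return a * b;
}
```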
      
   > For that optimization { ADD CMP Bcc } => { ADD Bcc }   
   > to work those three instructions must be adjacent.   
   > In this case it wouldn't make a difference but in general   
   > I think they would want the freedom to move code about and not have   
   > the ADD bound to the Bcc too early so this would have to be about   
   > the very last optimization so it didn't interfere with code motion.   
   >   
   > The Microsoft compiler uses LEA to do the add which doesn't change flags   
   > so even if it has a flags optimization it would not detect it:   
   >   
   > long foo(long,long) PROC ; foo, COMDAT   
   > lea eax, DWORD PTR [rcx+rdx]   
   > test eax, eax   
   > jns SHORT $LN2@foo   
   > sub ecx, edx   
   > mov eax, ecx   
   > ret 0   
   > $LN2@foo:   
   > imul ecx, edx   
   > mov eax, ecx   
   > ret 0   
      
   5 versus 9   
      
   > Also if MS had moved ecx to eax first as GCC does then it could have   
   > the function result land in eax and eliminate the final two MOV eax,ecx.   
      
   still 5 versus 7   
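The { ADD, CMP/TST, Bcc } => { ADD, Bcc } rewrite discussed above is simple enough to sketch as a late peephole pass over adjacent instructions. A toy version in C (the instruction encoding and opcode names here are invented purely for illustration, not from any real compiler): it drops a TST whose register was just written by the preceding ADD, since that ADD already set the flags the branch needs.

```c
/* Toy instruction representation, invented for this sketch. */
enum op { OP_ADD, OP_TST, OP_BGE, OP_OTHER };

struct insn {
    enum op op;
    int dst;   /* register written (or tested, for TST) */
};

/* Peephole: if an ADD is immediately followed by a TST of the same
   register and then a conditional branch, delete the TST -- the ADD's
   flag side effect already feeds the branch. Returns new count. */
static int peephole(struct insn *code, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (i + 2 < n
            && code[i].op == OP_ADD
            && code[i + 1].op == OP_TST
            && code[i + 1].dst == code[i].dst
            && code[i + 2].op == OP_BGE) {
            code[out++] = code[i];      /* keep the ADD */
            code[out++] = code[i + 2];  /* keep the branch, drop the TST */
            i += 2;
        } else {
            code[out++] = code[i];
        }
    }
    return out;
}
```

As the quoted post notes, a pass like this must run after code motion, since it only fires when the three instructions are adjacent.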
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   