From: ThatWouldBeTelling@thevillage.com   
      
   Anton Ertl wrote:   
   > EricP writes:   
   >> That shows about 12% instructions are conditional branch and 9% CMP.   
   >> That says to me that almost all Bcc are paired with a CMP,   
   >> and very few use the flags set as a side effect of ALU ops.   
   >>   
   >> I would expect those two numbers to be closer as even today compilers don't   
   >> know about those side effect flags and will always emit a CMP or TST first.   
   >   
   > Compilers certainly have problems with single flag registers, as they   
   > run contrary to the base assumption of register allocation. But you   
   > don't need full-blown tracking of flags in order to make use of flags   
   > side effects in compilers. Plain peephole optimization can be good   
   > enough. E.g., if you have   
   >   
   > if (a+b<0) ...   
   >   
   > the compiler may naively translate this to   
   >   
   > add tmp = a, b   
   > tst tmp   
   > bge cont   
   >   
   > The peephole optimizer can have a rule that says that this is   
   > equivalent to   
   >   
   > add tmp = a, b   
   > bge cont   
   >   
   > When I compile   
   >   
   > long foo(long a, long b)   
   > {   
   > if (a+b<0)   
   > return a-b;   
   > else   
   > return a*b;   
   > }   
   >   
   > with gcc-12.2.0 -O -c on AMD64, I get   
   >   
   > 0000000000000000 <foo>:
   > 0: 48 89 f8 mov %rdi,%rax   
   > 3: 48 89 fa mov %rdi,%rdx   
   > 6: 48 01 f2 add %rsi,%rdx   
   > 9: 78 05 js 10 <foo+0x10>
   > b: 48 0f af c6 imul %rsi,%rax   
   > f: c3 ret   
   > 10: 48 29 f0 sub %rsi,%rax   
   > 13: c3 ret   
   >   
   > Look, Ma, no tst.   
   >   
   > - anton   
      
   This could be 1 MOV shorter.   
   It didn't need to MOV %rdi, %rdx as it already copied rdi to rax.   
   Just ADD %rsi,%rdi and after that use the %rax copy.   
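
   As a rough sanity check, here is a toy C model of both sequences.
   It is only a sketch: the function names foo_gcc and foo_short are
   made up, the local variables merely mirror the register names, and
   unsigned arithmetic stands in for the wraparound of the 64-bit
   instructions. The sign flag after the ADD is modelled as the top bit
   of the sum.

```c
#include <stdint.h>

/* Sequence as emitted by gcc in the listing above. */
int64_t foo_gcc(int64_t a, int64_t b)
{
    uint64_t rdi = (uint64_t)a, rsi = (uint64_t)b, rax, rdx;

    rax = rdi;          /* mov  %rdi,%rax */
    rdx = rdi;          /* mov  %rdi,%rdx */
    rdx += rsi;         /* add  %rsi,%rdx */
    if (rdx >> 63)      /* js: sign flag set by the add */
        rax -= rsi;     /* sub  %rsi,%rax */
    else
        rax *= rsi;     /* imul %rsi,%rax */
    return (int64_t)rax;
}

/* Suggested variant: add into %rdi, since %rax already holds the copy of a. */
int64_t foo_short(int64_t a, int64_t b)
{
    uint64_t rdi = (uint64_t)a, rsi = (uint64_t)b, rax;

    rax = rdi;          /* mov  %rdi,%rax */
    rdi += rsi;         /* add  %rsi,%rdi (rdi is dead after this) */
    if (rdi >> 63)      /* js */
        rax -= rsi;     /* sub  %rsi,%rax */
    else
        rax *= rsi;     /* imul %rsi,%rax */
    return (int64_t)rax;
}
```

   Both functions take the same branch and return the same value for
   any pair of inputs, because the only use of the sum is the sign test.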
      
   For that optimization { ADD CMP Bcc } => { ADD Bcc }
   to work, those three instructions must be adjacent.
   In this case it wouldn't make a difference, but in general
   I think they would want the freedom to move code about and not have
   the ADD bound to the Bcc too early, so this would have to be about
   the very last optimization so that it doesn't interfere with code motion.
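
   To illustrate the adjacency requirement, here is a minimal sketch of
   such a rule in C. The opcode enum, struct insn and the function
   peephole_drop_tst are all hypothetical and do not come from any real
   compiler. The sketch also assumes that the ADD sets the flags the
   branch needs, so that the TST (or a CMP against zero) really is
   redundant.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy instruction list. */
enum op { OP_ADD, OP_TST, OP_BCC, OP_OTHER };

struct insn {
    enum op op;
    int reg;    /* register written by ADD or tested by TST; -1 if unused */
};

/* Remove the TST from every adjacent "ADD r / TST r / Bcc" triple.
   Compacts code[] in place and returns the new instruction count. */
size_t peephole_drop_tst(struct insn *code, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        /* The rule only fires when the ADD is immediately before the TST
           and the Bcc is immediately after it. */
        int redundant = code[i].op == OP_TST
            && i > 0 && i + 1 < n
            && code[i - 1].op == OP_ADD
            && code[i - 1].reg == code[i].reg
            && code[i + 1].op == OP_BCC;

        if (!redundant)
            code[out++] = code[i];
    }
    return out;
}
```

   With ADD, TST, Bcc back to back the TST is dropped. Once any other
   instruction has been scheduled between the ADD and the TST, the
   pattern no longer matches and the TST stays, which is why the order
   of this pass relative to code motion matters.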
      
   The Microsoft compiler uses LEA to do the add, which doesn't change the
   flags, so even if it has a flags optimization it could not apply it here:
      
   long foo(long,long) PROC ; foo, COMDAT   
    lea eax, DWORD PTR [rcx+rdx]   
    test eax, eax   
    jns SHORT $LN2@foo   
    sub ecx, edx   
    mov eax, ecx   
    ret 0   
   $LN2@foo:   
    imul ecx, edx   
    mov eax, ecx   
    ret 0   
      
   Also, if MS had moved ecx to eax first, as GCC does, then the function
   result could land in eax directly, eliminating the two final MOV eax,ecx.
      