From: user5857@newsgrouper.org.invalid   
      
   EricP posted:   
      
   > Anton Ertl wrote:   
   > > EricP writes:   
   > >> That shows about 12% instructions are conditional branch and 9% CMP.   
   > >> That says to me that almost all Bcc are paired with a CMP,   
   > >> and very few use the flags set as a side effect of ALU ops.   
   > >>   
   > >> I would expect those two numbers to be closer as even today compilers
   > >> don't know about those side effect flags and will always emit a CMP
   > >> or TST first.
   > >   
   > > Compilers certainly have problems with single flag registers, as they   
   > > run contrary to the base assumption of register allocation. But you   
   > > don't need full-blown tracking of flags in order to make use of flags   
   > > side effects in compilers. Plain peephole optimization can be good   
   > > enough. E.g., if you have   
   > >   
   > > if (a+b<0) ...   
   > >   
   > > the compiler may naively translate this to   
   > >   
   > > add tmp = a, b   
   > > tst tmp   
   > > bge cont   
   > >   
   > > The peephole optimizer can have a rule that says that this is   
   > > equivalent to   
   > >   
   > > add tmp = a, b   
   > > bge cont   
   > >   
   > > When I compile   
   > >   
   > > long foo(long a, long b)   
   > > {   
   > > if (a+b<0)   
   > > return a-b;   
   > > else   
   > > return a*b;   
   > > }   
   > >   
   > > with gcc-12.2.0 -O -c on AMD64, I get   
   > >   
   > > 0000000000000000 <foo>:
   > > 0: 48 89 f8 mov %rdi,%rax   
   > > 3: 48 89 fa mov %rdi,%rdx   
   > > 6: 48 01 f2 add %rsi,%rdx   
   > > 9: 78 05 js 10 <foo+0x10>
   > > b: 48 0f af c6 imul %rsi,%rax   
   > > f: c3 ret   
   > > 10: 48 29 f0 sub %rsi,%rax   
   > > 13: c3 ret   
   > >   
   > > Look, Ma, no tst.   
   > >   
   > > - anton   
   >   
   > This could be 1 MOV shorter.   
   > It didn't need to MOV %rdi, %rdx as it already copied rdi to rax.   
   > Just ADD %rsi,%rdi and after that use the %rax copy.   
      
   foo:   
    ADD R3,R1,R2   
    PLT0 R3,TF   
    ADD R1,R1,-R2   
    MUL R1,R1,R2   
    RET   
      
   5 instructions versus 8.
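For reference, the C function everyone is compiling (quoted from Anton's post above) can be sanity-checked directly; the behavior below is exactly what both the gcc and MSVC listings implement:

```c
/* The function under discussion: a-b when the sum is negative,
   a*b otherwise. Taken verbatim from the quoted post. */
long foo(long a, long b)
{
    if (a + b < 0)
        return a - b;
    else
        return a * b;
}
```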
      
   > For that optimization { ADD CMP Bcc } => { ADD Bcc }   
   > to work those three instructions must be adjacent.   
   > In this case it wouldn't make a difference but in general   
   > I think they would want the freedom to move code about and not have   
   > the ADD bound to the Bcc too early so this would have to be about   
   > the very last optimization so it didn't interfere with code motion.   
   >   
   > The Microsoft compiler uses LEA to do the add which doesn't change flags   
   > so even if it has a flags optimization it would not detect it:   
   >   
   > long foo(long,long) PROC ; foo, COMDAT   
   > lea eax, DWORD PTR [rcx+rdx]   
   > test eax, eax   
   > jns SHORT $LN2@foo   
   > sub ecx, edx   
   > mov eax, ecx   
   > ret 0   
   > $LN2@foo:   
   > imul ecx, edx   
   > mov eax, ecx   
   > ret 0   
      
   5 versus 9   
      
   > Also if MS had moved ecx to eax first as GCC does then it could have   
   > the function result land in eax and eliminate the final two MOV eax,ecx.   
      
   still 5 versus 7   
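The { ADD, CMP/TST, Bcc } => { ADD, Bcc } rewrite discussed above is simple enough to sketch as a late peephole pass over adjacent instructions. A toy version in C (the instruction encoding and opcode names here are invented purely for illustration, not from any real compiler): it drops a TST whose register was just written by the preceding ADD, since that ADD already set the flags the branch needs.

```c
/* Toy instruction representation, invented for this sketch. */
enum op { OP_ADD, OP_TST, OP_BGE, OP_OTHER };

struct insn {
    enum op op;
    int dst;   /* register written (or tested, for TST) */
};

/* Peephole: if an ADD is immediately followed by a TST of the same
   register and then a conditional branch, delete the TST -- the ADD's
   flag side effect already feeds the branch. Returns new count. */
static int peephole(struct insn *code, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (i + 2 < n
            && code[i].op == OP_ADD
            && code[i + 1].op == OP_TST
            && code[i + 1].dst == code[i].dst
            && code[i + 2].op == OP_BGE) {
            code[out++] = code[i];      /* keep the ADD */
            code[out++] = code[i + 2];  /* keep the branch, drop the TST */
            i += 2;
        } else {
            code[out++] = code[i];
        }
    }
    return out;
}
```

As the quoted post notes, a pass like this must run after code motion, since it only fires when the three instructions are adjacent.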
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   