comp.lang.asm.x86

Terje Mathisen to All

Re: Fast conversion to a boolean of 0 or

08 Mar 19 08:21:48

   From: terje.mathisen@nospicedham.tmsw.no   
      
   Rick's testing shows first of all that all the sequences are pretty
   similar; the differences probably depend mostly on micro-architectural
   quirks of the machine you are testing on.
      
   Analysing the code statically shows that the JZ code should win clearly   
   in this scenario, since the inputs are extremely non-random: Only a   
   single input (the loop counter) is zero, the rest are all non-zero, so   
   the branch will be predicted as not taken. Invert the code and it should   
   run faster because now all three instructions can run in the same cycle
   and the final XOR EAX,EAX will be skipped:   
      
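      test eax,eax   ; set ZF from the input
      mov eax,1      ; assume non-zero; MOV does not touch the flags
      jnz done       ; non-zero input (the common case): keep the 1
      xor eax,eax    ; zero input: return 0
   done: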
      
   The CMOVcc code has been quite slow on many/most x86 cpus that have
   supported it, taking a minimum of 2 cycles. Similarly, NEG has also been
   a 2-cycle op on several cpus, and both ADC and SBB have been a cycle
   slower than the standard ADD and SUB instructions.
      
   In theory James' SBB code and my ADC code should have been identical,
   but my version probably suffers from the MOV EAX,0, which is a much
   larger instruction, so more code bytes?
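   
   To make the size difference concrete, here is a rough sketch of what
   the two variants could look like. The actual sequences from earlier in
   the thread are not quoted in this message, so both are assumed
   reconstructions, not the original code; byte counts are for the usual
   32-bit operand encodings:

   ```asm
      ; SBB variant (assumed form), 6 code bytes
      neg eax        ; 2 bytes: CF = 1 iff EAX was non-zero
      sbb eax,eax    ; 2 bytes: EAX = -CF, i.e. 0 or -1
      neg eax        ; 2 bytes: EAX = 0 or 1

      ; ADC variant (assumed form), 9 code bytes
      neg eax        ; 2 bytes: CF = 1 iff EAX was non-zero
      mov eax,0      ; 5 bytes: opcode plus a 32-bit immediate
      adc eax,eax    ; 2 bytes: EAX = 0 + 0 + CF
   ```

   Note that the shorter XOR EAX,EAX (2 bytes) cannot replace the MOV
   here, because XOR clears the carry flag that ADC is about to consume.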
      
   The SETNZ code can indeed suffer Partial Register stalls on some cpus as
   it is currently written; you need either an explicit MOVZX EAX,AL
   afterwards or an initial zeroing of the target reg to be safe.
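   
   A minimal sketch of the two safe forms, assuming the input is in EAX
   and, for the first form, that EDX is free to use as the target (the
   register choice is an assumption for illustration):

   ```asm
      ; Form 1: zero the target first (EDX assumed free)
      xor edx,edx    ; must come before TEST, since XOR changes the flags
      test eax,eax
      setnz dl       ; EDX = 0 or 1, upper bits already zero

      ; Form 2: widen explicitly afterwards
      test eax,eax
      setnz al       ; writes only the low byte of EAX
      movzx eax,al   ; EAX = 0 or 1
   ```

   Form 1 needs a second register because zeroing EAX up front would
   destroy the input before it is tested.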
      
   Terje   
      
   --   
   -    
   "almost all programming can be viewed as an exercise in caching"   
      