|    comp.lang.asm.x86    |    Ahh, the lost art of x86 assembly    |    4,675 messages    |
|    Message 3,806 of 4,675    |
|    Terje Mathisen to All    |
|    Re: Fast conversion to a boolean of 0 or    |
|    08 Mar 19 08:21:48    |
From: terje.mathisen@nospicedham.tmsw.no

Rick's testing shows first of all that all the sequences are pretty
similar; the differences probably depend mostly on micro-architectural
quirks of the machine you are doing your testing on.

Analysing the code statically shows that the JZ code should win clearly
in this scenario, since the inputs are extremely non-random: only a
single input (the loop counter) is zero, the rest are all non-zero, so
the branch will be predicted as not taken. Invert the code and it should
run faster, because now all three instructions can run in the same cycle
and the final XOR EAX,EAX will be skipped:

        test eax,eax
        mov eax,1
        jnz done
        xor eax,eax
      done:

The CMOVcc code has been quite slow on many/most x86 cpus that have
supported it, taking a minimum of 2 cycles. Similarly, NEG has also been
a 2-cycle op on several cpus, and both ADC and SBB have been a cycle
slower than the standard ADD and SUB instructions.

In theory James' SBB code and my ADC code should have been identical, but
my version probably suffers from the MOV EAX,0, which is a much larger
instruction, so more code bytes?

The SETNZ code can indeed suffer Partial Register stalls on some cpus as
it is currently written; to be safe you need either an explicit
MOVZX EAX,AL afterwards or an initial zeroing of the target reg.

Terje

--
-
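For readers who want to check the semantics rather than the timing, here is a minimal sketch in C of three ways to turn a 32-bit value into 0 or 1. It is a model, not the assembly sequences from the thread: the function names are hypothetical, and the arithmetic variant is a generic branchless idiom, not James' SBB code or the ADC code mentioned above. Which instructions a compiler actually emits for each function is also an assumption that varies by compiler and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Branching form, mirroring the inverted sequence in the post:
   load 1 first, then overwrite it with 0 only when x is zero. */
static uint32_t to_bool_branch(uint32_t x)
{
    uint32_t r = 1;          /* mov eax,1 */
    if (x == 0)              /* jnz done skips the next line when x != 0 */
        r = 0;               /* xor eax,eax */
    return r;
}

/* Comparison form. Compilers commonly lower this to TEST + SETNE with a
   zero-extended result, but that is a compiler choice, not a guarantee. */
static uint32_t to_bool_compare(uint32_t x)
{
    return (uint32_t)(x != 0);
}

/* Pure arithmetic form: for any non-zero x, either x or its
   two's-complement negation has bit 31 set; for x == 0 neither does. */
static uint32_t to_bool_arith(uint32_t x)
{
    uint32_t neg = (uint32_t)(0u - x);
    return (uint32_t)(x | neg) >> 31;
}

/* Hypothetical helper: asserts that all three forms agree for x. */
static void check_to_bool(uint32_t x)
{
    uint32_t expected = (x != 0) ? 1u : 0u;
    assert(to_bool_branch(x) == expected);
    assert(to_bool_compare(x) == expected);
    assert(to_bool_arith(x) == expected);
}
```

Calling check_to_bool on 0, 1, 0x80000000 and 0xFFFFFFFF covers the zero case and both ends of the range. The sketch says nothing about cycle counts; those still have to be measured on the target machine, as the post points out.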
(c) 1994, bbs@darkrealms.ca