Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.lang.asm.x86    |    Ahh, the lost art of x86 assembly    |    4,675 messages    |
[   << oldest   |   < older   |   list   |   newer >   |   newest >>   ]
|    Message 3,812 of 4,675    |
|    Bart to Bart    |
|    Re: Fast conversion to a boolean of 0 or    |
|    09 Mar 19 12:21:50    |
From: bc@nospicedham.freeuk.com

On 08/03/2019 23:47, bart4858@nospicedham.gmail.com wrote:
> On Friday, 8 March 2019 12:52:40 UTC, Bart wrote:
>
>> I wasn't able to compile your program using CL.EXE (didn't seem to
>> recognise _asm).
>>
>> What I wanted to test was how long an empty for-loop would take. (I
>> assume you would compile without optimisations to make sure it doesn't
>> just get rid of such a loop.)
>>
>> Since I was curious as to how much of those 2 seconds is overheads.
>
> I instead just typed in the asm examples using another language. It
> didn't take long.
>
> Tested on a cheap laptop (no idea what processor), loop overheads
> seemed to be 7/8 of total. Results I got were:
>
>
> loop          : 1688 ms
> cmov          :  234
> sbb           :  218
> jz            :  140
> terje adc     :  328
> james sbb+neg :  218
> setnz         :  219
>
> The loop time is for an empty loop....

Rick said (it might have been in an email), that an empty loop sometimes
takes longer. When I tried it today on my main PC, that is what I found.

(Not entirely surprisingly; I find that all the time with x86 and x64:
you make code simpler and shorter by removing instructions, and it
becomes slower!)

But still persevering with it, I duplicated the loop body 10 times for
each test, and got these results (loop count is different from above):

loop          :  248 ms
cmov          :  305       So 248+305 or 553 ms total
sbb           :  453
jz            :  355
terje adc     :  329
james sbb+neg :  441
setnz         :  426

This is with an AMD Athlon II X2 2.7GHz (maybe that means something to
the people here).

Running the same program on the laptop:

loop          :  343 ms
cmov          : 1157       1157+343 total
sbb           : 1298
jz            :  736
terje adc     : 1188
james sbb+neg : 1235
setnz         : 1220

That's on Intel Celeron N3050 1.6GHz.

So CMOV was fastest on one, but JZ on the other.

My conclusion: such micro-timings are pointless...

More than likely (speaking as a complete non-expert on the inner
workings of x86/64), the best choice depends on surrounding instructions
and the current state.

--- SoupGate-Win32 v1.05
 * Origin: you cannot sedate... all the things you hate (1:229/2)    |
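
The comparison in the post can be redone mechanically. The sketch below is not from the thread: it copies the two sets of timings above into a table and picks the quickest variant per machine. It assumes each figure excludes the empty-loop time, as the "248+305 or 553 ms total" remark suggests. The names RESULTS, fastest and total_with_loop are hypothetical.

```python
# Timings in ms, copied from the post. The dictionary layout and all
# function names here are illustrative assumptions, not from the thread.
RESULTS = {
    "AMD Athlon II X2 2.7GHz": {
        "loop": 248, "cmov": 305, "sbb": 453, "jz": 355,
        "terje adc": 329, "james sbb+neg": 441, "setnz": 426,
    },
    "Intel Celeron N3050 1.6GHz": {
        "loop": 343, "cmov": 1157, "sbb": 1298, "jz": 736,
        "terje adc": 1188, "james sbb+neg": 1235, "setnz": 1220,
    },
}


def fastest(timings):
    """Return (name, ms) of the quickest variant, ignoring the empty-loop row."""
    variants = {name: ms for name, ms in timings.items() if name != "loop"}
    best = min(variants, key=variants.get)
    return best, variants[best]


def total_with_loop(timings, name):
    """Add the empty-loop time to a variant's time (assumes figures exclude it)."""
    return timings["loop"] + timings[name]


if __name__ == "__main__":
    for machine, timings in RESULTS.items():
        name, ms = fastest(timings)
        total = total_with_loop(timings, name)
        print(f"{machine}: fastest is {name} at {ms} ms ({total} ms with loop)")
```

On these figures the winner differs between the two machines, which is the point the post makes about micro-timings.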
(c) 1994, bbs@darkrealms.ca