

   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   


   Message 3,812 of 4,675   
   Bart to Bart   
   Re: Fast conversion to a boolean of 0 or   
   09 Mar 19 12:21:50   
   
   From: bc@nospicedham.freeuk.com   
      
   On 08/03/2019 23:47, bart4858@nospicedham.gmail.com wrote:
    > On Friday, 8 March 2019 12:52:40 UTC, Bart wrote:
    >   
    >> I wasn't able to compile your program using CL.EXE (didn't seem to   
    >> recognise _asm).   
    >>   
    >> What I wanted to test was how long an empty for-loop would take. (I   
    >> assume you would compile without optimisations to make sure it doesn't   
    >> just get rid of such a loop.)   
    >>   
    >> Since I was curious as to how much of those 2 seconds is overheads.   
    >   
    > I instead just typed in the asm examples using another language. It
    > didn't take long.
    >
    > Tested on a cheap laptop (no idea what processor), loop overheads
    > seemed to be 7/8 of total. Results I got were:
    >   
    >   
    > loop           : 1688  ms   
    > cmov           : 234   
    > sbb            : 218   
    > jz             : 140   
    > terje adc      : 328   
    > james sbb+neg  : 218   
    > setnz          : 219   
    >   
    > The loop time is for an empty loop....   
      
   Rick said (it might have been in an email) that an empty loop sometimes
   takes longer than one with a body. When I tried it today on my main PC,
   that is what I found.
      
   (Not entirely surprisingly; I find that all the time with x86 and x64:   
   you make code simpler and shorter by removing instructions, and it   
   becomes slower!)   
      
   But still persevering with it, I duplicated the loop body 10 times for   
   each test, and got these results (loop count is different from above):   
      
   loop           : 248 ms   
   cmov           : 305     So 248+305 or 553 ms total   
   sbb            : 453   
   jz             : 355   
   terje adc      : 329   
   james sbb+neg  : 441   
   setnz          : 426   
      
   This is with an AMD Athlon II X2 2.7GHz (maybe that means something to   
   the people here).   
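
   A minimal sketch, in C, of the kind of measurement described above: time
   an empty loop, time the same loop with the body duplicated 10 times, and
   compare the two. This is not the program used for the figures in this
   post (that was written in another language with inline asm); the names
   to_bool, time_empty_ms, time_body_ms and REP10 are hypothetical, and the
   per-test figures above appear to be net of the empty-loop time.

```c
#include <stdint.h>
#include <time.h>

/* Repeat a statement 10 times, mimicking the hand-duplicated loop body. */
#define REP10(stmt) stmt stmt stmt stmt stmt stmt stmt stmt stmt stmt

/* Hypothetical operation under test: map any non-zero x to 1, zero to 0. */
static uint32_t to_bool(uint32_t x) { return x != 0; }

/* Time n iterations of an empty loop, in milliseconds of CPU time.
   The volatile counter keeps the compiler from deleting the loop. */
static double time_empty_ms(uint32_t n) {
    clock_t t0 = clock();
    for (volatile uint32_t i = 0; i < n; i++) { }
    return (double)(clock() - t0) * 1000.0 / CLOCKS_PER_SEC;
}

/* Time n iterations with the body duplicated 10 times. The accumulated
   result is written to *sink_out so the work cannot be discarded. */
static double time_body_ms(uint32_t n, uint32_t *sink_out) {
    volatile uint32_t sink = 0;
    clock_t t0 = clock();
    for (volatile uint32_t i = 0; i < n; i++) {
        REP10(sink = sink + to_bool(i);)
    }
    double ms = (double)(clock() - t0) * 1000.0 / CLOCKS_PER_SEC;
    *sink_out = sink;
    return ms;
}
```

   Subtracting time_empty_ms(n) from time_body_ms(n, &sink) gives a rough
   net figure for the body. As the results in this post suggest, that
   difference can be noisy and may even come out negative on some machines.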
      
   Running the same program on the laptop:   
      
   loop           :  343 ms   
   cmov           : 1157     1157+343 total   
   sbb            : 1298   
   jz             :  736   
   terje adc      : 1188   
   james sbb+neg  : 1235   
   setnz          : 1220   
      
   That's on an Intel Celeron N3050 1.6GHz.   
      
   So CMOV was fastest on one, but JZ on the other.   
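
   For readers without the earlier messages to hand, here is a hedged C
   sketch of three ways to turn a value into 0 or 1: a comparison, an
   explicit branch, and branchless arithmetic. These are only C analogues
   of the idea; the asm sequences actually timed (cmov, sbb, jz, adc,
   sbb+neg, setnz) are not reproduced in this message, and the instructions
   a compiler emits for each function are not guaranteed. The function
   names are hypothetical.

```c
#include <stdint.h>

/* Comparison form. x86 compilers often emit test + setnz for this,
   but that choice is up to the compiler. */
static uint32_t bool_compare(uint32_t x) {
    return x != 0;
}

/* Explicit branch, similar in spirit to a jz-based sequence. */
static uint32_t bool_branch(uint32_t x) {
    if (x == 0)
        return 0;
    return 1;
}

/* Branchless arithmetic: (x | -x) has its top bit set exactly when
   x is non-zero, so shifting that bit down yields 0 or 1. */
static uint32_t bool_arith(uint32_t x) {
    return (uint32_t)(x | (uint32_t)(0u - x)) >> 31;
}
```

   All three return the same result for every input, so any speed
   difference comes purely from the generated instructions and how a
   particular processor handles them.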
      
   My conclusion: such micro-timings are pointless...   
      
   More than likely (speaking as a complete non-expert on the inner
   workings of x86/64), the best choice depends on the surrounding
   instructions and the current state of the processor.
      



(c) 1994,  bbs@darkrealms.ca