

   comp.lang.asm.x86      Ahh, the lost art of x86 assembly      4,675 messages   


   Message 3,815 of 4,675   
   James Harris to Anton Ertl   
   Re: Fast conversion to a boolean of 0 or   
   10 Mar 19 16:28:37   
   
   From: james.harris.1@nospicedham.gmail.com   
      
   On 10/03/2019 15:38, Anton Ertl wrote:   
   > James Harris writes:   
   >> Loops may well be OK for large chunks of code but IME things can 'go   
   >> wrong' with timing short sequences of just a few instructions. Not sure   
   >> why but some possible candidates: alignment in the code cache (with or   
   >> without trace-cache effects), residual effects of prior and subsequent   
   >> code, loop overheads, the effect of extra jumps, and interrupts being   
   >> run in the background.   
   >   
   > Yes, I recently had a case where I removed some unused code, and the   
   > program slowed down by IIRC 5% on a Skylake (presumably from different   
   > code alignment).   
      
   I've seen similar. That's a good example of why simply running   
   different pieces of code in a loop can be misleading. As in your   
   example, a shorter piece of code could appear to be 5% slower when it   
   is really the loop overhead, rather than the code under test, that is   
   slowing the measurement.   
      
   >   
   >> For short sequences of code I found it best to run them either just once   
   >   
   > I expect the same code alignment problems.   
   >   
   >> or, preferably, to run a few as an unrolled loop.   
   >   
   > That should help.   
   >   
   > One very important thing is whether the computations one benchmarks   
   > are independent of each other (then you measure throughput), or   
   > dependent on each other (then you measure latency).  And of course,   
   > for code sequences involving branches, the predictability of the   
   > branches is an important issue.   
      
   Yes, rather than running   
      
      start timer   
        loop   
          code under test   
        endloop   
      stop timer and store results   
      
   it may (IMO) be better to run   
      
      loop   
        start timer   
          code under test   
        stop timer and store results   
      endloop   
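
The two layouts above can be sketched roughly as follows. This is only an illustration in Python, whose interpreter overhead swamps any few-instruction sequence; for real x86 code the timer would more likely be a cycle counter such as RDTSC. The names `time_whole_loop`, `time_each_iteration` and `code_under_test` are made up for the example, and `time.perf_counter_ns` stands in for whatever timer is actually used.

```python
import statistics
import time


def time_whole_loop(fn, iterations):
    """First layout: one timer around the whole loop.

    Loop overhead is included in the single result.
    """
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    stop = time.perf_counter_ns()
    return (stop - start) / iterations  # mean ns per iteration


def time_each_iteration(fn, iterations):
    """Second layout: timer inside the loop, one stored result per run."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        stop = time.perf_counter_ns()
        samples.append(stop - start)  # "stop timer and store results"
    return samples


def code_under_test():
    # Hypothetical stand-in for the short sequence being measured.
    return sum(range(100))


if __name__ == "__main__":
    mean_ns = time_whole_loop(code_under_test, 10_000)
    samples = time_each_iteration(code_under_test, 10_000)
    print(f"whole-loop mean: {mean_ns:.0f} ns")
    print(f"per-run min:     {min(samples)} ns")
    print(f"per-run median:  {statistics.median(samples)} ns")
```

With one sample per run, an outlier (an interrupt landing mid-run, say) shows up as an individual large sample that can be dropped by taking the minimum or median, instead of being averaged into a single figure. Each sample does include the cost of reading the timer itself, so that overhead still needs to be measured and subtracted separately.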
      
      
   --   
   James Harris   
      


