comp.lang.asm.x86

Rod Pemberton to wolfgang kern

Re: benchmarks..

11 Mar 19 04:50:48

   From: invalid@nospicedham.lkntrgzxc.com   
      
   On Mon, 11 Mar 2019 00:33:50 +0100   
   wolfgang kern  wrote:   
      
   > What I think about benchmarks (one more time):
   >
   > Iterative runs of any piece of code are senseless; the results are
   > just background noise from the OS.
   >   
   > Benchmarks from one tool will never match those from any other, so
   > all these reported values look suspiciously random to me.
   >   
   > And where is the practical use of such test loops?
   > Most code under test may run frequently, but rarely in a loop.
   >   
   >   
   > My solution produces reproducible results (max. 1 cycle deviation),
   > even when testing a single instruction. I check only a few times,
   > but I use my own OS with my own debugger (it's short and has no PL
   > issues)**:
   >   
   > 1. WBINVD
   >     JMP near test (forward)
   >
   > test:
   >     ALIGN so the code under test starts aligned (cache line boundary)
   >     CLI
   >     RDTSC, MOV ebx,eax and MOV ecx,edx (start time saved in ecx:ebx;
   >     this shows constant timing)
   >
   >     code under test goes here (but keep ecx:ebx, or two other
   >     registers, alive)
   >
   >     RDTSC, SUB eax,ebx and SBB edx,ecx (elapsed time now in edx:eax)
   >     STI
   >     RET (to my debugger; the code under test remains cached)
   >     read the result
   >   
   > 2. Start the test again without WBINVD and JMP.
   >     read the result
   >   
   > 3. Play with alignment and run steps 1 and 2 again to compare with
   >     the previous reads (only a few times anyway).
   >   
   > ** I always test at PL0, so my results may differ from those run at
   > a less privileged PL.
   >   
   > This works well for short code parts that fit in a cache line. For
   > larger function blocks I call the test code and store intermediate
   > TSC values in memory.
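The procedure above depends on WBINVD and CLI, which are privileged, so it cannot be copied as-is into a user-mode program. What follows is a minimal user-mode sketch in C (GCC/Clang on x86-64, `<x86intrin.h>` intrinsics), assuming that flushing only the data the code touches with CLFLUSH is an acceptable stand-in for WBINVD. `code_under_test`, `buf`, and the helper functions are hypothetical names. Interrupts stay enabled, so OS noise is only reduced by keeping the minimum of a few runs.

```c
#include <stdint.h>
#include <stddef.h>
#include <x86intrin.h>  /* __rdtsc, _mm_lfence, _mm_mfence, _mm_clflush (GCC/Clang, x86 only) */

/* Hypothetical code under test: sums one 64-byte cache line of data. */
static _Alignas(64) volatile uint32_t buf[16];
static volatile uint32_t sink;

static void code_under_test(void) {
    uint32_t s = 0;
    for (size_t i = 0; i < 16; i++)
        s += buf[i];
    sink = s;
}

/* RDTSC fenced on both sides: earlier instructions complete before the
   read, and later instructions do not start before it. */
static inline uint64_t fenced_rdtsc(void) {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Smallest cost of an empty measurement, subtracted from every result. */
static uint64_t measure_overhead(void) {
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < 8; i++) {
        uint64_t t0 = fenced_rdtsc();
        uint64_t t1 = fenced_rdtsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

static uint64_t time_once(void (*fn)(void), uint64_t overhead) {
    uint64_t t0 = fenced_rdtsc();
    fn();
    uint64_t d = fenced_rdtsc() - t0;
    return d > overhead ? d - overhead : 0;
}

/* Step 1 analogue: evict the data line (CLFLUSH instead of WBINVD),
   then time one cold run. */
static uint64_t time_cold(void (*fn)(void), uint64_t overhead) {
    _mm_clflush((const void *)buf);
    _mm_mfence();                 /* order the flush before the timed run */
    return time_once(fn, overhead);
}

/* Step 2 analogue: rerun without flushing, keep the minimum of a few runs. */
static uint64_t time_warm(void (*fn)(void), uint64_t overhead, int runs) {
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t d = time_once(fn, overhead);
        if (d < best)
            best = d;
    }
    return best;
}
```

Usage: call `measure_overhead()` once, then compare `time_cold(code_under_test, ov)` against `time_warm(code_under_test, ov, 5)`. Keeping the minimum rather than the mean discards samples hit by interrupts or scheduling, which is one way to suppress the "background noise from OS"; it does not give the guarantees of running at PL0 with interrupts disabled.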
      
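   You need to use the RDTSCP instruction instead of RDTSC on multi-core
   processors. RDTSCP waits until all earlier instructions have executed
   before it reads the TSC, and it also returns the TSC_AUX value in ECX,
   which the OS can set to identify the core, so you can tell whether
   both reads were taken on the same core.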
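A minimal sketch of the RDTSCP approach, assuming GCC/Clang on x86-64 and an OS that loads IA32_TSC_AUX with a per-CPU value (Linux, for example, does). `__rdtscp` stores TSC_AUX through its pointer argument, so a sample whose two TSC_AUX values differ was taken on two different cores and can be discarded. `tsc_interval`, `time_rdtscp`, and `empty_fn` are hypothetical names.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtscp, _mm_lfence (GCC/Clang, x86 only) */

typedef struct {
    uint64_t ticks;    /* end - start, in TSC ticks */
    int same_cpu;      /* 1 if both reads returned the same TSC_AUX value */
} tsc_interval;

/* Hypothetical placeholder for the code under test. */
static void empty_fn(void) {}

static tsc_interval time_rdtscp(void (*fn)(void)) {
    unsigned int aux0, aux1;
    uint64_t t0 = __rdtscp(&aux0);
    _mm_lfence();   /* RDTSCP does not stop later instructions from starting early */
    fn();
    uint64_t t1 = __rdtscp(&aux1);   /* waits for fn()'s instructions to execute */
    _mm_lfence();
    tsc_interval r = { t1 - t0, aux0 == aux1 };
    return r;
}
```

One consequence for the assembly version quoted above: RDTSCP writes ECX, so the start time could no longer be kept in ecx:ebx across the second read; it would have to live in other registers or in memory.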
      
   AMD K8 and dual-core platforms suffer from TSC drift: the TSCs of the
   individual cores can drift apart. The TSC on those processors is also
   not frequency-invariant; its rate changes with the core clock speed.
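To check whether a processor has the frequency-independent TSC that those parts lack, CPUID leaf 80000007h reports an "invariant TSC" flag in EDX bit 8. A minimal sketch using GCC/Clang's `<cpuid.h>`; `has_invariant_tsc` is a hypothetical name.

```c
#include <cpuid.h>  /* __get_cpuid (GCC/Clang, x86 only) */

/* Returns 1 if CPUID.80000007h:EDX[8] (invariant TSC) is set, i.e. the TSC
   ticks at a constant rate across P-, C- and T-state changes; returns 0 if
   the bit is clear or the leaf is not supported. */
static int has_invariant_tsc(void) {
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 when the requested leaf exceeds the highest
       extended leaf the CPU reports. */
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 8) & 1u);
}
```

The flag describes the tick rate only; it does not by itself guarantee that the counters on different cores or sockets agree, so the per-core RDTSCP check still applies.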
      
   Transmeta Efficeon processors update the TSC at the maximum
   frequency, so it does not reflect the actual (possibly lower) clock
   speed.
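When the TSC ticks at a fixed rate while the core runs slower, a TSC delta overstates the core cycles actually spent. Assuming both frequencies are known and the core frequency stayed constant over the measured interval, the conversion is:

$$
N_{\text{core}} = N_{\text{TSC}} \cdot \frac{f_{\text{core}}}{f_{\text{TSC}}}
$$

With hypothetical figures, 1,200 TSC ticks counted at $f_{\text{TSC}} = 1.2$ GHz while the core ran at $f_{\text{core}} = 0.6$ GHz correspond to $1200 \cdot 0.6 / 1.2 = 600$ core cycles.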
      
      
   Rod Pemberton   
   --   
   Apple opposes "glorifying violence" and "dehumanizing language".  Yet,   
   it manufactures products in China which commits crimes against humanity.   
      
