comp.lang.asm.x86
wolfgang kern to Rod Pemberton
Re: benchmarks..
11 Mar 19 10:26:37
   From: nowhere@nospicedham.never.at   
      
   On 11.03.2019 09:50, Rod Pemberton wrote:   
   > On Mon, 11 Mar 2019 00:33:50 +0100   
   > wolfgang kern wrote:
   >   
   >> what I think about benchmarks (one more time):   
   >>   
   >> iterative runs of any piece of code are senseless, the results are
   >> just background noise from the OS.
   >>
   >> benchmarks from one tool will never match any other, so all these
   >> reported values look suspiciously random to me.
   >>
   >> And where is the practical use of such test loops?
   >> most code under test may run frequently, but rarely as a loop.
   >>   
   >>   
   >> my solution produces reproducible (max 1 cycle deviation) results   
   >> even if just testing a single instruction. I check only a few times   
   >> but I use my own OS with my own debugger (it's short and got no PL   
   >> issues)**:   
   >>   
   >> 1. WBINVD
   >>      JMP near test (forward)
   >>
   >> test:
   >>      ALIGN so the code under test starts aligned (cache line bounds)
   >>      CLI
   >>      RDTSC and store edx:eax in ecx:ebx (this shows constant timing)
   >>
   >>      code under test is here (but keep ecx:ebx or two others alive)
   >>
   >>      RDTSC, sub edx:eax, ecx:ebx  (time result is now in edx:eax)
   >>      STI
   >>      RET (to my debugger, code under test remains cached)
   >>      read the result
   >>
   >> 2. start the test again without WBINVD and JMP.
   >>      read the result   
   >>   
   >> 3. play with alignment and run step 1 & 2 again to compare with   
   >>      previous reads (only a few times anyway).   
   >>   
   >> ** I always test within PL0, so my results may differ from those run   
   >> at lesser PL.   
   >>   
   >> this works well for short code parts that fit a cache line, for   
   >> larger function blocks I call the test-code and store temporary TSC   
   >> in memory.   
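
The procedure quoted above relies on WBINVD and CLI, which only run at PL0. As a rough user-mode approximation, the timing core (RDTSC, code under test, RDTSC, subtract) can be sketched in C with compiler intrinsics. This is a minimal sketch assuming GCC or Clang on x86-64. The names code_under_test, time_once and cold_and_warm are hypothetical, the LFENCE fences are an addition not present in the original steps, and neither the cache flush nor the interrupt masking is reproduced, so results will be noisier than the 1 cycle deviation reported for the PL0 setup.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc, _mm_lfence (GCC/Clang, x86-64 assumed) */

/* Hypothetical stand-in for the "code under test". */
static void code_under_test(void) {
    static volatile uint32_t sink;
    sink = sink * 3u + 1u;
}

/* One timed pass: read TSC, run the code, read TSC again, subtract.
   The LFENCEs are an assumption of this sketch; they are meant to keep
   the code under test from overlapping with the two TSC reads. */
static uint64_t time_once(void (*fn)(void)) {
    assert(fn != 0);
    _mm_lfence();
    uint64_t start = __rdtsc();
    _mm_lfence();
    fn();
    _mm_lfence();
    uint64_t end = __rdtsc();
    _mm_lfence();
    return end - start;
}

/* Step 1 followed by step 2: the first pass may run with cold caches,
   the second pass runs with the code already cached. No WBINVD here,
   so "cold" is only as cold as the system happens to leave it. */
static void cold_and_warm(void (*fn)(void), uint64_t *cold, uint64_t *warm) {
    *cold = time_once(fn);
    *warm = time_once(fn);
}
```

Calling cold_and_warm(code_under_test, &cold, &warm) a few times and comparing the two values mirrors steps 1 and 2. The alignment experiments of step 3 would need compiler or linker control over function placement, which this sketch leaves out.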
      
      
   > You need to use the RDTSCP instruction for multiple core processors,   
   > instead of the RDTSC instruction.   
      
   Yes, but only if more than one core is active.   
   I mainly have just the boot-core active and make all others sleep.   
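
When more than one core is active, RDTSCP also returns the contents of IA32_TSC_AUX, which lets a measurement detect that it was moved to another core between the two reads. A minimal sketch, assuming GCC or Clang on x86-64 and an OS that loads a per-core value into IA32_TSC_AUX (Linux typically does, a custom OS would have to set it itself). The names has_rdtscp, tsc_sample, time_with_rdtscp and nop_work are hypothetical.

```c
#include <assert.h>
#include <cpuid.h>       /* __get_cpuid (GCC/Clang) */
#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp */

/* RDTSCP support is reported in CPUID leaf 80000001h, EDX bit 27. */
static int has_rdtscp(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx >> 27) & 1u;
}

typedef struct {
    uint64_t cycles;   /* TSC delta across the code under test */
    int same_core;     /* 1 if IA32_TSC_AUX was equal at both reads */
} tsc_sample;

/* Hypothetical placeholder for the code under test. */
static void nop_work(void) {
    static volatile int sink;
    sink++;
}

/* Time fn with RDTSCP and report whether both reads saw the same
   IA32_TSC_AUX value. The caller should check has_rdtscp() first. */
static tsc_sample time_with_rdtscp(void (*fn)(void)) {
    unsigned int aux_start, aux_end;
    assert(fn != 0);
    uint64_t start = __rdtscp(&aux_start);
    fn();
    uint64_t end = __rdtscp(&aux_end);
    tsc_sample s = { end - start, aux_start == aux_end };
    return s;
}
```

A sample with same_core equal to 0 would simply be discarded and the run repeated. With only the boot core active, as described above, the check is redundant.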
      
   > AMD K8 and dual-core platforms suffer from TSC drift.  The TSC is also   
   > not frequency independent on those processors.   
      
   this was already solved on later AMD CPUs (CPUID Fn8000_0007, EDX bit 8:
   the invariant TSC bit).
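
The invariant TSC bit can be queried before trusting TSC deltas as a time base. A minimal sketch, assuming GCC or Clang and their cpuid.h helper. The function names has_invariant_tsc and require_invariant_tsc are hypothetical.

```c
#include <assert.h>
#include <cpuid.h>   /* __get_cpuid (GCC/Clang) */

/* Invariant TSC is reported in CPUID leaf 80000007h, EDX bit 8.
   Returns 1 if the bit is set, 0 if it is clear or the leaf is absent. */
static int has_invariant_tsc(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx >> 8) & 1u;
}

/* Optional guard: abort early when the TSC may change rate with
   frequency or power state, as on the older parts mentioned above. */
static void require_invariant_tsc(void) {
    assert(has_invariant_tsc());
}
```

On hardware without the bit, or inside a virtual machine that hides it, has_invariant_tsc() returns 0 and cycle counts taken across frequency or power state changes should be treated with suspicion.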
      
   > Efficeon processors update TSC at maximum frequency and don't
   > properly reflect the actual clock speed.

   what I have used for the past 12 years seems to have reliably stable
   clock counters.
   __   
   wolfgang   
      