From: nowhere@nospicedham.never.at   
      
   On 11.03.2019 09:50, Rod Pemberton wrote:   
   > On Mon, 11 Mar 2019 00:33:50 +0100   
   > wolfgang kern wrote:   
   >   
   >> what I think about benchmarks (one more time):   
   >>   
   >> iterative runs of any piece of code are pointless; the results are
   >> just background noise from the OS.
   >>
   >> benchmarks from one tool will never match those from another, so all
   >> these reported values look suspiciously random to me.
   >>
   >> And where is the practical use of such test loops?
   >> most code under test may run frequently, but rarely as a loop.
   >>   
   >>   
   >> my solution produces reproducible results (max 1 cycle deviation)
   >> even when testing just a single instruction. I check only a few times,
   >> but I use my own OS with my own debugger (it's short and has no
   >> privilege-level issues)**:
   >>   
   >> 1. WBINVD
   >> JMP near test (forward)   
   >>   
   >> test:   
   >> ALIGN so that the code under test starts aligned (cache-line boundary)
   >> CLI
   >> RDTSC and store edx:eax in ecx:ebx (this shows constant timing)
   >>
   >> code under test is here (but keep ecx:ebx or two others alive)
   >>
   >> RDTSC, then sub edx:eax, ecx:ebx (the time result is now in edx:eax)
   >> STI
   >> RET (to my debugger; the code under test remains cached)
   >> read the result
   >>   
   >> 2. start the test again without WBINVD and JMP.
   >> read the result   
   >>   
   >> 3. play with alignment and run step 1 & 2 again to compare with   
   >> previous reads (only a few times anyway).   
   >>   
   >> ** I always test at PL0, so my results may differ from those run
   >> at a less privileged level.
   >>   
   >> this works well for short pieces of code that fit in a cache line; for
   >> larger function blocks I call the code under test and store the
   >> intermediate TSC values in memory.
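
   The arithmetic behind the two RDTSC reads quoted above can be sketched in
   portable C. This is only an illustration of the SUB/SBB step on the two
   32-bit halves plus an optional correction for the cost of an empty RDTSC
   pair; the type `tsc_reading` and the functions `tsc_sub`, `tsc_join` and
   `net_cycles` are hypothetical names, and the register values are passed in
   as plain integers rather than read from the CPU.

```c
#include <stdint.h>

/* One RDTSC reading as the CPU delivers it (hypothetical helper type):
   low half in EAX, high half in EDX. */
typedef struct {
    uint32_t eax; /* low 32 bits of the counter */
    uint32_t edx; /* high 32 bits of the counter */
} tsc_reading;

/* end - start, computed on the 32-bit halves the way a SUB on the low
   halves followed by an SBB on the high halves would do it. */
static tsc_reading tsc_sub(tsc_reading end, tsc_reading start)
{
    tsc_reading d;
    uint32_t borrow = (end.eax < start.eax) ? 1u : 0u; /* carry after SUB */
    d.eax = end.eax - start.eax;                       /* low halves  */
    d.edx = end.edx - start.edx - borrow;              /* high halves */
    return d;
}

/* Join the two halves into one 64-bit cycle count. */
static uint64_t tsc_join(tsc_reading r)
{
    return ((uint64_t)r.edx << 32) | r.eax;
}

/* Cycles attributed to the code under test: the measured span minus the
   span of an empty RDTSC pair. Assumption: the caller measured empty_span
   beforehand; the result is clamped at zero instead of wrapping. */
static uint64_t net_cycles(tsc_reading start, tsc_reading end,
                           uint64_t empty_span)
{
    uint64_t span = tsc_join(tsc_sub(end, start));
    return (span > empty_span) ? span - empty_span : 0;
}
```

   The borrow matters only when the low half wraps between the two reads;
   for short code under test the high halves are normally equal.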
      
      
   > You need to use the RDTSCP instruction for multiple core processors,   
   > instead of the RDTSC instruction.   
      
   Yes, but only if more than one core is active.
   I mostly keep just the boot core active and put all the others to sleep.
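
   For the multi-core case, RDTSCP also returns the IA32_TSC_AUX value in ECX,
   so two reads can be compared to see whether they came from the same core.
   A minimal sketch of that check, assuming the OS has stored a distinct
   value per core in IA32_TSC_AUX; `tscp_reading` and `tscp_delta` are
   hypothetical names and the readings are passed in as plain integers:

```c
#include <stdbool.h>
#include <stdint.h>

/* One RDTSCP reading (hypothetical helper type): the 64-bit counter from
   EDX:EAX plus the IA32_TSC_AUX value delivered in ECX. */
typedef struct {
    uint64_t tsc;
    uint32_t aux; /* assumed to be set to a distinct value per core */
} tscp_reading;

/* Writes end - start to *cycles and returns true if both readings carry
   the same aux value. Returns false and leaves *cycles untouched if the
   aux values differ, because the two counters may not be comparable. */
static bool tscp_delta(tscp_reading start, tscp_reading end,
                       uint64_t *cycles)
{
    if (start.aux != end.aux)
        return false;
    *cycles = end.tsc - start.tsc;
    return true;
}
```

   Equal aux values do not prove that the code stayed on one core the whole
   time (it could have moved away and back), so this only rejects the obvious
   cases. With a single active core the check is always satisfied.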
      
   > AMD K8 and dual-core platforms suffer from TSC drift. The TSC is also   
   > not frequency independent on those processors.   
      
   This was already solved on later AMD CPUs (see the invariant-TSC bit
   reported by CPUID).
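
   The invariant-TSC flag is reported in CPUID leaf 0x80000007, EDX bit 8.
   A small sketch of the test, assuming the caller has already executed
   CPUID and passes in EAX from leaf 0x80000000 (the highest extended leaf)
   and EDX from leaf 0x80000007; the macro and function names are
   hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_LEAF_APM       0x80000007u /* advanced power management leaf */
#define CPUID_APM_EDX_INVTSC (1u << 8)   /* EDX bit 8: invariant TSC */

/* max_ext_leaf: EAX returned by CPUID leaf 0x80000000.
   apm_edx:      EDX returned by CPUID leaf 0x80000007 (ignored if that
                 leaf is not available on this CPU). */
static bool has_invariant_tsc(uint32_t max_ext_leaf, uint32_t apm_edx)
{
    if (max_ext_leaf < CPUID_LEAF_APM)
        return false; /* leaf not implemented, so no flag to read */
    return (apm_edx & CPUID_APM_EDX_INVTSC) != 0;
}
```

   If the flag is clear, the TSC rate may change with frequency or power
   state, and cycle counts from it should be treated with caution.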
      
   > Efficeon processors update the TSC at maximum frequency and don't
   > properly update the actual clock speed.
      
   The CPUs I have been using for the past 12 years all seem to have
   reliable, stable clock counters.
   __   
   wolfgang   
      