On Fri, 10 Nov 2017 16:02:27 -0700, "James Van Buskirk"   
    wrote:   
      
   >Are you really trying to directly time something that takes only   
   >12 clock cycles? There are two problems with this:
   >   
   >1) RDTSC can start before other instructions complete because   
   >it has no dependencies. You can perhaps fix this by using   
   >RDTSCP instead of RDTSC.   
   >   
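A minimal sketch of the fix suggested in point 1. It assumes GCC or Clang on x86-64, using the `__rdtsc`, `__rdtscp` and `_mm_lfence` intrinsics from `<x86intrin.h>`. The names `time_ticks` and `busy_work` are hypothetical. The fence placement follows the ordering notes in Intel's RDTSC/RDTSCP instruction reference (LFENCE before RDTSC to wait for earlier instructions, LFENCE after RDTSC/RDTSCP to hold back later ones); it is not taken from either poster's code.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang on x86: __rdtsc, __rdtscp, _mm_lfence */

/* Hypothetical workload: a tiny dependency chain the compiler cannot drop. */
static volatile uint64_t sink;
static void busy_work(void)
{
    sink = sink * 3 + 1;
}

/* Hypothetical helper: run fn() `iters` times and return elapsed TSC ticks. */
static uint64_t time_ticks(void (*fn)(void), uint64_t iters)
{
    unsigned int aux;              /* RDTSCP also returns IA32_TSC_AUX here */

    _mm_lfence();                  /* earlier instructions finish before the read */
    uint64_t start = __rdtsc();
    _mm_lfence();                  /* timed code does not start before the read */

    for (uint64_t i = 0; i < iters; i++)
        fn();

    uint64_t end = __rdtscp(&aux); /* waits until the timed code has executed */
    _mm_lfence();                  /* later code does not start before the read */
    return end - start;
}
```

Typical use would be `time_ticks(busy_work, 1000000) / 1e6` for ticks per iteration. On CPUs with an invariant TSC the counter ticks at a fixed reference rate, which can differ from the actual core clock when turbo or power saving is active.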
   Since I use it only at the start and at the end of the code, which runs
   a million times, I think the overhead of RDTSC and of incomplete
   instructions can be neglected.
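To put a number on that argument: a fixed cost for the two timer reads is spread over all iterations. A minimal sketch with hypothetical helpers `ticks_per_iter` and `overhead_error`; the overhead value is an assumption that would have to be measured separately (for example by timing an empty region), and the figures below are illustrative, not measurements.

```c
#include <stdint.h>

/* Hypothetical helper: per-iteration cost after removing the fixed cost of
   the timer reads. Loop bookkeeping (increment, compare, branch) is still
   included, since it runs once per iteration and does not amortize away. */
static double ticks_per_iter(uint64_t total, uint64_t overhead, uint64_t iters)
{
    if (iters == 0 || overhead > total)
        return 0.0;                /* nothing meaningful to report */
    return (double)(total - overhead) / (double)iters;
}

/* Error per iteration made by ignoring the overhead entirely. */
static double overhead_error(uint64_t overhead, uint64_t iters)
{
    return iters ? (double)overhead / (double)iters : 0.0;
}
```

With 1,000,000 iterations and an assumed 40 ticks of overhead, ignoring it shifts the result by only 40 / 10^6 = 0.00004 ticks per iteration.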
      
   >2) RDTSC has granularity of the bus clock, not the processor   
   >clock, so even though it outputs in units of processor clocks,   
   >all results have some common small divisor such as 4 or 10 or
   >even 12 or 18. Thus if your granularity is 12 a sequence that   
   >takes 13 cycles could only yield a measured output of 12 or   
   >24 cycles.   
   >   
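The quantization effect in point 2 can be modeled with a counter that only advances in steps of `g` cycles. This is a toy model under that assumption; `coarse_read`, `measured` and `mean_over_phases` are hypothetical names. It reproduces the 13-cycle example and shows that, if the start phase is spread evenly, averaging over all phases recovers the true length exactly.

```c
#include <stdint.h>

/* Model of a coarse counter: the reading advances in steps of g cycles. */
static uint64_t coarse_read(uint64_t t, uint64_t g)
{
    return t / g * g;
}

/* Measured duration of an event lasting `len` cycles that starts at cycle t0. */
static uint64_t measured(uint64_t t0, uint64_t len, uint64_t g)
{
    return coarse_read(t0 + len, g) - coarse_read(t0, g);
}

/* Average measurement over all g possible start phases. */
static double mean_over_phases(uint64_t len, uint64_t g)
{
    uint64_t sum = 0;
    for (uint64_t t0 = 0; t0 < g; t0++)
        sum += measured(t0, len, g);
    return (double)sum / (double)g;
}
```

With g = 12 and a 13-cycle event, 11 of the 12 start phases read 12 cycles and one reads 24, so the mean is (11 * 12 + 24) / 12 = 13.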
   Are you referring to this part of the
   Intel 64 and IA-32 Architectures Software Developer’s Manual,   
   System Programming Guide?   
   #####   
   18.11.5 Cycle Counting and Opportunistic Processor Operation   
   ...   
   For processors based on Intel Core microarchitecture, the scalable bus   
   frequency is encoded in the bit field MSR_FSB_FREQ[2:0] at (0CDH), see   
   Chapter 34, “Model Specific Registers (MSRs)”. The maximum resolved   
   bus ratio can be read from the following bit field:   
      
   * If XE operation is disabled, the maximum resolved bus ratio can be   
    read in MSR_PLATFORM_ID[12:8]. It corresponds to the maximum   
    qualified frequency.   
      
   * If XE operation is enabled, the maximum resolved bus ratio is given
    in MSR_PERF_STAT[44:40], it corresponds to the maximum XE operation   
    frequency configured by BIOS.   
   ...   
   #####   
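Extracting the bit fields named in the excerpt is a shift and a mask. A minimal sketch: `msr_bits` and `max_bus_ratio` are hypothetical helpers, and reading the raw 64-bit MSR values is assumed to happen elsewhere (it needs ring 0, or on Linux root access to the `msr` driver's `/dev/cpu/N/msr` files). The raw values in the tests are made up.

```c
#include <stdint.h>

/* Return bits hi..lo (inclusive) of a 64-bit MSR value, e.g. [12:8].
   The caller must ensure 63 >= hi >= lo. */
static uint64_t msr_bits(uint64_t value, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1;
    uint64_t mask = (width >= 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);
    return (value >> lo) & mask;
}

/* Maximum resolved bus ratio as described in the excerpt above:
   MSR_PLATFORM_ID[12:8] when XE operation is disabled,
   MSR_PERF_STAT[44:40] when it is enabled. */
static unsigned max_bus_ratio(uint64_t platform_id, uint64_t perf_stat,
                              int xe_enabled)
{
    return xe_enabled ? (unsigned)msr_bits(perf_stat, 44, 40)
                      : (unsigned)msr_bits(platform_id, 12, 8);
}
```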
   Thanks for the hint; I guess I will have to read up a little more on
   bus granularity!
      
   >Make sure that you are taking these considerations into   
   >account when devising a strategy for timing short instruction   
   >sequences.   
   >   
   That's why I run the code 1,000,000 times, which means the timed
   region isn't so short after all.
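A back-of-the-envelope bound for why the long run helps with granularity, assuming (as in the quoted model) that a counter reading q(x) satisfies x - g < q(x) <= x for granularity g, and that each of the N iterations takes t cycles starting at cycle s:

```latex
\begin{aligned}
T_{\text{meas}} &= q(s + Nt) - q(s) \in (Nt - g,\; Nt + g) \\
\left| \frac{T_{\text{meas}}}{N} - t \right| &< \frac{g}{N}
  = \frac{12}{10^{6}} = 1.2 \times 10^{-5}\ \text{cycles for } g = 12,\ N = 10^{6}.
\end{aligned}
```

So quantization alone cannot noticeably distort the per-iteration figure of a million-iteration run.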
      
   The timings I made with RDTSC on a Pentium were very consistent
   with the tables in Agner Fog's manual, but that processor's
   architecture was much simpler. On the newer CPUs it is much harder
   to follow what's going on underneath :-(
   --   
   aen   
      