
High Resolution Clocks and Timers for Performance Measurement in Windows.

This content has been moved to

Sorry for the inconvenience.

-- The Management


  1. You say that Win7 will use RDTSC if it is invariant, and e.g. HPET or APIC if it is not. However, I have noticed that the Win7 QPC ticks at neither the processor frequency nor the HPET or APIC frequency. It seems to vary by machine, so it's probably not HPET or APIC, as those tend to be constant across machines. So what, exactly, does Win7 do to derive QPC? I read a hypothesis that they discard the low-order bits of RDTSC (e.g. the lowest 10) and use the result for QPC, but can you confirm?

    1. Not sure exactly what you are asking me here to confirm.
      If you are accessing the QPC() API directly, you also need to call QueryPerformanceFrequency() to get the clock frequency. On current hardware, QPC on Win 7 should execute a simple rdtsc instruction. If your program is calling something like the .NET Stopwatch class, which is a thin wrapper around those APIs, the result will be rounded to 100-nanosecond timer ticks, and you will lose some of the least significant digits.

      On Win8 and ARM, I am not really sure what is going on. I suspect there would have to be a HAL interface to make the API hardware independent.

  2. To say that this is an extremely informative post is an understatement. Thank you for taking the time to write this, it really answered a lot of questions I had in regards to QPC()/rdtsc on modern (and not-so-modern) Windows platforms.
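The arithmetic discussed in the first comment thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of what Windows or .NET actually does internally: the function names are hypothetical, the conversion to 100 ns ticks is assumed to truncate, and the bit-discarding function simply models the commenter's unconfirmed hypothesis.

```python
from fractions import Fraction

# One .NET TimeSpan tick is 100 ns, so there are ten million per second.
TICKS_100NS_PER_SECOND = 10_000_000


def counter_delta_to_seconds(start, end, frequency):
    """Convert a QPC-style counter interval to seconds.

    `frequency` is the counts-per-second value that a call such as
    QueryPerformanceFrequency() would report (hypothetical input here).
    """
    return Fraction(end - start, frequency)


def to_100ns_ticks(counter_delta, frequency):
    """Express a counter interval in whole 100 ns ticks.

    Assumption: any remainder is truncated, which is where the least
    significant digits are lost.
    """
    return counter_delta * TICKS_100NS_PER_SECOND // frequency


def derived_frequency(tsc_hz, discarded_bits):
    """Counter frequency if the low `discarded_bits` bits of the TSC
    were dropped (the commenter's hypothesis, not a confirmed fact)."""
    return tsc_hz >> discarded_bits


if __name__ == "__main__":
    tsc_hz = 3_000_000_000                 # hypothetical 3 GHz invariant TSC
    qpc_hz = derived_frequency(tsc_hz, 10) # frequency after dropping 10 bits
    print("derived counter frequency:", qpc_hz)
    print("one counter tick in 100 ns units:", to_100ns_ticks(1, qpc_hz))
```

A derived frequency of this kind would vary from machine to machine with the processor clock, which is consistent with the commenter's observation, but that alone does not confirm the hypothesis.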



Popular posts from this blog

“There’s a lot more to running a starship than answering a lot of fool questions.”

Continuing a series of blog posts on “expert” computer Performance rules, I am reminded of something Captain James T. Kirk, commander of the starship Enterprise, once said in an old Star Trek episode: “There’s a lot more to running a starship than answering a lot of fool questions.” Star Trek, The Original Series. Episode: The Deadly Years. Season 2, Episode 12. See For some reason, the idea that the rote application of some set of rules derived by a domain “expert” can suffice in computer performance analysis has great sway. At the risk of beating a dead horse, I want to highlight another example of a performance Rule you are likely to face, and, in the process, discuss why there is a whole lot more to applying it than might be obvious at first glance. There happens to be a lot more to computer performance analysis than the rote evaluation of some set of well-formed performance rules. It ought to be apparent by now that I …

How Windows performance counters are affected by running under VMware ESX

This post is a prequel to a recent one on correcting the Process(*)\% Processor Time counters on a Windows guest machine.

To assess the overall impact of the VMware virtualization environment on the accuracy of the performance measurements available for Windows guest machines, it is necessary to first understand how VMware affects the clocks and timers that are available on the guest machine. Basically, VMware virtualizes all calls made from the guest OS to hardware-based clock and timer services on the VMware Host. A VMware white paper entitled “Timekeeping in VMware Virtual Machines” contains an extended discussion of the clock and timer distortions that occur in Windows guest machines when there are virtual machine scheduling delays. These clock and timer services distortions, in turn, cause distortion among a considerably large set of Windows performance counters, depending on the specific type of performance counter. (The different types of performance counters are described here

Virtual memory management in VMware: memory ballooning

This is a continuation of a series of blog posts on VMware memory management. The previous post in the series is here.

Ballooning is a complicated topic, so bear with me if this post is much longer than the previous ones in this series.

As described earlier, VMware installs a balloon driver inside the guest OS and signals the driver to begin to “inflate” when it begins to encounter contention for machine memory, defined as the amount of free machine memory available for new guest machine allocation requests dropping below 6%. In the benchmark example I am discussing here, the Memory Usage counter rose to 98% allocation levels and remained there for the duration of the test while all four virtual guest machines were active.

Figure 7, which shows the guest machine Memory Granted counter for each guest, with an overlay showing the value of the Memory State counter reported at the end of each one-minute measurement interval, should help to clarify the state of VMware memory-managemen…