Skip to main content

Posts

Showing posts from 2011

Measuring thread execution state using trace events.

Continuing the discussion from the previous blog entry on event-driven approaches to measuring CPU utilization in Windows...
Besides measuring processor utilization at the system level, the stream of context switch events can also be re-constructed to drill into CPU consumption at the process and thread level. An exemplary example of this approach is the Visual Studio Profiler’s Concurrency Visualizer, available in Visual Studio 2010. (For reference, see “Performance Tuning with the Concurrency Visualizer in Visual Studio 2010 in the Visual Studio 2010 Profiler,” an MSDN Magazine article written by the tool’s principal architect, Hazim Shafi.)The Concurrency Visualizer gathers Context Switch events to calculate processor utilization for the application being profiled. The VS Concurrency Visualizer creates a system-level CPU Utilization View with an interesting twist – the view pivots based on the application you are profiling, a perspective that matches that of a software performance…

Is there an ARM server in your future?

I cannot resist adding to some of the industry buzz about the recent HP Project Moonshot announcement and what this potentially means for folks that run Windows. As part of Project Moonshot, HP is planning to release something called the Redstone Server Development Platform in 1H12, an ARM-based, massively parallel supercomputer that uses 90% less energy than comparable Intel microprocessors. Coupled with the big.Little hardware announcement from the ARM Consortium about two weeks ago, I think this is big news that could shake up the foundations of Windows computing. ARM-based Windows Server machines could well be in all our futures. Let’s start with ARM and the big.Little architecture announcement, which is pretty interesting in itself. Details are from this white paper. big.Little is an explicit multi-core architecture where a simple, low power version of the processor is packaged together with a significantly more powerful (~ 2x) version of the processor that also uses about 3 times…

Using xperf to analyze CSwitch events

Continuing the discussion from the previous blog entry on event-driven approaches to measuring CPU utilization in Windows...
Last time around I discussed the same CPU busy calculations that the Resource Manager in Windows 6 & 7 makes. This same calculation can also be performed after the fact using the event data from ETW. This is the technique used in the Windows Performance Toolkit (WPT, but which is better known around Microsoft as xperf), for example, to calculate CPU usage metrics. Once you have downloaded and installed the Windows Performance Toolkit, you can launch a basic ETW collection session using the following xperf command: xperf -on DiagEasy Then, after you have accumulated enough data, issue another command to stop tracing and capture the event stream to a file: xperf -d cputrace.etl Next, process the cputrace.etl file using the xperfview app. After the trace file is loaded, xperfview provides visualizations that are very similar to ResMon. See Figure 6 for an example. Fi…

Measuring Processor Utilization in Windows and Windows applications: Part 2

An event-driven approach to measuring processor execution state.
The limitations of the legacy approach to measuring CPU busy in Windows and the need for more precise measurements of CPU utilization are recognized in many quarters across the Windows development organization at Microsoft. The legacy sampling approach is doubtless very efficient, and this measurement facility was deeply embedded in the OS kernel’s Scheduler facility, a chunk of code that is very risky to tamper with. But, for example, more efficient power management, something that is crucial for battery-powered Windows devices, strongly argues for an event-driven alternative. You do not want the OS to wake up from a low power state regularly on an idle machine just to perform its CPU usage accounting duties, for example.

A straightforward alternative to periodically sampling the processor execution state is to measure the time spent in each processor state directly. This is accomplished by instrumenting the phase state …

Measuring Processor Utilization in Windows and Windows applications: Part 1

Introduction. This blog entry discusses the legacy technique for measuring processor utilization in Windows that is based on sampling and compares and contrasts to other sampling techniques. It also introduces newer techniques for measuring processor utilization in Windows that are event-driven. The event-driven approaches are distinguished by far greater accuracy. They also entail significantly higher overhead, but measurements indicate this overhead is well within acceptable bounds on today’s high powered server machines.
As of this writing, Windows continues to report measurements of processor utilization based on the legacy sampling technique. The more accurate measurements that are driven using an event-driven approach are gaining ground and can be expected to supplant the legacy measurements in the not too distant future. While computer performance junkies like me positively salivate at the prospect of obtaining more reliable and more precise processor busy metrics, the event-d…

Deconstructing disk performance rules: final thoughts

To summarize the discussion so far: While my experience with rule-based approaches to computer performance leads me to be very skeptical of their ultimate value, I recognize they can be useful under many circumstances, especially if people understand their inherent limitations. For example, in the last couple of blog entries, I noted the usefulness of threshold rules for filtering the great quantities of esoteric performance data that can readily be gathered on Windows (and other computing platforms). The threshold rules implicitly select among the performance data to be gathered – after all, before you can perform the threshold test, you must first have acquired the data to be analyzed. Concentrating on measurement intervals where the threshold test succeeds also help you to narrow the search to periods of peak load and stress. However, the mere mechanical iteration over some set of expert-derived performance rules, absent the judgment of an experienced performance analyst, is unlikely…

“There’s a lot more to running a starship than answering a lot of fool questions.”

Continuing a series of blog posts on “expert” computer Performance rules, I am reminded of something Captain James T. Kirk, commander of the starship Enterprise, once said in an old Star Trek episode: “There’s a lot more to running a starship than answering a lot of fool questions.” Star Trek, The Original Series. Episode: The Deadly Years. Season 2, Episode 12. See http://tos.trekcore.com/episodes/season2/2x12/captioninglog.txt. For some reason, the idea that the rote application of some set of rules derived by a domain “expert” can suffice in computer performance analysis has great sway. At the risk of beating a dead horse, I want to highlight another example of a performance Rule you are likely to face, and, in the process, discuss why there is a whole lot more to applying it than might be obvious at first glance. There happens to be a lot more to computer performance analysis than the rote evaluation of some set of well-formed performance rules. It ought to be apparent by now that I …