Performance By Design

Posts

Showing posts from April, 2012

Presenting two sessions at the upcoming UKCMG meeting in Oxford, England on May 14-15.

Some news that regular readers of this blog might be interested in hearing about... I plan to present two sessions at the upcoming UKCMG annual conference, which is being held this year on May 14 & 15 at the Oxford Belfry on the outskirts of Oxford, England. The first presentation is a repeat performance of the one I gave at the US CMG in December, a paper entitled “Measuring Processor Utilization in Windows and Windows applications,” essentially pulling together the series of blog entries I have been posting here, beginning with the first installment , but with a good deal more material than I have gotten around to posting to the blog. For instance, the last blog post discussing the high resolution clocks and timer facilities in Windows leads directly to a consideration of what happens to the various CPU utilization measurements when Windows is running as virtual guest under VMware or Hyper-V. That discussion is in the paper, but, unfortunately, hasn’t made it to t...

The mystery of the scalability of a CPU-intensive application on an Intel multi-core processor with HyperThreading (HT).

This blog entry comes from answering the mail (note: names and other incriminating identification data were redacted to protect the guilty): A Reader writes: “My name is … and I am a developer for a software company. I encountered something I didn’t understand profiling our software and stumbled across your blog. Listen, hot-shot, since you are such an expert on this subject, maybe you can figure this out. Our software is very CPU-intensive, so we use QueryPerformanceCounter() and QueryPerformanceFrequency() to get the CPU cycles. Hence, if someone mistakenly commits some change to slow it down, we can catch it. It works fine until one day we decided to fire up more than one instance of the application. Since a large part of our code is sequential and could not be parallelized, by firing up another instance, we can use the other cores that are idle when one core is executing the sequential code. We found the timing profile all messed up. The time difference can be 5x difference ...