
Measuring application response time using the Scenario instrumentation library.




This blog post describes the Scenario instrumentation library, a simple but useful tool for generating response time measurements from inside a Windows application. The Scenario instrumentation library uses QPC() and QTCT() (QueryPerformanceCounter and QueryThreadCycleTime), the Windows APIs discussed in an earlier blog entry, to gather elapsed times and CPU times between two explicit, application-designated code markers. The application response time measurements are then written as ETW events that you can readily gather and analyze.
You can download a copy of the Scenario instrumentation library at http://archive.msdn.microsoft.com/Scenario.
The Scenario class library was originally conceived as a .NET Framework-flavored version of the Application Response Measurement (ARM) standard, which was accepted and sponsored by the Open Group. The idea behind ARM was that adding application response time measurements to an application in a standardized way would promote third-party tool development. This was moderately successful for a spell: ARM developed some momentum and was adopted by a number of IT organizations. Some major management tools, including IBM's Tivoli suite and HP OpenView, supported ARM measurements.
In the Microsoft world, however, the ARM standard itself never stirred much interest, and application response time measurements are conspicuously absent from the performance counters supplied by many Microsoft-built client and server applications. Nevertheless, many of these applications are extensively instrumented and can report response time measurements using ETW, which is one of the many reasons that something ARM-like for Windows should leverage ETW. The Scenario instrumentation library tries to satisfy the same set of requirements as an ARM-compliant tool, but tailored to the Windows platform.
The topic is very much on my mind at the moment – thus this blog post – because I am working on a new tool for reporting on web application response times under Windows, using ETW events generated by the HttpServer component of Windows (better known as IIS) and the TCP/IP networking stack. One of the early users of the Beta version of the tool also wanted a way to track application Scenario markers, so I am currently adding that capability. I expect an initial version of this reporting tool, one that also supports web applications instrumented with the Scenario library, to be available next month, so…
Why measure application response times?
There are several very good reasons for instrumenting applications so that they can gather response time measurements. If you are at all serious about performance, it is well-nigh impossible to do good work without access to application response time measurements. If you don’t have a way to quantify empirically what is “good” response time and compare it to periods of “bad” response time, let’s face it, you are operating in the dark. Performance analysis begins with measurement data, and you cannot manage what you can’t measure.
Application response time measurements are important for two main reasons. The first is that application response time measurements correlate with customer satisfaction. In survey after survey of customers, performance concerns usually rank just below reliability (i.e., bugs and other defects) as the factor most influential in forming either a positive or negative attitude towards an application. Response time is a critical aspect of software quality, and one that you can measure and quantify.
The second reason is that, in performance analysis, application response time measurements are essential to apply any of the important analytic techniques that people have developed over the years for improving application response time. These techniques include Queuing Theory and the related mathematical techniques that capacity planners use to predict response time in the face of growing workloads and changing hardware. Any form of optimization or tuning you want to apply to your application also needs to be grounded in measurement: how can you know whether this or that optimization leads to an improvement if you are not measuring response times, both before and after? Even knowing which aspect of the application’s response time to target in a tuning effort requires measurements that allow you to break the response times you observe into their component parts (CPU, IO, network, etc.), an analysis technique known as response time decomposition.
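The first step of response time decomposition can be sketched in a few lines, assuming you already have an elapsed (wall clock) time and a CPU time for the same request, which is the pair of measurements the Scenario class reports. Whatever is not CPU time is time the request spent waiting on IO, the network, locks and so on. The `decompose` helper, the `Decomposition` record and the millisecond units are hypothetical illustrations, not part of the Scenario library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decomposition:
    """Split of one response time measurement into CPU and non-CPU parts (ms)."""
    elapsed_ms: float
    cpu_ms: float
    wait_ms: float        # IO, network, lock contention, queuing, ...
    cpu_fraction: float   # share of the response time spent executing on a CPU


def decompose(elapsed_ms: float, cpu_ms: float) -> Decomposition:
    """Hypothetical helper: attribute the non-CPU part of elapsed time to waiting.

    Assumes a single-threaded request, so CPU time cannot legitimately exceed
    elapsed time; multi-threaded work would need per-thread measurements.
    """
    if elapsed_ms <= 0:
        raise ValueError("elapsed time must be positive")
    if cpu_ms < 0 or cpu_ms > elapsed_ms:
        raise ValueError("CPU time must lie between 0 and the elapsed time")
    wait_ms = elapsed_ms - cpu_ms
    return Decomposition(elapsed_ms, cpu_ms, wait_ms, cpu_ms / elapsed_ms)


# A 250 ms request that consumed 40 ms of CPU spent the remaining 210 ms waiting.
d = decompose(250.0, 40.0)
print(f"wait={d.wait_ms} ms, cpu share={d.cpu_fraction:.0%}")  # wait=210.0 ms, cpu share=16%
```

Breaking the 210 ms of waiting down further into IO, network and other components would need additional measurements, such as the ETW events from the disk and TCP/IP stacks.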
So, for these and other reasons, application response time measurements are extremely important, which is why it is especially annoying to discover that they are largely absent from the standard Windows performance counter measurements that are available for both native C++ and managed .NET Framework apps. The Scenario instrumentation library helps to address this gap in a standard fashion, similar to the ARM standard, and likewise enables the development of third-party reporting tools.
Tips for getting your applications instrumented.
Adding ARM-like instrumentation to an application inevitably raises some concerns. The prime concern is that adding the library calls means you have to open up the code and modify it. Even if the instrumentation API is easy to use – and the Scenario class library is very simple – modifying the code is risky, riskier than doing nothing. It needs to be performed by someone who knows the code and will add the instrumentation carefully. A reluctance to open up the code and expose it to additional risk is usually the big initial obstacle organizations face when it comes to adding instrumentation – it is one more thing on the ToDo list that has to be coded and tested, and one more thing that can go wrong.
The best approach to overcoming this obstacle is to line up executive support for the initiative. Let’s face it, your IT management will appreciate receiving regular service level reports that accurately reflect application usage and response time. We all want satisfied customers, and meeting service objectives associated with availability, reliability and performance is highly correlated with customer satisfaction. Application response time data is critical information for IT decision makers.
The second obstacle, which is actually the more serious one, is that someone has to figure out what application scenarios to instrument. In practice, this is not something that is technically difficult. It just requires some thought from the people who designed the application and care about its quality, and perhaps some input from the folks who use the app to understand which pieces of it they rely on the most. Technical input is also required at various points in the process of coming up with the scenario definitions: decisions about what scenarios to instrument need to be made in light of any technical obstacles that can arise.
Let me illustrate one of the technical considerations that will arise when you first consider instrumenting an application to report response times. You will discover rather quickly that reporting response times alone, without some explanatory context, leads to problems in interpretation. Consider an example from a well-known and well-instrumented application you are probably familiar with: Google Search. At the top of the panel where Google displays Search results is a response time measurement. For example, I just typed in “search engines,” and Google Search returns the information that it knows of 264,000,000 “results” that match those search terms. Google then reports that it required all of 0.25 seconds to find that information for me and report back. The 250 millisecond measurement is the response time, and the 264 million results is the context needed to interpret whether that response time is adequate.
When you instrument your application using the Scenario class library, you have two additional fields that can be used to report context: a 64-bit integer Property called Size and a string Property called Category. If you were instrumenting Google Search using calls to the Scenario class, you would set the Size Property to the 264 million results value and place the search keywords into the Category string. The Size and Category Properties that are reported alongside the response time measurement provide context to assist you in interpreting whether or not the response time the application supplied was adequate in that particular instance.
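The value of the context fields shows up when the measurements are analyzed. As a minimal sketch, assuming each measurement arrives as a record carrying the elapsed time plus Size and Category, grouping by Category and by the order of magnitude of Size keeps a search over hundreds of millions of results from being compared directly with one that matched a few thousand. The `ScenarioRecord` type, its field names and the sample values are hypothetical; only the 250 ms / 264 million pair comes from the Google example, and the other values are made up for illustration.

```python
import statistics
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioRecord:
    """Hypothetical shape of one measurement plus its context fields."""
    name: str
    elapsed_ms: float
    size: int        # e.g. number of search results found
    category: str    # e.g. the search keywords


def magnitude(size: int) -> int:
    """Order of magnitude of a non-negative Size value (0 for sizes below 10)."""
    return len(str(size)) - 1


def median_by_context(records):
    """Median elapsed time for each (category, order of magnitude of size)."""
    groups = defaultdict(list)
    for r in records:
        groups[(r.category, magnitude(r.size))].append(r.elapsed_ms)
    return {key: statistics.median(values) for key, values in groups.items()}


records = [
    ScenarioRecord("Search", 250.0, 264_000_000, "search engines"),
    ScenarioRecord("Search", 310.0, 271_000_000, "search engines"),  # made-up value
    ScenarioRecord("Search", 90.0, 1_200, "eponymous"),              # made-up value
]
for (category, mag), median_ms in median_by_context(records).items():
    print(f"{category!r}, ~10^{mag} results: median {median_ms} ms")
```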
So, one final aspect of instrumenting your application to consider is what additional context you want to supply that will aid you in interpreting the response time measurements after the fact. The contextual data that is usually most helpful is associated with what I like to call the application’s scalability model. The application’s scalability model is your theory about the factors that have the most influence over the application’s response time. If, for example, you think that the number of database rows the application must process has something to do with how long it takes to compute a result and generate a response message, that conjecture reflects a simple scalability model in which response time grows with the number of rows raised to some power n, scaled by a constant y:
$$ f(x) = y \cdot \mathit{rows}^{n} $$
Populating the Scenario object’s Size and Category Properties from data relevant to the application’s scalability model helps provide the context necessary for interpreting whether a specific response time measurement that was reported is adequate or not.
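Once Size is being reported, the scalability model can be checked against the data. A minimal sketch, assuming Size carries the row count and all elapsed times are positive: taking logarithms turns the power law into a straight line, ln f = ln y + n · ln rows, so an ordinary least-squares fit on the logged values recovers the constant y and the exponent n. The function name `fit_power_law` and the synthetic data are illustrative only.

```python
import math


def fit_power_law(rows, times):
    """Least-squares fit of times ≈ y * rows**n on log-log axes; returns (y, n).

    Assumes every rows and times value is positive.
    """
    if len(rows) != len(times) or len(rows) < 2:
        raise ValueError("need at least two paired observations")
    log_rows = [math.log(r) for r in rows]
    log_times = [math.log(t) for t in times]
    mean_x = sum(log_rows) / len(log_rows)
    mean_y = sum(log_times) / len(log_times)
    sxx = sum((x - mean_x) ** 2 for x in log_rows)
    if sxx == 0:
        raise ValueError("rows must take at least two distinct values")
    sxy = sum((x - mean_x) * (lt - mean_y) for x, lt in zip(log_rows, log_times))
    n = sxy / sxx                        # slope of the log-log line is the exponent
    y = math.exp(mean_y - n * mean_x)    # intercept is ln y
    return y, n


# Synthetic check: elapsed times (ms) generated from y = 0.002, n = 1.5 are recovered.
rows = [1_000, 5_000, 20_000, 100_000]
times = [0.002 * r ** 1.5 for r in rows]
y, n = fit_power_law(rows, times)
print(f"y={y:.4f}, n={n:.3f}")  # y=0.0020, n=1.500
```

With real measurements the points scatter around the line; an exponent near 1 suggests linear scaling, while a markedly larger one flags a scenario likely to degrade as the database grows.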
It may be challenging to squeeze all the data relevant to the application’s scalability model into the limited space the Scenario class provides for customization. In practice, many adopters turn the Category string into a composite value, something I often resort to myself. Consequently, my reporting program supports a standard method for packing data into the Category string value, which is automatically broken out in the reporting process. In the worst case, you will not be able to shoehorn all the contextual data you need into the Scenario class’s simple Size and Category Property fields. When that happens, you will need to develop your own instrumentation class, and your own reporting.
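One way to build a composite Category value is sketched below. This is a hypothetical `key=value;key=value` convention, not the packing format my reporting program uses; keys and values are percent-encoded with the standard `urllib.parse` helpers so that an `=` or `;` inside the data cannot be mistaken for a separator.

```python
from urllib.parse import quote, unquote


def pack_category(fields: dict) -> str:
    """Pack key/value pairs into one Category string: 'k1=v1;k2=v2'.

    Keys and values are percent-encoded so '=' and ';' in the data stay unambiguous.
    """
    return ";".join(f"{quote(str(k), safe='')}={quote(str(v), safe='')}"
                    for k, v in fields.items())


def unpack_category(category: str) -> dict:
    """Inverse of pack_category; every value comes back as a string."""
    result = {}
    if not category:
        return result
    for pair in category.split(";"):
        key, _, value = pair.partition("=")
        result[unquote(key)] = unquote(value)
    return result


packed = pack_category({"keywords": "search engines", "region": "us-west", "filter": "a=b;c"})
print(packed)  # keywords=search%20engines;region=us-west;filter=a%3Db%3Bc
print(unpack_category(packed))
```

Because every value is unpacked as a string, numeric context beyond the single Size field would have to be converted back by the reporting side.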
Software performance engineering and application development lifecycle
Given how fundamentally important application response time measurements are to any worthwhile program for maintaining and improving software quality, it is worth thinking a bit about why this critical data is so often missing. Let’s consider the full application development life cycle, from gathering requirements, to design, coding, testing, acceptance and stress testing, to deployment and ongoing “maintenance” activities. Too often, the application’s performance requirements are relegated to a narrow window somewhere towards the end of the development process, where they present a significant hurdle during acceptance testing. Concentrating performance testing late in the acceptance testing stage in this fashion is almost guaranteed to cause resentment among the hard-pressed software development staff. Performance requirements should actually be set early in the requirements process, as the scenarios targeted for development are being defined. Provisioning the application so that it can report on the response times of those key scenarios emphasizes performance considerations at every stage of the application development lifecycle.
In principle, the application scenarios are specified early in the development life cycle, and early in the cycle is also the best time to begin thinking about response time requirements. In the software development methodologies that are in fashion, however, application performance is usually viewed as a “non-functional” requirement, one that does not get a lot of attention. This is all wrong, of course. As one of my colleagues was fond of telling our CIO, performance isn’t a coat of paint you can slap on an application after it is done to beautify it. Decisions made during the design stage about how to structure the application often determine what application response times are even achievable.
Performance needs to be considered at every stage of application development. Performance goals for key application scenarios need to be defined early in the design stage. Instrumentation to measure application response time allows developers to assess their progress accurately in meeting these goals at every stage of the process. Instrumentation embedded in the application also aids in performance testing. My experience is that with the right level of instrumentation, every functional test can also serve as a performance test.
While application response time measurements are the closest we can get to understanding and quantifying the end user experience, it is worth noting that the correlation between response time and customer satisfaction is typically not a simple linear relationship. Human beings are a little more complicated than that. If you want a set of relatively simple guidelines to help you decide what response times are good for your application and which are bad, I recommend Steve Seow’s book on the subject “Designing and Engineering Time: The Psychology of Time Perception in Software.” The guidelines in Dr. Seow’s book are grounded in the relevant Human-Computer Interaction (HCI) psychological research, but it is not a dry, academic discussion.
Steve’s book also promotes a set of prescriptive techniques for engineering a better user experience whenever the processing requirements associated with a request are too demanding to produce a prompt application response. For instance, by managing a customer’s expectations about how long some longer running operation is potentially going to take, you can engineer a highly satisfying solution without blistering response times. Showing a progress bar that accurately and reliably reflects the application’s forward progress and providing ways for a customer to cancel out of a long running task that ties up her machine are two very effective ways to create a positive experience for your customers when it is just not possible to complete the computing task at hand quickly enough.
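The progress-and-cancel technique can be sketched in a few lines of Python, assuming the long-running work can be divided into discrete units. Progress is reported only after a unit actually completes, so the indicator never runs ahead of the real work, and a `threading.Event` serves as the cancellation flag a Cancel button would set. The names `process_items` and `report` are hypothetical.

```python
import threading


def process_items(items, on_progress, cancel: threading.Event):
    """Process items, reporting fractional progress and honoring cancellation.

    Returns the number of items completed before finishing or being cancelled.
    """
    total = len(items)
    done = 0
    for item in items:
        if cancel.is_set():        # check between units of work, never mid-unit
            break
        _ = item * item            # placeholder for the real unit of work
        done += 1
        on_progress(done / total)  # progress reflects work actually completed
    return done


progress = []
cancel = threading.Event()


def report(fraction):
    progress.append(fraction)
    if fraction >= 0.5:
        cancel.set()               # simulate the user pressing Cancel halfway through


completed = process_items(list(range(10)), report, cancel)
print(completed, progress[-1])
```

In a real application the loop would run on a background thread so the user interface stays responsive; here the callback simply simulates a user cancelling at the halfway mark.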
Using the Scenario instrumentation library.
The Scenario instrumentation class library provides a convenient way for a developer to indicate in the application the beginning and end of a particular usage scenario. Internally, each Scenario instance uses an ExtendedStopwatch object to gather both wall clock time (using QueryPerformanceCounter) and the CPU ticks from QTCT() for the Scenario when it executes. The Scenario class then outputs these measurements in an ETW trace event record that renders for posterity the elapsed time and CPU time of the designated block of code.
Once a Scenario object is instantiated by an application, calls to Scenario.Begin() and Scenario.End() mark the beginning and end of a specific application scenario. After the Scenario.End() method executes, the program can access the object’s Elapsed and ElapsedCpu time properties. In addition, the Scenario.Begin() and Scenario.End() methods generate ETW events that can be post-processed. The payload of the ETW trace event issued by the Scenario.End() method reports the elapsed time and CPU time measurements that the class generates internally.
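To make the Begin/End pattern concrete outside of .NET, here is a minimal Python sketch of the same idea. It is not the Scenario library's API: the class name `ScenarioSketch`, its lowercase `begin()`/`end()` methods and the in-memory event list are hypothetical. `time.perf_counter()` stands in for QueryPerformanceCounter, and `time.thread_time()` (CPU seconds consumed by the calling thread) stands in for QTCT, which actually reports CPU cycles rather than seconds.

```python
import time


class ScenarioSketch:
    """Hypothetical Python analog of a Begin/End response time marker.

    Events are appended to an in-memory list instead of being written to ETW.
    """

    def __init__(self, name: str, events: list):
        self.name = name
        self.events = events
        self.size = 0            # context fields, cf. the Size and Category Properties
        self.category = ""
        self.elapsed = None      # wall clock seconds, set by end()
        self.elapsed_cpu = None  # CPU seconds of the calling thread, set by end()
        self._start = None
        self._start_cpu = None

    def begin(self):
        self._start = time.perf_counter()
        self._start_cpu = time.thread_time()
        self.events.append(("Begin", self.name))

    def end(self):
        if self._start is None:
            raise RuntimeError("end() called before begin()")
        self.elapsed = time.perf_counter() - self._start
        self.elapsed_cpu = time.thread_time() - self._start_cpu
        # Like the End event described above, the payload carries both measurements.
        self.events.append(("End", self.name, self.elapsed, self.elapsed_cpu,
                            self.size, self.category))


events = []
s = ScenarioSketch("BuildReport", events)
s.begin()
total = sum(i * i for i in range(200_000))  # CPU-bound work
time.sleep(0.05)                            # waiting, which consumes almost no CPU
s.end()
print(f"elapsed={s.elapsed:.3f}s cpu={s.elapsed_cpu:.3f}s")
```

Because the sleep consumes wall clock time but almost no CPU, the elapsed time comes out noticeably larger than the CPU time, which is exactly the gap that response time decomposition sets out to explain.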
To support more elaborate application response time monitoring scenarios, there is a Scenario.Step method that provides intermediate CPU and wall clock timings. The Scenario class also provides a correlation ID for use in grouping logically related requests. Nested parent-child relationships among scenarios that are part of larger scenarios are also explicitly supported. For details, see the API and ETW payload documentation pages on the MSDN archive.
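The same idea extends to intermediate timings, correlation IDs and nesting. In this minimal, hypothetical Python sketch, the class `NestedScenario`, its `step()` method and the slash-separated `path` are assumptions, not the library's API or ETW payload layout. A child scenario inherits its parent's correlation ID so that related measurements can be grouped, and `step()` records an intermediate wall clock timing without ending the scenario.

```python
import time
import uuid


class NestedScenario:
    """Hypothetical sketch of step timings, correlation IDs and parent-child nesting."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # Children share the parent's correlation ID so related work groups together.
        self.correlation_id = parent.correlation_id if parent is not None else uuid.uuid4()
        self.steps = []      # (label, seconds since begin) for each step()
        self._start = None

    @property
    def path(self):
        """Slash-separated parent/child path, e.g. 'RenderPage/RunQuery'."""
        return f"{self.parent.path}/{self.name}" if self.parent is not None else self.name

    def begin(self):
        self._start = time.perf_counter()

    def step(self, label):
        """Record an intermediate wall clock timing without ending the scenario."""
        if self._start is None:
            raise RuntimeError("step() called before begin()")
        self.steps.append((label, time.perf_counter() - self._start))

    def end(self):
        """Record a final step and return the total elapsed seconds."""
        self.step("End")
        return self.steps[-1][1]


page = NestedScenario("RenderPage")
page.begin()
query = NestedScenario("RunQuery", parent=page)  # nested child scenario
query.begin()
time.sleep(0.01)                                 # stand-in for a database call
query.end()
page.step("QueryDone")                           # intermediate timing for the parent
time.sleep(0.01)                                 # stand-in for rendering
total = page.end()
print(query.path, query.correlation_id == page.correlation_id, total)
```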
I developed an earlier version of the Scenario instrumentation library in conjunction with several product development teams in the Developer Tools Division when I was at Microsoft. An early adopter was a product team attempting to build a next-generation C++ compiler. The test case they were looking to optimize was the commercial Windows Build, a set of very demanding and long running compiler jobs. This product team gave the original Scenario instrumentation library quite a stress test, and I added several features to the library to make it more efficient and effective in that challenging environment.
Subsequently, what was effectively the Scenario instrumentation library version 2.0 was included in the commercial version of Visual Studio 2010, with instrumentation added for several newly developed components that shipped in the product. The Visual Studio 2010 flavor of the Scenario instrumentation library is known as the MeasurementBlock class. If you have Visual Studio 2010 installed, you can incorporate this into your application by referencing the Microsoft.VisualStudio.Diagnostics.Measurement.dll that is located at C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE. After adding a Reference to Microsoft.VisualStudio.Diagnostics.Measurement.dll, you can verify that the semantics of the MeasurementBlock class are functionally identical to the published Scenario library using the Object Browser applet in Visual Studio, as illustrated in the screen shot shown in Figure 1 below.
Figure 1. MeasurementBlock class imported from Microsoft.VisualStudio.Diagnostics.Measurement.dll as viewed in the Visual Studio Object Browser.
 
You will note some cosmetic differences, though. The eponymous Mark() method in the original implementation was renamed to Step() in the subsequent Scenario library. (Ever since I found out what it meant a few years back, I have always wanted to use the word “eponymous” in a sentence!) MeasurementBlock also uses a different ETW Provider GUID: we didn't want developers who added Scenario markers to their apps to suddenly see the VS MeasurementBlock events, too, when they enabled the ETW Provider.
Using the Scenario library in your application is straightforward and is documented on the MSDN archive site. In the next blog entry, I will provide some additional coding guidance for interfacing with the new performance tool I am working on.
