
Performance Rules!

Around the time that Odysseus Pentakalos and I were writing our original book (the Windows 2000 Performance Guide from O’Reilly), there were already several books in print that provided guidance on Windows NT performance topics. (Internally, Windows 2000 is Windows NT version 5.0, while the current Windows 7 OS is version 6.1.) In my travels, I had read several of these, along with almost every technical article on the subject I could get my hands on.
While these Windows performance books all had some merit, I also found they had serious shortcomings, in my less than humble opinion. Unfortunately, none of them were written with the benefit of understanding the Windows operating system from the inside out, which was largely a black box until the publication of David Solomon’s original “Inside Windows NT” in 1998. (You can check out the review of the Solomon book I wrote for Amazon almost immediately after it was published here.)
Moreover, none of those early books relied on a systematic approach to computer performance, one that gathered measurement data and analyzed it rigorously under a variety of conditions. Only by using an empirical approach (computer science, to the extent it can be considered an actual “science” and not just an engineering discipline, is an empirical one) could you systematically and reliably determine how a Windows machine actually behaved when it was under stress, or what would happen to it when you tried tweaking one of its many (often hidden) performance-oriented configuration options. Carefully gathering measurements of repeatable benchmarks run under varied conditions and analyzing the results is one of the cornerstones of the empirical approach I pursue.
I had the naïve notion that someone interested in this esoteric subject matter would be willing to invest the time and effort necessary to understand it in sufficient depth. But I found that some readers were disappointed that the book did not contain enough simple recipes – short cuts and other step-by-step procedures that could be followed by rote that were guaranteed paths to success. When I wrote the second book, I made an extra effort to address this criticism, which struck me as a legitimate Reader reaction to the book that I had written. As much as I tried to adhere to Occam’s Razor in writing it, the book was short on simple recipes. We intended it as a guide book, not a cookbook. But I could appreciate that some Readers had bought the book because they faced critical performance problems that they were hoping to get practical advice on how to fix. Naturally, these Readers might become frustrated when they did not discover simple solutions to their problems.
The challenge, of course, is that I am not sure there are too many simple recipes for success in this field.
In the 2nd book, I tried to be much more explicit about the empirical approach to diagnosing computer performance problems. I tried to communicate clearly that the cookbook approach with simple recipes anyone could follow often would not suffice. (Sometimes, it is all about managing expectations. :-)) In addition, I tried to include more concrete examples that I could discuss in detail, case studies that illustrated, methodically, step by step, a systematic, empirical approach. And I worked harder to formulate what crisp rules and recommendations I could, identifying those patterns and Best Practices in data collection, analysis, configuration and tuning that I thought were worthy.
Having learned something from writing the 1st book, I hoped the 2nd book was an improvement. But I can still imagine that some Readers of the 2nd book, which, being part of the official Windows Server 2003 documentation set, circulated much more widely, were still frustrated to find fewer simple recipes for success than they had hoped.
So, while I am certainly sympathetic to the desire that many people have to purchase a set of concise prescriptions for success distilled into a Windows Performance Cookbook, obviously, I have been unable to produce one. This is not for lack of trying, because I am sure that I could sell considerably more copies of a book entitled “Windows Performance for Dummies” than of the books I did write. It is because there are formidable obstacles to producing a decent, worthwhile book of fail-safe recipes.
Rule-based expert systems.
Let me take a minute and explain. One popular cookbook-like approach to performance attempts to encapsulate the knowledge of expert practitioners into declarative rules. These rules selectively analyze some measurement data and test it against some threshold value. An experienced practitioner in the problem domain selects what data to look at, how to look at it (e.g., summarize it, calculate a ratio between two values, or look for a consistent linear relation between two values by calculating a correlation coefficient), and what values to use in the threshold tests. Programmatically, the rule is then evaluated as either unambiguously True or False in the current context. Computer programs that execute along these lines are known as expert systems.
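To make the idea concrete, here is a minimal sketch of such a declarative rule and a toy rules engine. Everything in it is an assumption made for illustration: the class `ThresholdRule`, the function `evaluate`, the metric name `cpu_busy_pct` and the 80% threshold are hypothetical and do not come from any real product or from a recommended practice.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class ThresholdRule:
    """A declarative rule: summarize one metric, compare it to a threshold."""
    name: str
    metric: str                                     # what data to look at
    summarize: Callable[[Sequence[float]], float]   # how to look at it
    threshold: float                                # value picked by the expert

    def fires(self, samples: Mapping[str, Sequence[float]]) -> bool:
        """Evaluate to exactly True or False; there is no 'it depends'."""
        return self.summarize(samples[self.metric]) > self.threshold


def evaluate(rules, samples):
    """Toy rules engine: return the names of the rules that fire."""
    return [rule.name for rule in rules if rule.fires(samples)]


# Hypothetical metric name and threshold, for illustration only.
rules = [ThresholdRule("cpu_saturated", "cpu_busy_pct", mean, 80.0)]
samples = {"cpu_busy_pct": [70.0, 90.0, 95.0]}
fired = evaluate(rules, samples)   # the mean is 85.0, so the rule fires
```

Note that the engine itself is generic; all of the domain knowledge lives in the choice of metric, summary function and threshold, which is exactly where the trouble discussed below begins.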
Back when I was a grad student working on a degree in Computer Science, there were high hopes in artificial intelligence (AI) for expert systems. One of the more appealing aspects of the expert system approach was that you would not have to do much custom programming; some generic rules processing engine could do the bulk of the heavy lifting. What you would need instead was a knowledge engineer capable of encoding the domain-specific rules that an expert diagnostician followed. Once that expert knowledge was encapsulated in a set of declarative rules, a separate Rules engine would effectively be able to replicate those diagnostic procedures.
After I got my degree and started developing software that did computer performance analysis, I was among those who wanted to see to what extent these AI techniques could be successfully adapted to this problem domain. One thing that was clear was that performance analysts had access to huge amounts of diagnostic and other measurement data to sift through. Not having enough data wasn’t our problem, as it might be in medical diagnosis, another problem domain where people were hoping expert systems might help. Computer performance analysts were swimming in it. Building software that could automatically analyze that data and generate suggestions about how to act on it would be very helpful.
At the time I entered the field, there were already many experienced practitioners using tools like SAS to process and analyze copious amounts of computer performance measurement data. There were tools to build Performance Data Bases (PDBs), repositories for all this measurement data where you could track growth and detect changes over time. There was undoubtedly some rich ore here, if we could only figure out how to mine it. Vendor-developed tools that massaged this measurement data into a form where analytic queuing modeling techniques could be applied were also in widespread use. (I worked for some of these tool vendors.) These analytic queuing models provided a “deep” understanding of the scalability behavior of complex computer systems, offering valuable predictive capabilities.
At the time I also encountered many self-appointed “experts” proposing rules that defined both desirable and undesirable run-time computer system characteristics. This was commonly known as the “Rule of Thumb” (ROT) approach to diagnosing performance problems, to distinguish it, I suppose, from more precise analytic approaches, much as seafaring explorers needed to use dead reckoning instead of precise navigation techniques before the technology to build accurate seagoing clocks was available. A problem that arises almost immediately is that encapsulating these rough-hewn Rules of Thumb into rule definitions that could be processed by some AI-derived Rules engine requires that they be stated with precision. There is no way for the expert system to play a hunch or rely on intuition. (In theory, at least, this mechanical process of rule evaluation was what experts did to arrive at a decision, and computers could mimic that behavior. That some amount of mathematical-logical analysis of relevant data is a component of an expert’s decision-making process is probably true. But in rule-based expert systems, this component is the entire decision process. As an aside, I am not convinced that augmenting the rules with some combination of fuzzy logic and/or Bayesian inference to try to deal with the uncertainty inherent in many problem domains helps all that much. In the problem domain that I know, which is computer performance analysis, I know it doesn’t help that much.)
Many of these useful Rules of Thumb resisted being rendered with enough precision that they could be evaluated programmatically by an Expert System’s rules engine. When you tried to pin them down to a precise logical formulation, many of the ROTs postulated by the reigning domain experts incorporated so many additional conditions and qualifying predicates that I soon developed my own (tongue-in-cheek) Rule of Thumb characterizing them as largely unhelpful and, in some cases, even downright dangerous to apply. The ROT I formulated to characterize the adequacy of a diagnosis based on a ROT firing is as follows:
1. In evaluating the precise True/False value for the Rule, if the number of predicates qualifying the conditions under which the rule applies exceeds the number of predicates in the body of the Rule by a factor of 5, then the Rule itself should be discarded.
When enforced, Friedman’s Rule on performance Rules eliminates many of the rules proposed by the leading computer performance experts. Unfortunately for the rule-based approach, “It depends” is frequently the correct answer to queries that ask whether a measurement that exceeds some postulated threshold value is a valid indicator of a related performance problem. Friedman’s Rule on performance rules suggests that any rule that is so over-burdened with pre-conditions, post-mortems and other qualifiers is probably not a useful rule. When there are so many reasons why the rule won’t work, it is not that reasonable a rule. (Lot of puns there, but you get the idea.)
In the next blog, I will give an example of a simple performance rule and then drill into some of the qualifications and conditions that you have to look for before it is at all reasonable to apply the rule.
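In the same tongue-in-cheek spirit, Friedman’s Rule can itself be written as a predicate. This is only a sketch: the function name `should_discard` is hypothetical, and it assumes that “exceeds by a factor of 5” means strictly more than five qualifying predicates for every predicate in the body of the rule.

```python
def should_discard(body_predicates: int,
                   qualifying_predicates: int,
                   factor: int = 5) -> bool:
    """Apply Friedman's Rule to one candidate performance rule.

    body_predicates:       number of tests in the body of the rule
    qualifying_predicates: number of conditions restricting when it applies
    factor:                5 in the text; the strict '>' is an assumption
    """
    if body_predicates < 1:
        raise ValueError("a rule needs at least one predicate in its body")
    return qualifying_predicates > factor * body_predicates


# A one-test rule hedged by six "it depends" conditions is discarded;
# the same rule with only five qualifiers survives.
too_hedged = should_discard(body_predicates=1, qualifying_predicates=6)
borderline = should_discard(body_predicates=1, qualifying_predicates=5)
```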

Comments

  1. Mark, Welcome back to the PM World. I am looking forward to your next blog entry. As you can see, even Germany is reading .....

  2. Mark - It's great to see you writing on Performance again and this is a great topic.

    So many of the performance products that are available today take the Pareto Principle path to quick and easy rules establishment. If the rule will work or solve a problem 80% of the time it can be added as a tool tip within a product, thereby eliminating the need for further investigation.
