Skip to main content

Hyper-V Performance: Virtual Processor Scheduling Priority

Virtual Processor Scheduling Priority

The Hyper-V hypervisor, which is responsible for guest machine scheduling, by default implements a simple, round-robin policy in which any virtual processor in the execution state is equally likely to be dispatched. If the CPU capacity of the hardware is adequate, defining and running more virtual processors than logical processors leads to the possibility of dispatching queuing delays, but normally these dispatching delays are minimal because many virtual processors are in the idle state, and so over-commitment of the physical processors often does not impact performance greatly. On the other hand, once these dispatching delays start to become significant, the priority scheduling options that Hyper-V provides can become very useful.

In addition to the basic round-robin scheme, the hypervisor also factors processor affinity and the NUMA topology into scheduling decisions, considerations that add asymmetric constraints to processor scheduling. Note that the guest machine’s current CPU requirements are not directly visible to the Hyper-V hypervisor, so its scheduling function does not know when there is big backlog of ready threads delayed inside the guest. With the single exception of an OS enlightenment in Windows guests to notify Hyper-V via a Hypercall that it is entering a long spinlock, Hyper-V has no knowledge of guest machine behavior to make better informed scheduling decisions. Nor does the fact that a virtual processor has one or more interrupts pending bias a guest machine that is currently idle and improve its position in the dispatching queue. These are all factors that suggest it is not a good idea to try and push the CPU load to the limits of processor capacity under Hyper-V.

When enough guest machine virtual processors attempt to execute and the physical machine’s logical processors are sufficiently busy, performance under Hyper-V will begin to degrade. You can then influence Hyper-V guest machine scheduling using one of the available priority settings. The CPU priority settings are illustrated in Figure 2, which is a screenshot showing the Hyper-V Processor Resource control configuration options that are available. Hyper-V provides several tuning knobs to customize the Hyper-V virtual processor scheduler function. These include:
  • reserving CPU capacity in advance on behalf of a guest machine,
  • setting an upper limit to the CPU capacity a guest machine can use, and
  • setting a relative priority (or weight) for this virtual machine to be applied whenever there is contention for the machine’s processors 

Let’s compare the processor reservation, capping and weighting options and discuss which make sense to use and when to use them.


 Reservations. 

The Hyper-V processor reservation setting is specified as the percentage of the capacity of a logical processor to be made available to the guest machine whenever it is scheduled to run. The reservation setting applies to each virtual processor that is configured for the guest. Setting a processor reservation value is mainly useful if you know that a guest machine requires a certain, minimal amount of processing power and you don’t want Hyper-V to schedule that VM to run unless that minimum amount of CPU capacity is available. When the virtual processor is dispatched, reservations guarantee that it will receive the minimum level of service specified.

Hyper-V reservation work differently from many other Quality of Service (QoS) implementations that feature them where any capacity that is reserved, but not used, remains idle. In Hyper-V, when the guest machine does not consume all the capacity that it is reserved for it, virtual processors from other guest machines that are waiting can be scheduled instead. Implementing reservations in this manner suggests the hypervisor makes processor scheduler decisions on a periodic basis, functionally equivalent to a time-slice, but whatever that time-slicing interval is, it is undocumented.

Capping. 

Setting an upper limit using the capping option is mainly useful when you have a potentially unstable workload, for example, a volatile test machine, and you don’t want to allow that rogue guest to dominate the machine to the detriment of every other VM that is also trying to execute. Similar to reservations, capping is also specified as a percentage of logical processor capacity.

Weights. 

Finally, setting a relative weight for the guest machine’s virtual processors is useful if you know the priority of a guest machine’s workload relative to the other guest machine workloads that are resident on the Hyper-V host. This is the most frequently used virtual processor priority setting. Unlike reservations, weights do not provide any guarantee that a guest machine’s virtual processor will be dispatched. Instead they are simply used to increase or decrease the likelihood that a guest machine’s virtual processor is dispatched. As implemented, weights are similar to reservations, except that it is easier to get confused with weights about what percentage of a logical processor you intend to allocate. 

Here is how the weights work. Each virtual processor that a guest machine is eligible to use gets a default weight of 100. If you don’t adjust any of the weights, each virtual processor is equally as likely to be dispatched as any other, so the default scheme is the balanced, round-robin approach. In default mode (i.e., round-robin), the probability that a virtual processor is selected for execution is precisely
1⁄(# of eligible guest machines)

Notice that the number of virtual processors defined per machine makes the guest eligible to run more virtual processors, but is otherwise not a factor in calculating the dispatching probability for any one of its virtual processors. 

When you begin to adjust the weights of guest machines, scheduling decisions become based on the relative weighting factor you have chosen. To calculate the probability that a virtual processor is selected for execution when the weighting factors are non-uniform, calculate a base value that is the total for all the guest machine weighting factors. For instance, if you have three guest machines with weighting factors of 100, 150 and 250, the base value is 500, and the individual guest machine dispatching probabilities are calculated using the simple formula
weighti / Σ weight1-n

So, for that example, the individual dispatching probabilities are 20%, 30% and 50%, respectively.

Relative weights make it easier to mix production and test workloads on the same Hyper-V host. You could boost the relative weight of production guests to 200, for example, while lowering the weight of the guest machines used for testing to 50. If you have a production SQL Server guest machine that services the production servers, you would then want to set its weight higher than the other production guests. And, if you had a SQL Server guest test machine that services the other test guests, you could leave that machine running at the default weight of 100, higher than the other test machines, but still lower than any of the production guest machines. 

CPU Weight example. Consider a Hyper-V Host with four guest machines, two production guests, while the remaining two guest are test machines running a discretionary workload. To simplify matters, let’s assume each guest machine is configured to run two virtual processors. The guest machine virtual processors weights are configured as follows:

Guest Machine Workload
VP Weight
Production
200
Test
50

Since Hyper-V guest machine scheduling decisions are based on relative weight, you need to sum the weights over all the guest machines and compute the total weighting factor per logical processor. Then Hyper-V calculates the relative weight per logical processor for each guest machine and attempts to allocate logical processor capacity to each guest machine proportionately.

Table 3. Calculating the guest machine relative weights per logical processor.


Workload
# guests
VP Weight
Total Weight (Weight * guests)
Guest Relative Weight
per LP
Production
2
200
400
80%
Test
2
50
100
20%
Totals
4

500


Under those conditions, assuming 100% CPU utilization and an equivalent CPU load from each guest, you can expect to see the two production machines consuming about 80% of each logical processor, leaving about 20% of a logical processor for the test machines to share. A little later in this chapter, we will look at benchmark results using this CPU weighting scenario.

If you are mixing production and test machines on the same Hyper-V host, you may want to also consider an extra bit of insurance and use capping to set a hard upper limit on the amount of CPU capacity that the test machines can ever use. Keep in mind that none of these resource controls has much effect unless there is ample contention for the processors, which is something to be avoided in general under any form of virtualization. The Hyper-V processor resource controls come in handy when you pursue an aggressive server consolidation program, and want to add a little extra margin of safety to the effort.

Reservations and limits are commonly used in Quality of Service mechanisms, and weights are often used in task scheduling. However, it is a little unusual to see a scheduler that implements all three tuning knobs. System administrators can use a mix of all three settings for the guest machines, which is confusing enough. And, for maximum confusion, you can place settings on different machines that are basically incompatible with the settings of the other resident guest machines. So, it is wise to exercise caution when using these controls. Later, in this chapter I will report on some benchmarks that I ran where I tried some of these resource control settings and analyzed the results. When the Hyper-V Host machine is running at or near its CPU capacity constraints, these guest machine dispatching priority settings do have a significant impact on performance.


Comments

Popular posts from this blog

High Resolution Clocks and Timers for Performance Measurement in Windows.

Within the discipline of software performance engineering (SPE), application response time monitoring refers to the capability of instrumenting application requests, transactions and other vital interaction scenarios in order to measure their response times. There is no single, more important performance measurement than application response time, especially in the degree which the consistency and length of application response time events reflect the user experience and relate to customer satisfaction. All the esoteric measurements of hardware utilization that Perfmon revels in pale by comparison. Of course, performance engineers usually still want to be able to break down application response time into its component parts, one of which is CPU usage. Other than the Concurrency Visualizer that is packaged with the Visual Studio Profiler that was discussed in the previous post, there are few professional-grade, application response time monitoring and profiling tools that exploit the …

Why is my web app running slowly? -- Part 1.

This series of blog posts picks up on a topic I made mention of earlier, namely scalability models, where I wrote about how implicit models of application scalability often impact the kinds of performance tests that are devised to evaluate the performance of an application. As discussed in that earlier blog post, sometimes the influence of the underlying scalability model is subtle, often because the scalability model itself is implicit. In the context of performance testing, my experience is that it can be very useful to render the application’s performance and scalability model explicitly. At the very least, making your assumptions explicit opens them to scrutiny, allowing questions to be asked about their validity, for example.
The example I used in that earlier discussion was the scalability model implicit when employing stress test tools like HP LoadRunner and Soasta CloudTest against a web-based application. Load testing by successively increasing the arrival rate of customer r…

Virtual memory management in VMware: memory ballooning

This is a continuation of a series of blog posts on VMware memory management. The previous post in the series is here.


Ballooning
Ballooning is a complicated topic, so bear with me if this post is much longer than the previous ones in this series.

As described earlier, VMware installs a balloon driver inside the guest OS and signals the driver to begin to “inflate” when it begins to encounter contention for machine memory, defined as the amount of free machine memory available for new guest machine allocation requests dropping below 6%. In the benchmark example I am discussing here, the Memory Usage counter rose to 98% allocation levels and remained there for duration of the test while all four virtual guest machines were active.

Figure 7, which shows the guest machine Memory Granted counter for each guest, with an overlay showing the value of the Memory State counter reported at the end of each one-minute measurement interval, should help to clarify the state of VMware memory-managemen…