
Understanding Guest Machine CPU Priority Scheduling options under Hyper-V

CPU Priority scheduling options.

In a final set of benchmark results on guest machine performance when the physical CPUs on the Hyper-V Host are over-committed, we will now look at how effective the Hyper-V processor scheduling priority settings are at insulating preferred guest machines from the performance impact of an under-provisioned (or over-committed) Hyper-V Host machine. The results of two test scenarios in which CPU priority scheduling options were used, compared to the over-committed baseline (and the original native Windows baseline), are reported in the following table:


| Configuration | # guest machines | CPUs per machine | Best case elapsed time (minutes) | Stretch factor |
|---|---|---|---|---|
| Native machine | n/a | 4 | 90 | 1.00 |
| 4 Guest machines (no priority) | 4 | 2 | 370 | 4.08 |
| 4 Guests using Relative Weights | 4 | 2 | 230 | 2.56 |
| 4 Guests using Reservations | 4 | 2 | 270 | 3.00 |


Table 4. Test results when Virtual Processor Scheduling Priority settings are used.
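The stretch factors in Table 4 are simply the ratio of each configuration's elapsed time to the native baseline. A quick sketch of that arithmetic, using the rounded elapsed minutes from the table (which is why the computed factor for the no-priority case, 4.11, differs slightly from the published 4.08, which was presumably derived from the unrounded timings):

```python
# Stretch factor = (elapsed time under virtualization) / (native elapsed time).
# Elapsed times below are the rounded minutes from Table 4.
NATIVE_MINUTES = 90

elapsed_minutes = {
    "4 guest machines (no priority)": 370,
    "4 guests using Relative Weights": 230,
    "4 guests using Reservations": 270,
}

stretch = {config: minutes / NATIVE_MINUTES
           for config, minutes in elapsed_minutes.items()}

for config, factor in stretch.items():
    print(f"{config}: {factor:.2f}x the native run time")
```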

As discussed earlier in this series of blog posts, Hyper-V's virtual processor scheduling options allow you to prioritize the workloads of guest machines that are resident on the same Hyper-V Host. To test the effectiveness of these priority scheduling options, I re-ran the under-provisioned 4 X 2-way guest machine scenario with two of the guest machines set to run at a higher priority, while the other two guests were set to run at a low priority. I ran separate tests to evaluate the virtual processor Reservation settings in one scenario and the use of Relative Weights in a second scenario.

CPU Scheduling with Reservations. 

For the Reservation scenario, the two high priority guest machines reserved 50% of the virtual processor capacity they were configured with. The two low priority guest machines reserved 0% of their virtual processor capacity. Figure 34 shows the Hyper-V Manager’s view of the situation – the higher priority machines 1 & 2 clearly have favored access to the Hyper-V logical processors. The two higher priority guests are responsible for 64% of the CPU usage, while the two low priority machines are consuming just 30% of the processor resources.

Figure 34. The Hyper-V Manager’s view of overall CPU Usage during the Reservation scenario. Together, the higher priority machines 1 & 2 are responsible for 64% of the CPU usage, while the two low priority machines are consuming just 30% of the CPU capacity.
The guest machines configured with high priority settings executed to completion in about 270 minutes (or 4 ½ hours). This was about 27% faster than the equally weighted guest machines in the baseline scenario where four guest machines executed the benchmark program without any priority settings in force.  

Figure 35 reports on the distribution of the Virtual Processor utilization for the four guest machines executing in this Reservation scenario during a one-hour period. Guest machines 1 & 2 are running with the 50% Reservation setting, while machines 3 & 4 are running with the 0% Reservation setting. Instead of the view in Figure 32, where each guest machine had equal access to virtual processors, the high priority guest machines clearly have favored access to virtual processors. Together, the 4 higher priority virtual processors consumed about 250% of the 400% total virtual processor capacity, leaving only about 150% of residual processor capacity available to the lower priority guest machines.
Figure 35. Virtual Processor utilization for the four guest machines executing in the Reservation scenario.
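The capacity arithmetic behind Figure 35 can be sketched in a few lines. Note that the 250% figure is the measured consumption reported above, not something the Reservation settings by themselves guarantee; the reservations only establish a 200% floor for the high priority pair:

```python
# Back-of-the-envelope capacity math for the Reservation scenario.
TOTAL_CAPACITY_PCT = 400     # 4 logical processors at 100% each
HIGH_PRIORITY_VPS = 4        # 2 high-priority guests x 2 virtual processors
RESERVATION_PCT = 50         # reservation per high-priority virtual processor

# The reservations guarantee this floor for the high-priority pair:
guaranteed_floor = HIGH_PRIORITY_VPS * RESERVATION_PCT           # 200%

# What was actually measured (Figure 35):
observed_high = 250                                              # percent
residual_for_low_priority = TOTAL_CAPACITY_PCT - observed_high   # 150%

print(guaranteed_floor, observed_high, residual_for_low_priority)
```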
Hours later when the two high priority guest machines finished executing the benchmark workload, those guest machines went idle and the low priority guests were able to consume more virtual processor capacity. Figure 36 shows these higher priority guest machines executing the benchmark workload until about 10:50 pm, at which point the Test 1 & 2 machines go idle and machines 3 & 4 quickly expand their processor usage.
Figure 36. The higher priority Test 1 & 2 machines go idle at about 10:50 pm, at which point machines 3 & 4 quickly expand their processor usage.
As Figure 36 indicates, even though the high priority Test machines 1 & 2 are idle, their virtual processors still get scheduled to execute on the Hyper-V physical CPUs. When guest machines do not consume all of the virtual processor capacity requested in a Reservation setting, that excess capacity evidently does become available for lower priority guest machines to use.

Figures 37 and 38 show the view of processor utilization available from inside one of the high priority guest machines. Figure 37 shows the view of the virtual hardware that the Windows CPU accounting function provides, plus it shows the instantaneous Processor Ready Queue measurements. These internal measurements indicate that the virtual processors are utilized near 100% and there is a significant backlog of Ready worker threads from the benchmark workload queued for the two virtual CPUs.

Figure 37. Internal Windows performance counters indicate that the virtual processors are utilized near 100%, with a significant backlog of Ready worker threads from the benchmark workload queued for the two virtual CPUs.

Figure 37 shows the % Processor Time counter from the guest machine Processor object, while Figure 38 (below) shows processor utilization for the top 5 most active processes, with the ThreadContentionGenerator.exe – the benchmark program – dominating, as expected.  

Figure 38. The benchmark program ThreadContentionGenerator.exe consumes all the processor cycles available to the guest machine.

CPU Scheduling with Relative Weights.

A second test scenario used Relative Weights to prioritize the guest machines involved in the test, leading to results very similar to the Reservation scenario. Two guest machines were given high priority scheduling weights of 200, while the other two guest machines were given low priority scheduling weights of 50. This is the identical weighting scheme discussed in an earlier post that described setting up CPU weights. Mathematically, the proportion of each virtual processor allocated to the higher priority guest machines was 80%, with 20% of the processor capacity allocated to the lower priority guests. In actuality, Figure 39 reports each high priority virtual processor consuming about 75% of a physical CPU, while the four lower priority virtual processors consumed slightly more than 20% of a physical CPU.
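The 80/20 split follows directly from how relative weights work: each competing guest receives CPU capacity in proportion to its weight divided by the sum of all the weights. A minimal sketch of that calculation (the VM names are placeholders):

```python
# Relative weights divide contended CPU among competing guests in
# proportion to weight / sum(weights). Weights of 200/200/50/50 reproduce
# the 80/20 split described in the text.
def weighted_shares(weights):
    total = sum(weights.values())
    return {vm: w / total for vm, w in weights.items()}

shares = weighted_shares({"vm1": 200, "vm2": 200, "vm3": 50, "vm4": 50})

high_pair = shares["vm1"] + shares["vm2"]   # 0.8 -> 80% to the high-priority pair
low_pair = shares["vm3"] + shares["vm4"]    # 0.2 -> 20% to the low-priority pair
print(high_pair, low_pair)
```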

Since the higher priority guest machines were able to consume more processor time than in the Reservation scenario, the higher priority machines were able to complete the benchmark task in 230 minutes, faster than the best case in the Reservation scenario and about 38% faster than the baseline scenario where all four guests ran at the same Hyper-V scheduling priority.

Figure 39. In the Relative Weights scenario, each high priority virtual processor consumed about 75% of a physical CPU, while the four lower priority virtual processors consumed slightly more than 20% of a physical CPU.
As in the Reservation scenario, once the high priority guest machines completed their tasks and went idle, the lower priority guest machines gained greater access to the physical CPUs on the Hyper-V Host machine. This shift is highlighted in Figure 40, which shows the higher priority virtual processors for guest machines 1 & 2 tailing off at around 1:40 pm, which allows the processor usage from the lower priority virtual processors to take off at that point.

Figure 40. When the higher priority virtual processors for guest machines 1 & 2 finish processing at about 1:40 pm, the processor usage by the lower priority virtual processors accelerates.
The CPU usage pattern in Figure 40, showing this shift taking place during the Relative Weights scenario, is broadly similar to the Reservation scenario shown in Figure 36, with one crucial difference. With Reservations, Hyper-V still schedules the high priority virtual processors even when they are not active, so they continue to consume some virtual processor execution time. Using Relative Weights, virtual processors that are idle are not even scheduled to execute, so the lower priority guest machines get more of the physical CPU. Comparing the best case execution times for the higher priority machines in the two priority scheduling scenarios, the Relative Weights scheme also proved superior.

All this is consistent with the way capacity reservation schemes tend to operate: so long as the high priority workload does not consume all of the capacity reserved for its use, some of that excess reserved capacity is simply going to be wasted. But also consider that the measure of performance that matters in these tests is throughput-oriented. If performance requirements are oriented instead around responsiveness, the CPU Reservation scheme in Hyper-V should yield superior results.
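The difference between the two schemes once the high priority guests go idle can be illustrated with a toy model. This is not Hyper-V's actual scheduling algorithm, and the 5% idle-scheduling overhead is an invented parameter purely for illustration; it models the observation that reserved-but-idle virtual processors still get dispatched under Reservations, while Relative Weights is fully work-conserving:

```python
# Toy model (NOT Hyper-V's real algorithm) of spare capacity available to
# low-priority guests after the high-priority guests go idle.
def low_priority_capacity(total_cpus, high_demand, scheme, idle_overhead=0.05):
    """Return CPU capacity (in CPUs) the low-priority guests can consume."""
    if scheme == "weights":
        # Work-conserving: idle guests are simply not scheduled.
        return total_cpus - high_demand
    if scheme == "reservations":
        # Idle reserved virtual processors still burn a sliver of CPU
        # (idle_overhead is an illustrative, made-up fraction).
        return total_cpus - high_demand - idle_overhead * total_cpus
    raise ValueError(f"unknown scheme: {scheme}")

# After the high-priority pair finishes (demand drops to 0 of 4 CPUs):
w = low_priority_capacity(4.0, 0.0, "weights")        # 4.0 CPUs
r = low_priority_capacity(4.0, 0.0, "reservations")   # 3.8 CPUs
print(w, r)
```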

Next: Guest Machine performance monitoring

Having seen that the performance counters associated with virtual processor usage gathered by Windows guest machines running under Hyper-V (or VMware ESX, for that matter) are supplanted by the processor usage data reported by the hypervisor, you may have noticed that I have shown several places where guest machine performance counters proved useful in understanding what is going on in the virtualized infrastructure. In the next series of blog posts, we will step back and look at how guest machine performance counters are impacted by virtualization in more general terms. I will discuss which of the performance measurements guest machines produce remain viable for diagnosing performance problems and understanding capacity issues. We will see that the impact of the virtualization environment varies considerably depending on the type of performance counter.

To assess the overall impact of the VMware virtualization environment on the accuracy of the performance measurements available for Windows guest machines, it is necessary to first understand how VMware affects the clocks and timers that are available on the guest machine. Basically, VMware virtualizes all calls made from the guest OS to hardware-based clock and timer services on the VMware Host. A VMware white paper entitled “Timekeeping in VMware Virtual Machines” contains an extended discussion of the clock and timer distortions that occur in Windows guest machines when there are virtual machine scheduling delays. These clock and timer services distortions, in turn, cause distortion among a considerably large set of Windows performance counters, depending on the specific type of performance counter. (The different types of performance counters are described here