
Hyper-V Dynamic Memory options

In the last post in this series, I began discussing Hyper-V dynamic memory, one of the main features that differentiates Microsoft's virtualization technology from VMware's. This post continues the look at dynamic memory in Hyper-V.

Dynamic memory options

Several more dynamic memory parameters can be set, including the amount of guest physical memory to grant the guest machine at start-up, a provision for a machine memory buffer for Hyper-V to maintain as a reserve, and a weighting factor for prioritizing the various guest machines in case demand for machine memory begins to exceed the supply. These guest machine settings are illustrated in the screen shot shown in Figure 11.

Figure 11. Dynamic memory settings for a guest machine. The smallest value permitted for the Minimum RAM setting is 32 MB.

Consider the dynamic memory configuration of the guest machine shown in Figure 11. The amount of RAM requested at startup in this example is 2 GB. If that amount of machine memory is not available, Hyper-V will simply not permit the guest machine to execute. The minimum machine memory is set to 512 MB and the maximum to 8 GB. The guest machine is allotted the amount of start-up memory needed initially, but afterwards Hyper-V can adjust the amount of machine memory allocated upwards or downwards, based on current conditions. The reason for setting the dynamic memory minimum for the guest machine to a value less than the start-up memory parameter is that it allows Hyper-V to remove memory from the guest machine if it becomes idle. The amount of start-up memory can only be changed when the virtual machine is not running, but the other dynamic memory settings can be changed at any time.
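To make the relationship between the three sizes concrete, here is a minimal Python sketch of the Figure 11 configuration. The class and function names are hypothetical, not part of any Hyper-V API, and the ordering rule it checks (minimum no larger than start-up, start-up no larger than maximum) is an assumption of the sketch rather than something stated above.

```python
from dataclasses import dataclass
from typing import List

ABSOLUTE_MINIMUM_MB = 32  # smallest Minimum RAM value, per the Figure 11 caption


@dataclass
class DynamicMemorySettings:
    """Hypothetical container for the three sizes discussed above (in MB)."""
    startup_mb: int
    minimum_mb: int
    maximum_mb: int

    def problems(self) -> List[str]:
        """Return a list of inconsistencies; an empty list means none found."""
        found = []
        if self.minimum_mb < ABSOLUTE_MINIMUM_MB:
            found.append("minimum is below 32 MB")
        # Assumed ordering rule: minimum <= startup <= maximum.
        if self.minimum_mb > self.startup_mb:
            found.append("minimum exceeds startup")
        if self.startup_mb > self.maximum_mb:
            found.append("startup exceeds maximum")
        return found


def can_start(settings: DynamicMemorySettings, free_machine_mb: int) -> bool:
    """The guest is allowed to start only if its full start-up amount is free."""
    return not settings.problems() and free_machine_mb >= settings.startup_mb


# The guest from Figure 11: 2 GB at start-up, 512 MB minimum, 8 GB maximum.
figure_11 = DynamicMemorySettings(startup_mb=2048, minimum_mb=512, maximum_mb=8192)
print(figure_11.problems())
print(can_start(figure_11, free_machine_mb=4096))
print(can_start(figure_11, free_machine_mb=1024))
```

With 4 GB of free machine memory the guest can start. With only 1 GB free it cannot, even though 1 GB is well above its 512 MB minimum, because the start-up amount has to be satisfied in full.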

For a discretionary workload, such as a developer or QA test machine configured to run on the same Hyper-V Host machine as a higher priority production workload, especially one that is only active intermittently, it is often desirable to configure the minimum amount of RAM for that guest machine to a value that is significantly less than the amount of machine memory the guest is allotted at startup, as illustrated. Of course, once Hyper-V is given that freedom, it will reduce the guest's physical memory below its start-up memory setting whenever there is memory pressure from other resident guest machines, possibly to less than the amount the guest actually requires. The performance of the guest machine will undoubtedly suffer for a brief period when it becomes active again after an idle period during which its physical memory allotment was reduced. However, so long as machine memory is available, dynamic memory adjustments can occur quite rapidly should the guest become active again, as was apparent from Figure 9.

For the sake of the higher priority production workloads that might benefit from expanding their machine memory allotments, a scheme that punishes the lower priority guests is simple to set up and often quite effective. But beware that the strategy can backfire if deliberately under-provisioned guest machines face excessive paging, which tends to manifest itself as a disk I/O bottleneck. If (a) these under-provisioned guest machines are paging heavily to disk and (b) their disks are configured to point to hardware that is shared by other guest machines, then the disk contention that the under-provisioned guests generate can easily spill over to the other guest machines that share that physical disk hardware. In this fashion, guest machines that are severely under-provisioned with regard to physical memory can generate performance problems that impact the other guest machines sharing the Hyper-V Host.

Based on demand, the hypervisor in Hyper-V can adjust upwards the amount of physical memory granted to a guest machine by invoking the hot memory add feature of the guest OS, something which Windows supports. To reduce the amount of physical memory available to a guest machine, Hyper-V uses a ballooning technique that is similar to the one that VMware pioneered in its virtualization platform. Ballooning does require a guest OS enlightenment, however, as discussed in more detail below. If the guest machine does not support hot memory add, Hyper-V has recourse only to removing pages from a guest machine.

Examples like Figures 9 and 10, which were shown in the previous post, demonstrate that dynamic memory adjustments to guest machines are performed more or less continuously at regular, periodic intervals, which permits a fresh reassessment of guest machine memory requirements after each adjustment. So long as the guest machine Memory Pressure value is substantially under 100, the hypervisor is free to remove physical memory using ballooning without much performance impact on even an active Windows guest.
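The following Python sketch illustrates the kind of decision described above. It assumes that Memory Pressure is the ratio of the memory the guest has committed to the physical memory currently assigned to it, multiplied by 100. This post does not restate that definition, so treat it as an assumption of the sketch. The function names and the `low_water` threshold of 80 are hypothetical, chosen only to stand in for "substantially under 100".

```python
def memory_pressure(committed_mb: float, assigned_mb: float) -> float:
    """Assumed definition: committed memory as a percentage of assigned memory."""
    if assigned_mb <= 0:
        raise ValueError("assigned_mb must be positive")
    return 100.0 * committed_mb / assigned_mb


def suggested_action(pressure: float, low_water: float = 80.0) -> str:
    """Classify one periodic reading (low_water is an illustrative threshold)."""
    if pressure >= 100.0:
        return "add"      # the guest wants more memory than it currently has
    if pressure < low_water:
        return "remove"   # comfortable headroom: a candidate for ballooning
    return "hold"         # close to 100: leave the allocation alone


reading = memory_pressure(committed_mb=1536, assigned_mb=2048)
print(reading, suggested_action(reading))
```

Because the reading is re-evaluated at every interval, a guest that was trimmed while idle moves back toward "add" as soon as its committed memory grows.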

Memory Priority

The Dynamic Memory Balancer, the Hyper-V component that periodically makes dynamic memory adjustments based on the Memory Pressure values, also implements a priority scheme for the various guest machines that are being managed dynamically. Figure 12 illustrates the Dynamic Memory Balancer in action for four identical Windows guest machines, each configured with the default neutral memory priority. When no guest is favored by its memory priority, the overall effect of the Hyper-V Dynamic Memory Balancer adjusting the memory footprint of each guest is to balance and re-balance the size of the partitions, with the result that the Memory Pressure of each guest is roughly the same. The measurements shown in Figure 12 represent a two-hour period where the Hyper-V performance counters were gathered once per minute. Notice that Memory Pressure for each machine varies constantly, but remains (for the most part) confined to a narrow range between 60 and 80. Each guest machine is running the x64 .NET Framework benchmarking program I mentioned earlier. And, as noted before, the CLR garbage collector is reacting to changes in the amount of physical memory available in the guest machine in response to Low and High memory notifications from the Windows OS.

Figure 12. The Average Memory Pressure reported for four identically configured Windows guest machines, executing similar workloads. The Hyper-V Dynamic Memory Balancer is constantly adjusting machine memory allocation to keep each of the guest machines confined to the same narrow range of Memory Pressure values between 60 and 80, until demand for machine memory goes slack towards the end of the two-hour measurement interval.
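The balancing effect can be imitated with a toy model. This is not the algorithm the Dynamic Memory Balancer uses, which is not described here. It is a sketch that assumes Memory Pressure is 100 times a guest's memory demand divided by its allocation, ignores the minimum and maximum limits, and uses made-up demand figures. Under those assumptions, dividing machine memory in proportion to demand leaves every guest at the same pressure.

```python
from typing import Dict


def balance(demand_mb: Dict[str, float], machine_mb: float) -> Dict[str, float]:
    """Split machine_mb in proportion to demand so all pressures come out equal."""
    total_demand = sum(demand_mb.values())
    return {name: machine_mb * d / total_demand for name, d in demand_mb.items()}


# Hypothetical demand figures (MB) for four neutral-priority guests.
demand = {"guest1": 3000, "guest2": 2000, "guest3": 1000, "guest4": 2000}
allocation = balance(demand, machine_mb=10000)

# Assumed pressure definition: demand as a percentage of allocation.
pressures = {name: 100 * demand[name] / allocation[name] for name in demand}
for name in demand:
    print(name, allocation[name], pressures[name])
```

In this example total demand is 8,000 MB against 10,000 MB of machine memory, so every guest lands at a pressure of 80 regardless of its individual demand, which is the single-band pattern that Figure 12 shows.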

This default balancing behavior changes when memory priority is set to favor adding physical memory to higher priority guest machines and removing physical memory from lower priority guests. Instead of a single band that encloses the range of Memory Pressure readings you can observe, as in Figure 12, setting memory priority results in multiple bands, each one corresponding to a memory priority setting. This is illustrated in Figure 13, which shows the same four guest machines, but this time grouped using two Memory Priority settings.

Figure 13. Setting dynamic memory priority leads to multiple bands of Memory Pressure readings. Higher memory priority guest machines 1 & 2 report consistently lower Memory Pressure measurements. Lower memory priority guest machines 3 & 4 report consistently higher Memory Pressure measurements. Guest machine 5 is running at a neutral memory priority setting.

For the memory priority experiment documented in Figure 13, two guest machines (guests 1 & 2) were configured to run at the highest memory priority setting available, while two identical guest machines (guests 3 & 4) were configured to run at the lowest memory priority. I also introduced a fifth machine (guest 5) running at a neutral priority shortly after 3 pm in order to see the impact of increased contention for machine memory. The two machines running at the lower memory priority report higher Memory Pressure readings than the higher priority machines. Higher memory priority translates into lower Memory Pressure values, although the pattern is not always so clearly discernible in real world workloads that exhibit greater variability in memory usage than the stable benchmark workloads illustrated here.
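A toy model shows how a priority weighting can produce separate bands. Again, this is not Hyper-V's actual algorithm. The weights, demand figures, and function name are hypothetical, and the sketch assumes Memory Pressure is 100 times demand divided by allocation. Each guest's share of machine memory is scaled by a weight, so guests with the same weight end up at the same pressure and a higher weight yields a lower pressure.

```python
from typing import Dict


def weighted_balance(demand_mb: Dict[str, float],
                     weight: Dict[str, float],
                     machine_mb: float) -> Dict[str, float]:
    """Split machine_mb in proportion to demand multiplied by a priority weight."""
    weighted = {name: demand_mb[name] * weight[name] for name in demand_mb}
    total = sum(weighted.values())
    return {name: machine_mb * share / total for name, share in weighted.items()}


# Five guests with identical hypothetical demand, as in a stable benchmark.
demand = {f"guest{i}": 2000 for i in range(1, 6)}
# Illustrative weights: guests 1 & 2 high, guests 3 & 4 low, guest 5 neutral.
weight = {"guest1": 1.25, "guest2": 1.25, "guest3": 0.8, "guest4": 0.8, "guest5": 1.0}

allocation = weighted_balance(demand, weight, machine_mb=14000)
pressure = {name: 100 * demand[name] / allocation[name] for name in demand}
for name in sorted(pressure):
    print(name, round(pressure[name], 1))
```

The output groups into three levels: the two high-weight guests share the lowest pressure, the neutral guest sits in the middle, and the two low-weight guests share the highest, which mirrors the banding in Figure 13.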

Memory Buffer

Another dynamic memory option pertains to the size of the buffer of free machine memory pages that Hyper-V maintains on behalf of each guest in order to speed up memory allocation requests. By default, the hypervisor maintains a memory buffer for each guest machine subject to dynamic memory that is 20% of the size of its grant of physical guest memory, but that parameter can be adjusted upwards or downwards for each guest machine. On the face of it, the Memory Buffer option looks useful for reducing the performance risks associated with under-provisioned guest machines. So long as the Memory Buffer remains stocked with free machine memory pages, requests to add physical memory to a child partition can be satisfied quickly.
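As a quick worked example of the arithmetic, the sketch below computes the buffer for a guest using the rule stated above (a percentage of the guest's current grant of physical memory). The function name is hypothetical.

```python
def memory_buffer_mb(assigned_mb: float, buffer_percent: float = 20.0) -> float:
    """Buffer size as a percentage of the guest's current memory grant."""
    if not 0 <= buffer_percent:
        raise ValueError("buffer_percent must not be negative")
    return assigned_mb * buffer_percent / 100.0


# A guest currently granted 2 GB, at the default 20% and at a raised 50%.
print(memory_buffer_mb(2048))
print(memory_buffer_mb(2048, buffer_percent=50))
```

At the default setting a guest granted 2 GB calls for roughly 410 MB of buffer, and raising the setting to 50% calls for 1 GB, which is machine memory that is then not available to other guests.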

Figure 14. Hyper-V maintains Available Memory to help speed memory allocation requests.

A single performance counter is available that tracks overall Available Memory, which is compared to the average Memory Pressure in Figure 14. Intuitively, maintaining some excess capacity in the hypervisor’s machine memory buffer seems like a good idea. When the Available Memory buffer is depleted, in order for the Hyper-V Dynamic Memory Balancer to add memory to a guest machine, it first has to remove memory from another guest machine, an operation which does not take effect immediately. Generating an alert based on Available Memory falling below a threshold value of 3-5% of total machine memory is certainly appropriate. Unfortunately, Hyper-V does not provide much information or feedback to help you adjust this tuning parameter or understand how effective the Memory Buffer is.
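The alert rule suggested above is simple enough to sketch in a few lines of Python. The function name and the sample values are hypothetical, and the sketch assumes you have already collected the Available Memory reading and know the total machine memory of the host.

```python
def available_memory_alert(available_mb: float, total_mb: float,
                           threshold_percent: float = 5.0) -> bool:
    """Return True when Available Memory is below the threshold share of total."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    percent_free = 100.0 * available_mb / total_mb
    return percent_free < threshold_percent


# Hypothetical 32 GB host: 1 GB available is 3.125% of total memory.
print(available_memory_alert(available_mb=1024, total_mb=32768))
print(available_memory_alert(available_mb=1024, total_mb=32768, threshold_percent=3.0))
```

With the 5% threshold the 1 GB reading raises an alert, while the looser 3% threshold does not, so the choice within the 3-5% range determines how early you are warned.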

Next: ballooning.

