Skip to main content

Memory Ballooning in Hyper-V

The previous post in this series discussed the various Hyper-V Dynamic Memory configuration options.


Removing memory from a guest machine while it is running is a bit more complicated than adding memory to it, which makes use of a hardware interface that the Windows OS supports. One factor that makes removing memory from a guest machine difficult is that the Hyper-V hypervisor does not gather the kind of memory usage data that would enable it to select guest machine pages that are good candidates for removal. The hypervisor’s virtual memory capabilities are limited to maintaining the second level page tables needed to translate Guest Virtual addresses to valid machine memory addresses. Because the hypervisor does not maintain any memory usage information that could be used, for example, to identify which of a guest machine’s physical memory pages have been accessed recently, when Guest Physical memory needs to be removed from a partition, it uses ballooning, which transfers the decision about which pages to remove from memory to the guest machine OS, which can execute its normal page replacement policy.

Ballooning was pioneered in VMware ESX , and the Hyper-V implementation is similar, but with some key differences. One key difference is that the Hyper-V hypervisor has no ability to ever remove guest physical memory arbitrarily and swap it to a disk file, as VMware ESX does when it faces an acute shortage of machine memory. VMware ESX swapping selects pages at random for removal, and without any knowledge of how guest machine pages are used, the hypervisor can easily choose badly. The Microsoft Hyper-V developers chose not to implement any form of hypervisor swapping of machine memory to disk. For page replacement, Hyper-V relies solely on the virtual memory management capabilities of the guest OS, which is usually Windows, when there is a shortage of machine memory. Frankly, performance suffers under either approach when there is an extreme machine memory shortage – overloading machine memory is something to be avoided on both virtualization platforms. Hyper-V does have the virtue that machine memory management is simpler, relying on a single mechanism to relieve a machine memory shortage.

In both virtualization approaches, it is important to be able to understand the signs that the VM Host machine’s memory is over-committed. In Hyper-V, these include:

  • a shortage of Hyper-V Dynamic Memory\Available Memory
  • sustained periods where the Hyper-V Dynamic Memory\Average Memory Pressure measurements for one or more guest machines hovers near 100
  • internal guest machine measurements show high paging rates to disk (Memory\Pages/sec, Memory\Page-ins/sec)

As ballooning transfers the decision about which pages to remove from guest physical memory to the guest OS, we need to revisit virtual memory concepts briefly in this new context. One goal of virtual memory management is to utilize physical memory efficiently, essentially filling up physical memory completely, aside from a small buffer of unallocated physical pages that are kept in reserve. Memory over-commitment works because processes frequently allocate more virtual memory than they need at any one moment in time. Consequently, it is usually not necessary to back every allocated virtual memory page with guest physical memory. Consider a guest machine that reports a Memory Pressure reading of 100 – in other words, its Committed Bytes = Visible Physical Memory. Typically, 10-20% of the machine’s committed pages are likely to be relatively inactive, which would allow the OS to remove them from physical memory without much performance impact.

Because virtual memory management tends to fill up physical memory, it is not uncommon for the OS to need to displace a currently allocated virtual page from physical memory to make room for a new or non-resident page that the process has just referenced from time to time. Windows implements an LRU page replacement policy, trimming older pages from process working sets when physical memory is in short supply. Windows and Linux guest machines manage virtual memory dynamically, keeping track of which of an application’s virtual pages are currently being accessed. Furthermore, the OS’s page replacement policy ages allocated virtual memory pages that have not been referenced in the current interval. The pages of a process that have not been referenced recently are usually better candidates for removal in favor of current pages.

The ballooning technique used in Hyper-V – and in VMware ESX, as well –  pushes the decision about which specific pages to remove down to the guest machine, which is in a far better position to select candidate pages for removal because the guest OS does maintain memory usage data. The term “ballooning” refers to a management thread running inside the guest machine that acquires empty physical memory buffers when the hypervisor signals that it wants to remove physical memory from the partition. This action can be thought of as the memory balloon inflating. Having once inflated, when Hyper-V decides to add memory back to the child partition, it deflates the balloon, freeing up balloon memory that was previously acquired.

In Hyper-V, ballooning is initiated by the Dynamic Memory Balancer, a task hosted inside the Root partition’s Virtual Machine Management Server (VMMS) component. Whenever the Dynamic Memory Balancer decides to adjust the amount of guest physical memory allotted to a guest machine, it communicates with the specific VM worker process running in the Root partition that maintains the state of the guest machine. If the decision is to remove memory, the VM worker process issues a message to request page removal that is communicated to the child partition across the VMBus.

The memory ballooning process used to reduce the size of guest physical memory is depicted in Figure 15. Inside the child partition, the Dynamic Memory VSC – also responsible for implementing the guest OS enlightenment that reports the number of guest OS committed bytes – responds to the remove memory request by making a call to the MmAllocatePagesForMdlEx API, which acquires memory from the non-paged pool. This pool of allocated physical memory, normally used by drivers for DMA devices that need access to physical addresses, is the “balloon” that inflates when Hyper-V determines it is appropriate to remove guest physical memory from the guest machine. The Dynamic Memory VSC then returns to the Root partition – via another VMBus message – a list of the Guest Physical addresses of the balloon pages that it has just acquired. The Root partition then signals the hypervisor that these pages are available to be added to a different partition.

Figure 15. The balloon driver is a Dynamic Memory VSC that responds to a VMBus request to remove memory by acquiring memory from the non-paged pool. The balloon driver then returns a list of physical memory pages that the hypervisor can immediately grant to a different virtual machine.

Since the balloon driver in the guest machine will pin the memory balloon pages in nonpaged physical memory until further notice, the physical memory pages in the guest machine balloon prove the exception to the rule that memory locations can only be occupied by one guest machine at a time. The pages in the balloon are set aside, remaining accessible from inside the guest machine; however, the balloon driver ensures that they are not actually accessed. This allows Hyper-V to grant the machine memory these balloon pages occupy to another guest machine to use. 

From inside the guest Windows machine, the balloon inflating increases the amount of nonpaged pool memory that is allocated, as illustrated in Figure 16. Figure 16 reports on the size of the nonpaged Pool in a Windows guest during a period when the balloon inflates (shortly after 5 pm) and then deflates about an hour later. 

Figure 16. Inside the guest Windows machine, the balloon inflating corresponds to an increase in the amount of nonpaged pool memory that is allocated. In this example, the balloon deflates about 1 hour later.
As in VMware, ballooning itself has no guaranteed immediate impact on physical memory contention inside the Windows guest machine. So long as the guest machine has a sufficient supply of available pages, the impact remains minimal. Over time, however, ballooning can pin enough guest OS pages in physical memory to force the guest machine to execute its page replacement policy. In the case of Windows, this means that the OS will also issue a LowMemoryResourceNotification event, which triggers garbage collection in a .NET Framework application and a similar buffer manager trimming operation in SQL Server. On the other hand, if ballooning does not cause the guest OS machine to experience memory contention, i.e., if the balloon request can be satisfied without triggering the guest machine’s page replacement policy, there will be no visible impact inside the guest machine other than an increase in the size of the nonpaged Pool.

Next: a Dynamic Memory management case study: what happens when the machine memory on the Hyper-V Host is over-committed.


Popular posts from this blog

Inside the Windows Runtime, Part 2

As I mentioned in the previous post, run-time libraries in Windows provide services for applications running in User mode. For historical reasons, this run-time layer in Windows was always known as the Win32 libraries, even when these services are requested in the 64-bit OS in 32-bit mode. A good example of a Win32 run-time service is any operation that involves opening and accessing a file somewhere in the file system (or the network, or the cloud). A more involved example is the set of Win32 services an application needs to access to play an audio file, including understanding the specific audio file compressed format, and checking authorization and security.
For Windows 8, a portion of the existing Win32 services in Windows were ported to the ARM hardware platform.  The scope of the Win32 API is huge, and it was probably not feasible to convert all of it during the span of a single, time-constrained release cycle. Unfortunately, the fact that the new Windows 8 Runtime library encomp…

High Resolution Clocks and Timers for Performance Measurement in Windows.

Within the discipline of software performance engineering (SPE), application response time monitoring refers to the capability of instrumenting application requests, transactions and other vital interaction scenarios in order to measure their response times. There is no single, more important performance measurement than application response time, especially in the degree which the consistency and length of application response time events reflect the user experience and relate to customer satisfaction. All the esoteric measurements of hardware utilization that Perfmon revels in pale by comparison. Of course, performance engineers usually still want to be able to break down application response time into its component parts, one of which is CPU usage. Other than the Concurrency Visualizer that is packaged with the Visual Studio Profiler that was discussed in the previous post, there are few professional-grade, application response time monitoring and profiling tools that exploit the …

Why is my web app running slowly? -- Part 1.

This series of blog posts picks up on a topic I made mention of earlier, namely scalability models, where I wrote about how implicit models of application scalability often impact the kinds of performance tests that are devised to evaluate the performance of an application. As discussed in that earlier blog post, sometimes the influence of the underlying scalability model is subtle, often because the scalability model itself is implicit. In the context of performance testing, my experience is that it can be very useful to render the application’s performance and scalability model explicitly. At the very least, making your assumptions explicit opens them to scrutiny, allowing questions to be asked about their validity, for example.
The example I used in that earlier discussion was the scalability model implicit when employing stress test tools like HP LoadRunner and Soasta CloudTest against a web-based application. Load testing by successively increasing the arrival rate of customer r…