
Hyper-V Memory Management: Introduction

Hyper-V Memory Management


The hypervisor also contains a Memory Manager component for managing access to the machine’s physical memory, i.e., RAM. For the sake of clarity, when discussing memory management in the Hyper-V environment, I will refer to RAM, the Hyper-V host machine’s actual physical memory, as machine memory, to distinguish it from the view of virtualized physical memory granted to each partition. Guest machines never access machine memory directly. Instead, each guest machine is presented with a range of Guest Physical Addresses (GPAs), based on its configuration definitions, that the hypervisor maps to machine memory using a set of page tables that the hypervisor maintains.

Machine memory cannot be shared in the same way that other computer resources like CPUs and disks can be shared. Once memory is in use, it remains 100% occupied until the owner of those memory locations frees it. The hypervisor’s Memory Manager is responsible for distributing machine memory among the root and child partitions. It can partition memory statically, or it can manage the allocation of memory to partitions dynamically. In this section, we will focus on the dynamic memory management capabilities of Hyper-V, an extremely valuable option from the standpoint of capacity planning and provisioning. Dynamic Memory, as the feature is known, enables Hyper-V to host considerably more guest machines, so long as these guest machines are not actively using all the Guest Physical Memory they are eligible to acquire.
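To make the capacity planning argument concrete, here is a back-of-the-envelope sketch in Python. It is not how Hyper-V itself sizes anything. The function names and all of the figures (host size, Root reserve, per-guest maximum, per-guest memory actually in use) are hypothetical values chosen only for illustration.

```python
def max_guests_static(host_mb, root_reserve_mb, guest_max_mb):
    """Guests that fit when each one is statically given its full maximum."""
    return (host_mb - root_reserve_mb) // guest_max_mb


def max_guests_dynamic(host_mb, root_reserve_mb, guest_in_use_mb):
    """Guests that fit when each one holds only the memory it is actively using."""
    return (host_mb - root_reserve_mb) // guest_in_use_mb


# Hypothetical host: 64 GB of machine memory, 4 GB set aside for the Root partition.
HOST_MB = 64 * 1024
ROOT_RESERVE_MB = 4 * 1024

# Hypothetical guests: eligible for 8 GB each, but actively using about 3 GB each.
print(max_guests_static(HOST_MB, ROOT_RESERVE_MB, 8 * 1024))   # 7
print(max_guests_dynamic(HOST_MB, ROOT_RESERVE_MB, 3 * 1024))  # 20
```

The gap between the two results exists only as long as the guests stay well below their maximum. If every guest demanded its full 8 GB at once, the dynamic figure would collapse back toward the static one.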

The unit of memory management is the hardware page, a fixed-size block of contiguous memory addresses. Windows supports standard 4 KB pages on Intel hardware and also uses Large 2 MB pages in specific areas where they are appropriate. Hyper-V supports allocation using both page sizes. Pages of machine memory are either (1) allocated and in use by a child partition or (2) free and available for allocation on demand.
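The arithmetic behind the two page sizes can be sketched in a few lines of Python. The helper names below are made up for this illustration. The only facts taken from the text are the 4 KB and 2 MB page sizes.

```python
PAGE_4K = 4 * 1024          # standard page size in bytes
PAGE_2M = 2 * 1024 * 1024   # Large page size in bytes


def split_address(addr, page_size):
    """Split an address into (page number, offset within the page)."""
    return divmod(addr, page_size)


def pages_needed(nbytes, page_size):
    """Number of pages required to hold nbytes, rounding up."""
    return -(-nbytes // page_size)


ONE_GB = 1024 ** 3
print(pages_needed(ONE_GB, PAGE_4K))  # 262144 standard pages
print(pages_needed(ONE_GB, PAGE_2M))  # 512 Large pages

page, offset = split_address(0x12345678, PAGE_4K)
print(hex(page), hex(offset))         # 0x12345 0x678
```

Mapping the same 1 GB takes 512 times fewer entries with Large pages, which is why they are attractive wherever big, long-lived regions of memory are involved.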

Each guest machine assumes that the physical memory it is assigned is machine memory, and builds its own set of page tables that map Guest Virtual Addresses (GVAs) to Guest Physical Addresses. Both sets of page tables are referenced by the hardware during virtual address translation when a guest machine is running. As mentioned above, this hardware capability is known as Second Level Address Translation (SLAT). SLAT hardware makes virtualization much more efficient. Figure 8 illustrates how SLAT hardware references both the hypervisor page tables that map Guest Physical Addresses to machine memory and the guest machine’s page tables that map Guest Virtual Addresses to Guest Physical Addresses during virtual address translation.
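The two-stage lookup can be modeled with a deliberately simplified Python sketch. Real page tables are multi-level structures walked by the hardware. Here each table is flattened into a single dictionary keyed by page number, only 4 KB pages are considered, and the function names and the sample mappings are invented for illustration.

```python
PAGE_SIZE = 4096  # assume standard 4 KB pages only


def translate(addr, page_table):
    """Translate one address through a single (flattened) page table."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"page fault: page {page:#x} is not mapped")
    return page_table[page] * PAGE_SIZE + offset


def gva_to_machine(gva, guest_page_table, hypervisor_page_table):
    """Stage 1: GVA -> GPA (guest's tables). Stage 2: GPA -> machine address."""
    gpa = translate(gva, guest_page_table)
    return translate(gpa, hypervisor_page_table)


# Hypothetical mappings: guest virtual page 0x10 -> guest physical page 0x2,
# and guest physical page 0x2 -> machine page 0x7F.
guest_page_table = {0x10: 0x2}
hypervisor_page_table = {0x2: 0x7F}

print(hex(gva_to_machine(0x10ABC, guest_page_table, hypervisor_page_table)))  # 0x7fabc
```

The guest only ever sees the intermediate value 0x2ABC, which it believes is a physical address. The second lookup is invisible to it, which is what allows the hypervisor to move or reassign machine pages without the guest's knowledge.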

Figure 8. Second Level Address Translation (SLAT) hardware and the tagged TLB are hardware optimizations that improve the performance of virtual machines.

Figure 8 illustrates another key hardware feature called tagged TLB that was specifically added to the Intel architecture to improve the performance of virtual machines. The Translation Lookaside Buffer (TLB) is a small, dedicated cache internal to the processor core that holds recently accessed virtual addresses and the machine memory addresses they are mapped to. In the processor hardware, virtual addresses are translated to machine memory addresses during instruction execution, and TLBs are extremely effective at speeding up that process. With virtualization hardware, each entry in the processor’s TLB is tagged with a virtual machine guest ID, as illustrated, so when the hypervisor Scheduler dispatches a new virtual machine, the TLB entries associated with the previously executing virtual machine can be identified and purged selectively, without flushing the entire TLB.
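A toy model in Python shows what the tag buys. This is a sketch of the idea only, not of any real processor's TLB: the class name, its methods, and the sample entries are all hypothetical, and capacity limits and replacement policy are ignored.

```python
class TaggedTLB:
    """Toy TLB whose entries are keyed by (guest ID, virtual page number)."""

    def __init__(self):
        self._entries = {}  # (vm_id, virtual_page) -> machine_page

    def insert(self, vm_id, virtual_page, machine_page):
        self._entries[(vm_id, virtual_page)] = machine_page

    def lookup(self, vm_id, virtual_page):
        """Return the cached machine page, or None on a TLB miss."""
        return self._entries.get((vm_id, virtual_page))

    def purge_vm(self, vm_id):
        """Drop only the entries tagged with vm_id; return how many were removed."""
        stale = [key for key in self._entries if key[0] == vm_id]
        for key in stale:
            del self._entries[key]
        return len(stale)


tlb = TaggedTLB()
tlb.insert(vm_id=1, virtual_page=0x10, machine_page=0x7F)
tlb.insert(vm_id=2, virtual_page=0x10, machine_page=0x80)  # same virtual page, different guest

print(tlb.purge_vm(1))       # 1
print(tlb.lookup(1, 0x10))   # None
print(tlb.lookup(2, 0x10))   # 128
```

Because the guest ID is part of the key, two guests can cache translations for the same virtual page without colliding, and purging one guest's entries leaves the other's intact.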

Memory management for the Root partition is handled a little differently from the child partitions. The Root partition requires access to machine memory addresses and other physical hardware on the motherboard, like the APIC, so that the Windows OS running in the Root partition can manage physical devices like the keyboard, mouse, video display, storage peripherals, and the network adapter. But the Root partition is also a Windows machine that is capable of running Windows applications, so it builds page tables for mapping virtual addresses to physical memory addresses just as a native version of the OS does. Unlike in any of the child partitions, physical addresses in the Root partition’s page tables correspond directly to machine memory addresses. This allows the Root OS to access memory mapped for use by the video card and video driver, for example, as well as the physical memory accessed by other DMA device drivers. In addition, the hypervisor reserves some machine memory locations exclusively for its own use; these are the only machine memory locations that are off limits to the Root partition.

From a capacity planning perspective, it is important to remember that the Root partition requires some amount of Guest Physical Memory, too. You can see how much physical memory the Root is currently using by looking at the usual OS Memory performance counters.

