
Performance Management in the Virtual Data Center: Virtual Memory Management, Part 1

This is the beginning of a new series of blog posts exploring the strategies that VMware ESX uses to manage machine memory, focusing on those designed to support aggressive consolidation of virtual machine guests on server hardware. Server consolidation is one of the prime cost justifications for using VMware's virtualization technology. Typical rack-mounted blade servers deployed in data centers contain far more processing power than most application servers require. From a capacity planning perspective, it is simply not cost-effective today to configure many server images to run directly on native hardware.

Virtualization software permits server resources (CPUs, memory, disk and network) to be carved up into functional sub-units and then shared among multiple tenants, known as guest machines. Aggregating multiple server images onto blade servers using virtualization provides compelling operational benefits, including rapid recovery from failures, because it is so quick and easy to spin up a new guest machine using VMware. With current-generation processor, disk and networking hardware designed with virtualization in mind, guest machine performance approaches that of the same applications running on native hardware, but only as long as the virtualization Host itself is not overloaded. If the Host is not adequately provisioned, performance issues will arise from contention for those shared resources.

Diagnosing performance problems in a virtualized environment can, unfortunately, be quite complicated. This is partly because the configuration itself can be complex, especially when a typical VMware Host is managing many guest machines. In addition, many VMware Hosts are often interconnected to a shared disk I/O farm and/or networking fabric. When any of this shared hardware infrastructure becomes overloaded and performance suffers, sorting out the root cause can prove quite daunting.

The focus of this series is the impact of sharing physical memory, or RAM. To support aggressive server consolidation, the VMware Host grants physical memory to guest machines on demand. By design, VMware allows physical memory to be over-committed: the total amount of virtualized physical memory granted to guest machines can exceed the amount of machine memory actually available. VMware also looks for opportunities for guest machines to share hardware memory pages whenever the contents of two (or more) pages are identical. Once identified, identical guest machine pages are mapped to a single, common page in RAM.
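To make these two ideas concrete, here is a toy Python model. It is a minimal sketch under my own assumptions, not VMware's implementation: it assumes 4 KB pages, uses a SHA-256 digest only as a hint for finding candidate matches (confirmed by a full byte comparison before two pages are merged), and the names `share_identical_pages` and `overcommit_ratio` are hypothetical. A real hypervisor also has to mark shared pages copy-on-write, so that a guest writing to one gets a private copy again; this sketch does not model that.

```python
import hashlib
from typing import Dict, List, Tuple

PAGE_SIZE = 4096      # assumption: 4 KB pages, for illustration only
GIB = 1024 ** 3

def share_identical_pages(
    guests: Dict[str, List[bytes]],
) -> Tuple[Dict[Tuple[str, int], int], List[bytes]]:
    """Toy content-based page sharing (hypothetical helper).

    Maps every (guest, guest-physical page number) to a machine page index,
    backing pages with identical contents by a single machine page.
    """
    machine_pages: List[bytes] = []
    by_hash: Dict[bytes, List[int]] = {}  # digest -> candidate machine pages
    mapping: Dict[Tuple[str, int], int] = {}
    for guest, pages in guests.items():
        for gppn, content in enumerate(pages):
            digest = hashlib.sha256(content).digest()
            match = None
            # A hash hit is only a hint; confirm with a full byte comparison.
            for mpn in by_hash.get(digest, []):
                if machine_pages[mpn] == content:
                    match = mpn
                    break
            if match is None:
                # No identical page yet: allocate a new machine page.
                match = len(machine_pages)
                machine_pages.append(content)
                by_hash.setdefault(digest, []).append(match)
            mapping[(guest, gppn)] = match
    return mapping, machine_pages

def overcommit_ratio(granted_bytes: int, machine_bytes: int) -> float:
    """Guest physical memory granted divided by machine memory installed.

    A value above 1.0 means memory is over-committed.
    """
    return granted_bytes / machine_bytes

if __name__ == "__main__":
    zero = bytes(PAGE_SIZE)           # zero-filled page, common in every guest
    code = b"\x90" * PAGE_SIZE        # stand-in for an identical code page
    guests = {
        "vm1": [zero, code, b"a" * PAGE_SIZE],
        "vm2": [zero, code, b"b" * PAGE_SIZE],
    }
    mapping, machine = share_identical_pages(guests)
    print(len(mapping), len(machine))                # prints: 6 4
    print(overcommit_ratio(4 * 16 * GIB, 32 * GIB))  # prints: 2.0
```

In the usage example, six guest pages end up backed by four machine pages because the zero page and the code page are shared, while the third page of each guest stays private. Four guests granted 16 GiB each on a 32 GiB host give an over-commitment ratio of 2.0.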

The outline for this series of blog posts is as follows. I begin with a brief introduction to virtual memory management concepts, which is essentially a basic review of the topic and its terminology. If you already understand this area well, feel free to skip it.

Next, I discuss the specific approach to virtual memory management used in VMware. In this section, I stick to information available from published VMware sources. Much of that documentation is, unfortunately, very sketchy.

Finally, I analyze a case study of VMware under stress. The case study vividly illustrates what happens when the VMware hypervisor confronts a configuration of guest machines that demands more physical memory than is available on the underlying hardware.

The case study analyzed here proved very instructive. It provides an opportunity to observe how effective VMware's virtual memory management strategies are, and how those strategies can affect the performance of applications running on virtualized hardware whenever there is significant contention for RAM.

If you are ready to start reading, the first part of this series of blog posts is here.
