Skip to main content

Performance Management in the Virtual Data Center: Virtual Memory Management, Part 1

This is the beginning a new series of blog posts that explores the strategies that VMware ESX employs to manage machine memory, focusing on the ones that are designed to support aggressive consolidation of virtual machine guests on server hardware. Server consolidation is one of the prime cost justifications for the use of VMware’s virtualization technology. Typical rack-mounted blade servers that are deployed in data centers contain far more processing power than most application servers require. From a capacity planning perspective, it is simply not cost-effective to configure many server images today to run directly on native hardware.

Virtualization software permits server resources – CPUs, memory, disk and network – to be carved up into functional sub-units and then shared among multiple tenants, known as guest machines. Aggregating multiple server images onto blade servers using virtualization provides compelling operational benefits, including rapid recovery from failures because it is so quick and easy to spin up a new guest machine using VMware. With current generation processor, disk and networking hardware that was designed with virtualization in mind, guest machine performance approaches the performance of the same applications running on native hardware, but only so long as the virtualization Host itself is not overloaded. If the virtualization host is not adequately provisioned, however, performance issues will arise due to contention for those shared resources.

Diagnosing performance problems in the virtualization environment can, unfortunately, be quite complicated. This is partly due to the fact that the configuration itself can be quite complicated, especially when a typical VMware Host is managing many guest machines. In addition, there are often many VMware Hosts interconnected to a shared disk IO farm and/or networking fabric. When any of this shared hardware infrastructure becomes overloaded and performance suffers, the task of sorting out the root cause of this problem can prove quite daunting.

The focus of this series is on the impact of sharing physical memory, or RAM. To support aggressive server consolidation, the VMware Host grants physical memory to guest machines on demand. By design, VMware allows physical memory to be over-committed, where the overall amount of virtualized physical memory granted to guest machines exceeds the amount of actual machine memory that is available. VMware also looks for opportunities for guest machines to share hardware memory pages when the contents of any two (or more) pages are identical. Identical guest machine pages, once identified, are mapped to a single, common page in RAM.

The outline for this series of blog posts is as follows. I begin with a brief introduction to virtual memory management concepts. This is pretty much a basic review of the topic and the terminology. If it is an area that you already understood well, you should feel comfortable skipping over it.

Next, I discuss the specific approach to virtual memory management used in VMware. In this section, I will stick to information on virtual memory management that is available from published VMware sources. Much of the existing documentation is, unfortunately, very sketchy.

Finally, I will analyze a case study of VMware under stress. The case study vividly illustrates what happens when the VMware hypervisor confronts a configuration of guest machines that demands access to more physical memory addresses than are available on the underlying hardware configuration.

The case study analyzed here proved very instructive. It provides an opportunity to observe the effectiveness of the strategies VMware employs to manage virtual memory and the potential impact of those strategies on the performance of the underlying applications running on virtualized hardware whenever there is significant contention for RAM.

If you are ready to start reading, the first part of this series of blog posts is here.


Popular posts from this blog

Inside the Windows Runtime, Part 2

As I mentioned in the previous post, run-time libraries in Windows provide services for applications running in User mode. For historical reasons, this run-time layer in Windows was always known as the Win32 libraries, even when these services are requested in the 64-bit OS in 32-bit mode. A good example of a Win32 run-time service is any operation that involves opening and accessing a file somewhere in the file system (or the network, or the cloud). A more involved example is the set of Win32 services an application needs to access to play an audio file, including understanding the specific audio file compressed format, and checking authorization and security.
For Windows 8, a portion of the existing Win32 services in Windows were ported to the ARM hardware platform.  The scope of the Win32 API is huge, and it was probably not feasible to convert all of it during the span of a single, time-constrained release cycle. Unfortunately, the fact that the new Windows 8 Runtime library encomp…

High Resolution Clocks and Timers for Performance Measurement in Windows.

Within the discipline of software performance engineering (SPE), application response time monitoring refers to the capability of instrumenting application requests, transactions and other vital interaction scenarios in order to measure their response times. There is no single, more important performance measurement than application response time, especially in the degree which the consistency and length of application response time events reflect the user experience and relate to customer satisfaction. All the esoteric measurements of hardware utilization that Perfmon revels in pale by comparison. Of course, performance engineers usually still want to be able to break down application response time into its component parts, one of which is CPU usage. Other than the Concurrency Visualizer that is packaged with the Visual Studio Profiler that was discussed in the previous post, there are few professional-grade, application response time monitoring and profiling tools that exploit the …

Why is my web app running slowly? -- Part 1.

This series of blog posts picks up on a topic I made mention of earlier, namely scalability models, where I wrote about how implicit models of application scalability often impact the kinds of performance tests that are devised to evaluate the performance of an application. As discussed in that earlier blog post, sometimes the influence of the underlying scalability model is subtle, often because the scalability model itself is implicit. In the context of performance testing, my experience is that it can be very useful to render the application’s performance and scalability model explicitly. At the very least, making your assumptions explicit opens them to scrutiny, allowing questions to be asked about their validity, for example.
The example I used in that earlier discussion was the scalability model implicit when employing stress test tools like HP LoadRunner and Soasta CloudTest against a web-based application. Load testing by successively increasing the arrival rate of customer r…