
Virtual memory management in VMware: a case study

This is a continuation of a series of blog posts on VMware memory management. The previous post in the series is here.

Case Study

The case study reported here is based on a benchmark using a simulated workload that generates contention for machine memory. VMware ESX Server software was installed on a Dell Optiplex 990 with an Intel i7 quad-core processor and 16 GB of RAM. (Hyper-Threading was disabled on the processor through the BIOS.) Four identical Windows Server 2012 guest machines were then defined, each configured to run with 8 GB of physical memory. Each Windows guest ran a 64-bit benchmark application, ThreadContentionGenerator.exe, which the author developed.

The benchmark program was written using the .NET Framework. It allocates a very large block of private memory and accesses that memory randomly. The program is multi-threaded and updates the allocated array using explicit locking to maintain the integrity of its internal data structures. Worker threads also periodically simulate I/O waits by going to sleep, rather than issuing reads or writes against the file system, to avoid exercising the machine's physical disks. Performance data from both VMware and the Windows guest machines was gathered at one-minute intervals for the duration of the benchmark testing, approximately two hours. For comparison purposes, a single guest machine was also run standalone, executing the same benchmark with no contention for machine memory. Running standalone, with no memory contention, the benchmark completes in about 30 minutes.
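
To make the workload concrete, here is a minimal C# sketch of this kind of contention generator. It is not the author's ThreadContentionGenerator.exe; the segment sizes, thread count, burst length, and sleep intervals are illustrative assumptions, chosen only to show the pattern of random touches, explicit locking, and simulated I/O waits.

```csharp
// Illustrative sketch only -- NOT the author's ThreadContentionGenerator.exe.
// Segment sizes, thread counts, and sleep intervals are assumptions chosen
// for clarity, not values taken from the case study.
using System;
using System.Threading;
using System.Threading.Tasks;

class MemoryContentionSketch
{
    // Allocate ~6 GB of private memory as 48 segments of 128 MB each.
    // (A single long[] larger than 2 GB would need gcAllowVeryLargeObjects,
    // so the block is split into segments.)
    const int SegmentCount = 48;
    const int SegmentLength = 16 * 1024 * 1024;   // 16M longs = 128 MB
    static readonly long[][] data = new long[SegmentCount][];
    static readonly object[] locks = new object[SegmentCount];

    static void Main()
    {
        for (int s = 0; s < SegmentCount; s++)
        {
            data[s] = new long[SegmentLength];
            locks[s] = new object();
        }

        int workerCount = Environment.ProcessorCount;
        var duration = TimeSpan.FromMinutes(30);
        var workers = new Task[workerCount];

        for (int w = 0; w < workerCount; w++)
        {
            int seed = w;
            workers[w] = Task.Run(() => Worker(seed, duration));
        }
        Task.WaitAll(workers);
    }

    static void Worker(int seed, TimeSpan duration)
    {
        var rng = new Random(seed);
        var end = DateTime.UtcNow + duration;

        while (DateTime.UtcNow < end)
        {
            // Touch a burst of randomly chosen locations, updating the shared
            // array under an explicit lock to preserve data integrity.
            for (int i = 0; i < 10000; i++)
            {
                int segment = rng.Next(SegmentCount);
                int offset = rng.Next(SegmentLength);
                lock (locks[segment])
                {
                    data[segment][offset]++;
                }
            }
            // Simulate an I/O wait by sleeping instead of issuing real file
            // system reads or writes, so the physical disks stay idle.
            Thread.Sleep(rng.Next(5, 50));
        }
    }
}
```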


Memory allocation on demand

Figure 2 tracks three key ESX memory performance metrics during the test: the total Memory Granted to the four guest Windows machines, the total Memory Active for the same four guests, and the VMware Host's Memory Usage counter, reported as a percentage of the total machine memory available. The total Memory Granted counter increases at the outset in 8 GB steps as each of the Windows guest machines spins up. The memory benchmarking programs were started just before 9 AM and continued executing over the next two hours, finally winding down near 11 AM. The benchmark programs drive Active Memory to almost 15 GB shortly after 9 AM, and overall Memory Usage to 98%. (In this configuration, 2% free memory translates into about 300 MB of available machine memory.)

Notice that the Memory Active counter, which purports to measure each guest OS's working set of resident pages, exhibits some anomalies, presumably associated with the way it is estimated using sampling. At the beginning of the testing period there are periodic spikes in the counter while the guest machines have just been activated but are not yet doing any work. Toward the end of the benchmark period, after many of the benchmark worker threads have completed, there is another spike resembling the earlier ones. This later spike shows total guest machine active memory briefly reaching some 20 GB, which, of course, is physically impossible on a 16 GB machine.


Figure 2. Memory Granted, Memory Active and % Memory Used during the benchmark.

As the benchmark programs execute in each of the guest machines, the Memory Granted counter plunges from 32 GB down to about 15 GB. The vCenter Performance Counters documentation provides this definition of the counter: “The amount of memory that was granted to the VM by the host. Memory is not granted to the [guest] until it is touched one time and granted memory may be swapped out or ballooned away if the VMkernel needs the memory.” Evidently, during initialization a Windows Server machine touches every page of its physical memory, so 8 GB of RAM are initially granted to each guest machine. But in this case study only 16 GB of machine memory is available in total. As VMware detects memory contention, the memory granted to each guest machine is evidently reduced through page replacement, using the ballooning and swapping mechanisms.
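
The granted-on-first-touch behavior is easy to demonstrate at a smaller scale from inside a guest. The following C# sketch, an illustration of the general principle rather than part of the case study's tooling, allocates a large array and reports the process working set (via Environment.WorkingSet) before and after the pages are touched; only the touch makes the pages resident, and VMware's Memory Granted counter behaves analogously at the machine-memory level.

```csharp
// A small illustration of allocation-on-first-touch, observed at the process
// level inside a Windows guest. Committing a large array does not by itself
// bring physical pages into the working set; touching the pages does.
// The 1 GB size and 4 KB page stride are illustrative assumptions.
using System;

class FirstTouchDemo
{
    const int Size = 1 << 30;          // 1 GB
    const int PageSize = 4096;         // typical x64 page size

    static void Main()
    {
        Console.WriteLine($"Before allocation: {WorkingSetMb()} MB");

        byte[] block = new byte[Size];  // committed, but largely untouched
        Console.WriteLine($"After allocation:  {WorkingSetMb()} MB");

        // Touch one byte in every page; only now do the pages become
        // resident and count toward the working set (and, under VMware,
        // toward the memory granted to the guest machine).
        for (int i = 0; i < Size; i += PageSize)
            block[i] = 1;
        Console.WriteLine($"After first touch: {WorkingSetMb()} MB");

        GC.KeepAlive(block);
    }

    static long WorkingSetMb() => Environment.WorkingSet / (1024 * 1024);
}
```

Run under the .NET Framework on a Windows guest, the reported working set should grow substantially only after the touch loop completes, not at allocation time.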

Figure 3 attempts to show the breakdown of machine memory allocated by adding the allotments associated with the VMkernel to the sum of the Active memory consumed by each of the guest machines. The anomalous spike in Active Memory near the end of the benchmark test pushes overall machine memory usage beyond the amount of RAM actually installed, which, as noted above, is physically impossible. This measurement anomaly, possibly the result of a systematic sampling error, is troubling because it makes it difficult to obtain a reliable, precise breakdown of machine memory allocation and usage in VMware.

Figure 3. Machine memory allocations, including the areas of memory allocated by the VMkernel.
Figure 3 also shows a dotted-line overlay that reports the value of the Memory State counter. The Memory State counter reports the memory state at the end of each measurement interval, so these values should be interpreted as sample observations. There were three sample observations when the memory state was “Soft,” indicating that ballooning was taking place, and one earlier observation where the memory state was “Hard,” indicating that swapping was triggered.

Figure 4 shows the same counter data as Figure 3, but without the Memory Active counter data. We see that the VMware Host management functions consume about 1.5 GB of RAM altogether. This includes the Memory Overhead counter, which reports the space occupied by the shadow page tables. The amount of machine memory that the VMware hypervisor consumes remains flat throughout the active benchmarking period.


Figure 4. Machine memory areas allocated by the VMkernel, including memory management “overhead.”

In the next post in this series, we will look at the effectiveness of another VMware memory management feature, transparent memory sharing. 
