
Virtual memory management in VMware: a case study

This is a continuation of a series of blog posts on VMware memory management. The previous post in the series is here.

Case Study

The case study reported here is based on a benchmark using a simulated workload that generates contention for machine memory. VMware ESX Server software was installed on a Dell Optiplex 990 with an Intel i7 quad-core processor and 16 GB of RAM. (Hyper-Threading was disabled on the processor through the BIOS.) Four identical Windows Server 2012 guest machines were then defined, each configured to run with 8 GB of physical memory. Each Windows guest ran a 64-bit benchmark application, ThreadContentionGenerator.exe, which the author developed.

The benchmark program was written using the .NET Framework. It allocates a very large block of private memory and accesses that memory randomly. The program is multi-threaded and updates the allocated array using explicit locking to maintain the integrity of its internal data structures. Worker threads also periodically simulate I/O waits by going to sleep, rather than issuing reads or writes against the file system, to avoid exercising the machine's physical disks. Performance data from both VMware and the Windows guest machines was gathered at one-minute intervals for the duration of the benchmark testing, approximately two hours. For comparison purposes, a single guest machine was also run standalone, executing the same benchmark with no contention for machine memory. Running standalone, with no memory contention, the benchmark completes in about 30 minutes.
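
To make the workload concrete, here is a minimal C# sketch of this kind of contention generator. It is not the author's ThreadContentionGenerator.exe; the segment sizes, thread count, burst length, and sleep intervals are illustrative assumptions, chosen only to show the pattern of random touches, explicit locking, and simulated I/O waits.

```csharp
// Illustrative sketch only -- NOT the author's ThreadContentionGenerator.exe.
// Segment sizes, thread counts, and sleep intervals are assumptions chosen
// for clarity, not values taken from the case study.
using System;
using System.Threading;
using System.Threading.Tasks;

class MemoryContentionSketch
{
    // Allocate ~6 GB of private memory as 48 segments of 128 MB each.
    // (A single long[] larger than 2 GB would need gcAllowVeryLargeObjects,
    // so the block is split into segments.)
    const int SegmentCount = 48;
    const int SegmentLength = 16 * 1024 * 1024;   // 16M longs = 128 MB
    static readonly long[][] data = new long[SegmentCount][];
    static readonly object[] locks = new object[SegmentCount];

    static void Main()
    {
        for (int s = 0; s < SegmentCount; s++)
        {
            data[s] = new long[SegmentLength];
            locks[s] = new object();
        }

        int workerCount = Environment.ProcessorCount;
        var duration = TimeSpan.FromMinutes(30);
        var workers = new Task[workerCount];

        for (int w = 0; w < workerCount; w++)
        {
            int seed = w;
            workers[w] = Task.Run(() => Worker(seed, duration));
        }
        Task.WaitAll(workers);
    }

    static void Worker(int seed, TimeSpan duration)
    {
        var rng = new Random(seed);
        var end = DateTime.UtcNow + duration;

        while (DateTime.UtcNow < end)
        {
            // Touch a burst of randomly chosen locations, updating the shared
            // array under an explicit lock to preserve data integrity.
            for (int i = 0; i < 10000; i++)
            {
                int segment = rng.Next(SegmentCount);
                int offset = rng.Next(SegmentLength);
                lock (locks[segment])
                {
                    data[segment][offset]++;
                }
            }
            // Simulate an I/O wait by sleeping instead of issuing real file
            // system reads or writes, so the physical disks stay idle.
            Thread.Sleep(rng.Next(5, 50));
        }
    }
}
```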


Memory allocation on demand

Figure 2 tracks three key ESX memory performance metrics during the test: the total Memory Granted to the four guest Windows machines, the total Memory Active for the same four guests, and the VMware Host's Memory Usage counter, reported as a percentage of the total machine memory available. The total Memory Granted counter increases at the outset in 8 GB steps as each of the Windows guest machines spins up. The memory benchmarking programs were started just before 9 AM and continued executing over the next two hours, finally winding down near 11 AM. The benchmark programs drive Active Memory to almost 15 GB shortly after 9 AM, and overall Memory Usage to 98%. (In this configuration, 2% free memory translates into about 300 MB of available machine memory.)

Notice that the Memory Active counter, which purports to measure each guest OS's working set of resident pages, exhibits some anomalies, presumably associated with the way it is estimated using sampling. At the beginning of the testing period there are periodic spikes in the counter while the guest machines have just been activated but are not yet doing any work. Toward the end of the benchmark period, after many of the benchmark worker threads have completed, there is another spike resembling the earlier ones. This later spike shows total guest machine active memory briefly reaching some 20 GB, which, of course, is physically impossible on a 16 GB machine.


Figure 2. Memory Granted, Memory Active and % Memory Used during the benchmark.

As the benchmark programs execute in each of the guest machines, the Memory Granted counter plunges from 32 GB down to about 15 GB. The vCenter Performance Counters documentation provides this definition of the counter: “The amount of memory that was granted to the VM by the host. Memory is not granted to the [guest] until it is touched one time and granted memory may be swapped out or ballooned away if the VMkernel needs the memory.” Evidently, during initialization a Windows Server machine touches every page of its physical memory, so 8 GB of RAM are initially granted to each guest machine. But in this case study only 16 GB of machine memory is available in total. As VMware detects memory contention, the memory granted to each guest machine is evidently reduced through page replacement, using the ballooning and swapping mechanisms.
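
The granted-on-first-touch behavior is easy to demonstrate at a smaller scale from inside a guest. The following C# sketch, an illustration of the general principle rather than part of the case study's tooling, allocates a large array and reports the process working set (via Environment.WorkingSet) before and after the pages are touched; only the touch makes the pages resident, and VMware's Memory Granted counter behaves analogously at the machine-memory level.

```csharp
// A small illustration of allocation-on-first-touch, observed at the process
// level inside a Windows guest. Committing a large array does not by itself
// bring physical pages into the working set; touching the pages does.
// The 1 GB size and 4 KB page stride are illustrative assumptions.
using System;

class FirstTouchDemo
{
    const int Size = 1 << 30;          // 1 GB
    const int PageSize = 4096;         // typical x64 page size

    static void Main()
    {
        Console.WriteLine($"Before allocation: {WorkingSetMb()} MB");

        byte[] block = new byte[Size];  // committed, but largely untouched
        Console.WriteLine($"After allocation:  {WorkingSetMb()} MB");

        // Touch one byte in every page; only now do the pages become
        // resident and count toward the working set (and, under VMware,
        // toward the memory granted to the guest machine).
        for (int i = 0; i < Size; i += PageSize)
            block[i] = 1;
        Console.WriteLine($"After first touch: {WorkingSetMb()} MB");

        GC.KeepAlive(block);
    }

    static long WorkingSetMb() => Environment.WorkingSet / (1024 * 1024);
}
```

Run under the .NET Framework on a Windows guest, the reported working set should grow substantially only after the touch loop completes, not at allocation time.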

Figure 3 attempts to show the breakdown of machine memory allocated by adding the allotments associated with the VMkernel to the sum of the Active memory consumed by each of the guest machines. The anomalous spike in Active Memory near the end of the benchmark test pushes overall machine memory usage beyond the amount of RAM actually installed, which, as noted above, is physically impossible. This measurement anomaly, possibly the result of a systematic sampling error, is troubling because it makes it difficult to obtain a reliable, precise breakdown of machine memory allocation and usage in VMware.

Figure 3. Machine memory allocations, including the areas of memory allocated by the VMkernel.
Figure 3 also shows a dotted-line overlay that reports the value of the Memory State counter. The Memory State counter reports the memory state at the end of each measurement interval, so these values should be interpreted as sample observations. There were three sample observations when the memory state was “Soft,” indicating that ballooning was taking place, and one earlier observation where the memory state was “Hard,” indicating that swapping was triggered.

Figure 4 shows the same counter data as Figure 3, but without the Memory Active counter data. We see that the VMware Host management functions consume about 1.5 GB of RAM altogether. This includes the Memory Overhead counter, which reports the space occupied by the shadow page tables. The amount of machine memory that the VMware hypervisor consumes remains flat throughout the active benchmarking period.


Figure 4. Machine memory areas allocated by the VMkernel, including memory management “overhead.”

In the next post in this series, we will look at the effectiveness of another VMware memory management feature, transparent memory sharing. 
