
Virtual memory management in VMware: Swapping

This is a continuation of a series of blog posts on VMware memory management. The previous post in the series is here.


Swapping

To relieve a serious shortage of machine memory, VMware can resort to stealing physical memory pages granted to a guest OS at random, a mechanism VMware terms swapping. Swapping is triggered when free machine memory drops below a 4% threshold.
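The trigger condition can be written down as a simple check. This is only a minimal sketch of the 4% rule as described above, not VMware's actual implementation; the function name, the constant name and the sample figures in the usage note are all hypothetical.

```python
# Assumption: the only input is the 4% free-memory threshold quoted in the text.
# The names below are illustrative and do not correspond to any VMware API.
SWAP_FREE_THRESHOLD = 0.04  # swapping starts below 4% free machine memory


def swapping_triggered(free_mb: float, total_mb: float) -> bool:
    """Return True when free machine memory is below the 4% threshold."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return free_mb / total_mb < SWAP_FREE_THRESHOLD
```

For a hypothetical host with 8192 MB of machine memory, `swapping_triggered(300, 8192)` is true (about 3.7% free), while `swapping_triggered(400, 8192)` is false (about 4.9% free).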

During the case study, VMware resorted to swapping beginning around 9:10 AM, when the Memory State variable reported a transition to the “Hard” memory state, as shown in Figure 19. Initially, VMware swapped out almost 600 MB of machine memory granted to the four guest machines. Swapping was also heavily biased: the ESXAS12B guest machine was barely touched, while at one point 400 MB of machine memory from the ESXAS12E machine was swapped out.
Figure 19. VMware resorted to random page replacement – or swapping – to relieve a critical shortage of machine memory when usage of machine memory exceeded 96%. Swapping was biased – not all guest machines were penalized equally.

Given how infrequently random page replacement policies are implemented, it is surprising to discover they often perform reasonably well in simulations, although they still perform much worse than stack algorithms that order candidates for page replacement based on Least Recently Used criteria. Because VMware selects pages for swapping at random from a guest machine’s allotted machine memory, it can remove truly awful candidates from the guest machine’s current working set. With random page replacement, some worst-case scenarios are entirely possible. For example, VMware might choose to swap out a frequently referenced page that contains code from the operating system kernel or Page Table entries, pages that the guest OS itself would be among the least likely to choose for page replacement.
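The worst case described above is easy to reproduce in a toy simulation. The sketch below is a generic textbook-style comparison of LRU and random replacement, not a model of VMware's swapper; the function name, the synthetic reference string and the frame count are all assumptions made for illustration. Page 0 plays the role of a frequently referenced kernel page.

```python
import random
from collections import OrderedDict


def count_faults(refs, frames, policy, seed=0):
    """Count page faults for a reference string under 'lru' or 'random' replacement."""
    if policy not in ("lru", "random"):
        raise ValueError("policy must be 'lru' or 'random'")
    rng = random.Random(seed)   # seeded so a run is repeatable
    resident = OrderedDict()    # keys ordered from least to most recently used
    faults = 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)  # hit: mark as most recently used
            continue
        faults += 1
        if len(resident) >= frames:
            if policy == "lru":
                resident.popitem(last=False)  # evict the least recently used page
            else:
                del resident[rng.choice(list(resident))]  # evict any resident page
        resident[page] = True
    return faults


# Hypothetical trace: hot page 0 is touched between every cold page 1..50.
trace = [page for cold in range(1, 51) for page in (0, cold)]
lru_faults = count_faults(trace, frames=2, policy="lru")        # 51: first touches only
random_faults = count_faults(trace, frames=2, policy="random")  # can also evict page 0
print(lru_faults, random_faults)
```

With this trace, LRU never evicts page 0, so it faults only on the first touch of each page. Random replacement can pick page 0 as the victim, in which case the very next reference faults it back in.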

To see how effective VMware’s random page replacement policy is, the rate at which pages were swapped out was compared to the swap-in rate. This comparison is shown in Figure 20. There were two large bursts of swap-out activity, the first one taking place at 9:10 AM, when the swap-out rate was reported at about 8 MB/sec. The swap-in rate never exceeded 1 MB/sec, but a small amount of swap-in activity continued to be necessary over the next 90 minutes of the benchmark run, until the guest machines were shut down and machine memory was no longer over-committed. In clustered VMware environments, the vMotion facility can be invoked automatically to migrate a guest machine from an over-committed ESX Host to another machine in the cluster that is not currently experiencing memory contention. This action may relieve the immediate memory over-commitment, but it may also simply shift the problem to another VM Host.
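One way to summarize the swap-out versus swap-in comparison is the fraction of swapped-out memory that later had to be brought back. The sketch below assumes both rates are sampled in MB/sec at a fixed interval; the function name and the sample values in the usage note are hypothetical placeholders, not measurements from the case study.

```python
def swap_back_ratio(out_rates_mb_s, in_rates_mb_s, interval_s):
    """Return total MB swapped in divided by total MB swapped out.

    Both sequences hold rates in MB/sec, one sample per interval of
    interval_s seconds (an assumption about how the data was collected).
    """
    if len(out_rates_mb_s) != len(in_rates_mb_s):
        raise ValueError("rate series must have the same length")
    swapped_out_mb = sum(out_rates_mb_s) * interval_s
    swapped_in_mb = sum(in_rates_mb_s) * interval_s
    if swapped_out_mb == 0:
        return 0.0  # nothing was swapped out, so nothing could come back
    return swapped_in_mb / swapped_out_mb
```

For example, `swap_back_ratio([8.0, 0.0, 0.0], [0.0, 1.0, 1.0], 20)` integrates to 160 MB out and 40 MB in, a ratio of 0.25. A ratio near 1 would suggest that most of the randomly chosen pages were needed again, while a low ratio would suggest that many of them were not referenced again during the interval.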

As noted in the previous blog entry, the benchmark program took three times longer to execute when there was memory contention from all four active guest machines, compared to running in a standalone guest machine. Delays due to VMware swapping were certainly one of the important factors contributing to elongated program run-times.

Figure 20. Comparing pages swapped out to pages swapped in.
This entry on VMware swapping concludes the presentation of the results of the case study that stressed the virtual memory management facilities of a VMware ESX host machine. Based on an analysis of the performance data on memory usage gathered both at the level of the VMware Host and internally in the Windows guest machines, it was possible to observe the virtual memory management mechanisms used by VMware in operation very clearly.

With this clearer understanding of VMware memory management in mind, I'll discuss some of the broader implications for performance and capacity planning of large-scale virtualized computing infrastructures in the next (and last) post in this series.
