This is a continuation of a series of blog posts on VMware memory management. The previous post in the series is here.
This entry on VMware swapping concludes the presentation of the results of the case study that stressed the virtual memory management facilities of a VMware ESX host machine. Analyzing the memory usage performance data gathered both at the level of the VMware Host and internally in the Windows guest machines made it possible to observe the virtual memory management mechanisms used by VMware in operation very clearly.
With this clearer understanding of VMware memory management in mind, I'll discuss some of the broader implications for performance and capacity planning of large-scale virtualized computing infrastructures in the next (and last) post in this series.
Swapping
To relieve a serious shortage of machine memory, VMware can resort to stealing, at random, physical memory pages that were granted to a guest OS, a mechanism that VMware terms swapping. Swapping is triggered when free machine memory drops below a 4% threshold.
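The trigger and the random selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not VMware's implementation: the function names, the page list, and the memory sizes are hypothetical, and the 4% figure is simply the threshold quoted in the text.

```python
import random

# Threshold quoted in the text: swapping starts when free machine
# memory falls below 4% of the total. Treated here as a parameter.
SWAP_THRESHOLD = 0.04


def swapping_triggered(free_mb, total_mb, threshold=SWAP_THRESHOLD):
    """Return True when the free fraction of machine memory is below the threshold."""
    return free_mb / total_mb < threshold


def pick_swap_victims(guest_pages, count, rng):
    """Choose `count` distinct pages uniformly at random.

    Nothing about how recently or how often a page was referenced is
    consulted, which is what distinguishes this from an LRU-style policy.
    """
    return rng.sample(guest_pages, min(count, len(guest_pages)))


if __name__ == "__main__":
    rng = random.Random(0)               # seeded so the run is repeatable
    pages = list(range(1000))            # hypothetical guest page numbers
    if swapping_triggered(free_mb=300, total_mb=8192):
        print(pick_swap_victims(pages, 5, rng))
```

Because the victims are drawn without regard to usage, a heavily referenced page is exactly as likely to be chosen as an idle one.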
During the case study, VMware resorted to swapping beginning around 9:10 AM, when the Memory State variable reported a transition to the “Hard” memory state, as shown in Figure 19. Initially, VMware swapped out almost 600 MB of machine memory granted to the four guest machines. Note also that swapping was spread very unevenly across the guests: the ESXAS12B guest machine was barely touched, while at one point 400 MB of machine memory from the ESXAS12E machine was swapped out.
Given how infrequently random page replacement policies are implemented, it is surprising to discover that they often perform reasonably well in simulations, although they still perform much worse than stack algorithms that order candidates for page replacement based on Least Recently Used criteria. Because VMware selects pages from a guest machine’s allotted machine memory for swapping at random, it is entirely possible for VMware to swap out truly awful candidates from the guest machine’s current working set. With random page replacement, worst-case scenarios are entirely possible. For example, VMware might choose to swap out a frequently referenced page that contains operating system kernel code or Page Table entries, pages that would be among the least likely to be chosen if the guest OS were making the page replacement decision.
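For readers who want to try this kind of simulation themselves, here is a minimal sketch that counts page faults under LRU and under random replacement for the same reference string. Everything in it is an assumption made for illustration: the function names, the synthetic hot/cold trace generator, and the parameter values are hypothetical and are not taken from the case study or from VMware.

```python
import random
from collections import OrderedDict


def make_trace(n_refs, n_pages, hot_pages, hot_fraction, rng):
    """Synthetic reference string: `hot_fraction` of the references go to
    a small hot set (pages 0..hot_pages-1), the rest to the cold pages."""
    trace = []
    for _ in range(n_refs):
        if rng.random() < hot_fraction:
            trace.append(rng.randrange(hot_pages))
        else:
            trace.append(rng.randrange(hot_pages, n_pages))
    return trace


def faults_lru(trace, frames):
    """Count page faults when the least recently used page is evicted."""
    resident = OrderedDict()             # ordered oldest -> most recent
    faults = 0
    for page in trace:
        if page in resident:
            resident.move_to_end(page)   # mark as most recently used
        else:
            faults += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict the LRU page
            resident[page] = True
    return faults


def faults_random(trace, frames, rng):
    """Count page faults when the evicted page is chosen at random."""
    resident = []                        # frame number -> page
    index = {}                           # page -> frame number
    faults = 0
    for page in trace:
        if page in index:
            continue                     # hit: no bookkeeping needed
        faults += 1
        if len(resident) >= frames:
            slot = rng.randrange(len(resident))
            del index[resident[slot]]    # evict a random victim
            resident[slot] = page
            index[page] = slot
        else:
            index[page] = len(resident)
            resident.append(page)
    return faults


if __name__ == "__main__":
    rng = random.Random(42)              # seeded so runs are repeatable
    trace = make_trace(50_000, 1_000, 50, 0.8, rng)
    for frames in (100, 200, 400):
        print(frames, faults_lru(trace, frames), faults_random(trace, frames, rng))
```

The random policy needs no usage bookkeeping on a hit, which is part of its appeal to a hypervisor that cannot cheaply observe guest page references. The price is that nothing protects a frequently referenced page from being chosen as the victim.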
To see how effective VMware’s random page replacement policy is, the rate at which pages were swapped out was compared to the swap-in rate. This comparison is shown in Figure 20. There were two large bursts of swap-out activity, the first one taking place at 9:10 AM, when the swap-out rate was reported at about 8 MB/sec. The swap-in rate never exceeded 1 MB/sec, but a small amount of swap-in activity continued to be necessary over the next 90 minutes of the benchmark run, until the guest machines were shut down and machine memory was no longer over-committed. In clustered VMware environments, the vMotion facility can be invoked automatically to migrate a guest machine from an over-committed ESX Host to another machine in the cluster that is not currently experiencing memory contention. This action may relieve the immediate memory over-commitment, but it may also simply shift the problem to another VM Host.
As noted in the previous blog entry, the benchmark program took three times longer to execute when there was memory contention from all four active guest machines than it did when running in a standalone guest machine. Delays due to VMware swapping were certainly one of the important factors contributing to the elongated program run times.
Figure 20. Comparing pages swapped out to pages swapped in.