
WebPageTest and measuring visual completeness -- why is this web app running slowly, Part 8.

This is a continuation of a series of blog entries on this topic. The series starts here.

In this post, I discuss one more performance tool that belongs in the YSlow family. It is called WebPageTest, and it makes a credible attempt at addressing some of the limitations of the YSlow approach I have been writing about, particularly as that approach applies to today’s JavaScript-enhanced web applications. WebPageTest does measure Page Load time, and it offers several other useful goodies, including a revised set of performance optimization rules to consider.

One of YSlow’s legacies is that it spawned the development of web application performance tools that focus, as the original YSlow approach does, on the web page composition process performed inside the browser, but that also measure page load time, addressing one of the important limitations of the original approach. Browser-based tools like Chrome’s PageSpeed do measure page load time and visualize the page composition process as a waterfall diagram unfolding over time. The browser-based approach still faces the limitation that it can only measure page load time from a single vantage point on the network, which is hardly representative of actual network latencies.

Another limitation of the browser-based approach is that the traditional Page Load time measurement, which is signaled by the firing of the DOM’s window.load event, may no longer represent a clear, unambiguous boundary for the response time event. As web pages perform more and more JavaScript manipulation of the DOM during execution of the window.load event handler, this boundary can be blurred beyond recognition in extreme cases. It is another case of evolving web technologies requiring new measurement approaches.

Figure 10 is a screen shot from a WebPageTest timing test that I performed on the Amazon Home page. There is so much going on in that rich display of performance data – partly because WebPageTest provides so much information about page load time and partly because the Amazon home page itself is so complex – that a single screen shot hardly does justice to the tool. You can, however, access this example directly using the hyperlink to the performance test results for the Amazon page. Better yet, you can sign up to use the site, which is free of charge, and run the performance test against your own web site.

Figure 10. The WebPageTest waterfall view of the page composition process, measuring the performance of the Amazon home page in this example.

Across the top of the Test Result page, the WebPageTest detail report proudly shows its YSlow heritage by computing a “PageSpeed” score and a set of YSlow-like letter grades based on how well the web page conforms to a set of mostly familiar performance rules, including the usual recommendations to minify Http objects, compress Response messages, enable caching, defer parsing of JavaScript, etc. You may also notice some distinctly new rules reflecting current trends toward more complex web page composition, like prioritizing visible “above the fold” content. The “above-the-fold” terminology is borrowed from the graphic design principles of printed (and folded) newspaper layout, where the goal, back in the heyday of the printed press, was to capture readers’ eyeballs with the content visible on the front page above the fold. Applied to web page load, the rule focuses on the content that first becomes visible without recourse to scrolling. This ignores, for the moment, one complication: while the fold was a physical artifact of the way printed newspapers are displayed at a newsstand, the extent of the web page content that is first visible in a browser window is a function of both the physical display size, which is fixed, and the current size of the browser window, which is subject to change.

In addition, WebPageTest attempts to provide a set of more representative and realistic network latencies to measure by being able to host your test at a range of different locations scattered around the globe. After you select one of the available locations for the performance test, WebPageTest builds a waterfall view of the page composition process that is similar to PageSpeed, as illustrated in Figure 10, where I again used the Amazon home page as an example. The WebPageTest view even incorporates YSlow-like performance rules and letter grades. In this example, the Amazon web page initially took about 5 seconds to load, but then continued to load content for another 6 seconds or so after the onload event fired and the window.load event handler began to execute.

In an attempt to address the second measurement issue, WebPageTest considers the web page in this example to be visually complete after 13.7 seconds, a measurement based on capturing video of the screen and comparing successive frames until the image stops changing. The “visually complete” measurement is intended to augment page load time measurements, and it is particularly useful when, as in this example, the window.load event handler performs extensive and time-consuming DOM manipulation.
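The idea behind the measurement can be sketched in a few lines. The following is an illustration of the concept only, not WebPageTest’s actual algorithm: given a series of captured filmstrip frames and their timestamps, report the first time at which the screen already matches the final, settled frame.

```javascript
// Hypothetical sketch of a "visually complete" calculation. Each frame
// is represented as a flat array of pixel values; timestamps are in
// milliseconds since navigation started. Both inputs are assumptions
// made for illustration.
function visuallyCompleteTime(frames, timestamps) {
  const finalFrame = frames[frames.length - 1];
  const matchesFinal = (frame) =>
    frame.length === finalFrame.length &&
    frame.every((pixel, i) => pixel === finalFrame[i]);
  // Walk forward until the display first looks identical to the end state.
  for (let i = 0; i < frames.length; i++) {
    if (matchesFinal(frames[i])) return timestamps[i];
  }
  return timestamps[timestamps.length - 1]; // defensive fallback
}
```

With frames captured every 100 ms, a page whose display stops changing at the third frame would report a visually complete time of 200 ms, even if network activity continues afterward.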

Another innovative aspect of the WebPageTest toolset is its UI enhancements to the waterfall diagram used to show the page composition process over time. Figure 10 illustrates how the delay associated with each GET Request is decomposed into the consecutive stages required to resolve the Request. Any DNS lookup, TCP connection hand-shaking, or SSL negotiation that is required is indicated visually. In addition, the Time to First Byte is distinguished visually from any subsequent packets that may have been required to send the full Response message. (The latter delay is identified in the tool as “content download.”)
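Modern browsers expose the same per-request timestamps through the Resource Timing API (performance.getEntriesByType("resource")), so you can compute this decomposition yourself. The sketch below assumes a plain object carrying the standard PerformanceResourceTiming field names, which also lets the logic be exercised outside a browser:

```javascript
// Decompose one resource's timing into the stages WebPageTest visualizes.
// The entry uses the standard Resource Timing field names; timestamps
// are in milliseconds.
function decomposeRequest(entry) {
  return {
    dns: entry.domainLookupEnd - entry.domainLookupStart,
    // secureConnectionStart is 0 when no SSL negotiation took place,
    // in which case the TCP handshake runs to connectEnd.
    tcp: (entry.secureConnectionStart || entry.connectEnd) - entry.connectStart,
    ssl: entry.secureConnectionStart
      ? entry.connectEnd - entry.secureConnectionStart
      : 0,
    timeToFirstByte: entry.responseStart - entry.requestStart,
    contentDownload: entry.responseEnd - entry.responseStart,
  };
}
```

For a cached DNS entry or a reused connection, the corresponding stages simply come out as zero, mirroring the bars that are absent from the waterfall for such requests.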

Visually, WebPageTest also adds an overlay line to the waterfall diagram indicating when the window.load event fired, which gives you a way to assess how much JavaScript processing of the DOM occurs after the event that signals the Page is loaded. If major DOM manipulation is deferred to the window.load event handler, the WebPageTest waterfall diagram reflects that, in effect directing you to weigh using the Visually Complete measurement instead of Page Load as the better available performance indicator.
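One hedged way to quantify what that overlay line shows is to count the resources that finished downloading after the load event fired, and how far past the "page loaded" boundary activity extended. In a browser the inputs could come from the navigation and resource timing entries; plain numbers are used here so the logic stands on its own:

```javascript
// Sketch: given the time the load event fired and the finish times of
// all resources on the page (both in ms since navigation start), report
// how many resources fell after the "page loaded" boundary and how much
// additional wall-clock time they consumed.
function postLoadActivity(loadEventTime, resourceEndTimes) {
  const after = resourceEndTimes.filter((t) => t > loadEventTime);
  const lastFinish = Math.max(loadEventTime, ...resourceEndTimes);
  return {
    resourcesAfterLoad: after.length,
    extraMilliseconds: lastFinish - loadEventTime,
  };
}
```

Applied to the Amazon example above, a load event at roughly 5 seconds with resources still finishing at 11 seconds would report about 6,000 ms of post-load activity, which is exactly the gap the overlay line makes visible.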

Next: Real User Measurements using the NavTiming API

