Measuring Web Page Load time: why is this web app running slowly, Part 6.

This is a continuation of a series of blog entries on this topic. The series starts here.

In this post, I will discuss three approaches to measuring actual web page load times, something that is important for a variety of reasons, some of which I have already discussed. Web page load times capture service levels from the standpoint of the customer. Service level measurements also enable performance analysts to apply decomposition techniques, breaking page load time down into its components: browser render time, network transmission delays, DNS lookup, TCP connection establishment, etc.

The first approach to measuring web page load times was built on top of network packet capture technology, which was already capable of capturing the network packets associated with HTTP GET and POST Requests and their associated Response messages. Packet tracing is associated with network sniffers like Wireshark and Netmon, which play a huge role in the data center in diagnosing network connectivity and performance problems. By adding the ability to understand and interpret requests made using the application-level HTTP protocol, packet-tracing tools like Wireshark could be extended to report HTTP application-oriented response time measurements.
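To make that request/response bookkeeping concrete, here is a toy TypeScript sketch of the pairing step such a tool performs once the HTTP messages have already been extracted from a trace. The HttpMessage record shape is invented for illustration, and the sketch assumes at most one outstanding request per TCP stream (HTTP/1.1 without pipelining); real traces are considerably messier than this.

```typescript
// Toy sketch: pair each HTTP request with the next response observed on the
// same TCP stream and report the elapsed time between them.
interface HttpMessage {
  timestamp: number;               // seconds since the start of the capture
  tcpStream: number;               // TCP stream the message was observed on
  kind: "request" | "response";
  summary: string;                 // e.g. "GET /index.html" or "200 OK"
}

function reportResponseTimes(messages: HttpMessage[]): void {
  const pending = new Map<number, HttpMessage>(); // stream -> outstanding request
  for (const msg of [...messages].sort((a, b) => a.timestamp - b.timestamp)) {
    if (msg.kind === "request") {
      pending.set(msg.tcpStream, msg);
    } else {
      const req = pending.get(msg.tcpStream);
      if (req) {
        const elapsedMs = ((msg.timestamp - req.timestamp) * 1000).toFixed(1);
        console.log(`${req.summary} -> ${msg.summary}: ${elapsedMs} ms`);
        pending.delete(msg.tcpStream);
      }
    }
  }
}

// Example with two hand-made records:
reportResponseTimes([
  { timestamp: 0.0, tcpStream: 1, kind: "request", summary: "GET /index.html" },
  { timestamp: 0.042, tcpStream: 1, kind: "response", summary: "200 OK" },
]);
```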

One obvious limitation of this approach is that the network capture needs to occur somewhere inside the data center, and that data-center vantage point is a handicap whenever web applications reference content hosted outside the data center, including content cached in a CDN or served by a third-party ad server. A second limitation is that network packet tracing captures a huge volume of trace data, which must then be filtered extensively down to just the relevant HTTP traffic. A third limitation stems from the browser’s ability to open and transmit requests on multiple TCP sessions, which makes it more difficult to stitch all the HTTP Requests associated with a single page load together into a coherent view of the web application. These limitations are not serious ones when measurement tools based on network packet tracing are used in development and testing, but they are serious constraints when you need to use them to monitor a large-scale production environment.

A second approach was pioneered by vendors who saw an opportunity to address the limitations of network packet tracing by measuring web application response times at their source, namely, from the point of view of the web client. Measurements taken at the web client are known as end-to-end (often abbreviated as ETE) response time measurements. A simple, low-tech way to measure web application response times from the vantage point of the web client is to gather them manually, using a stopwatch, for example, to mark the Request begin and end times. Now, if you can imagine a hardware and software solution that automates that measurement process, you have the necessary ingredients for building an end-to-end measurement tool. Such a solution simulates customer activity by generating synthetic requests issued from the vendor’s data centers to your web site and measuring the resulting end-to-end response times, a form of automated performance testing. In the process, these tools can also assess web site availability, including notification in the event of an outage.
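As a concrete illustration of the basic mechanism, here is a minimal sketch of such a probe, assuming a Node.js 18+ runtime with the built-in fetch API; the target URL is just a placeholder. A real synthetic monitor drives an actual browser so that embedded resources and rendering are included in the measurement, but the timing bookkeeping is the same idea.

```typescript
// Minimal synthetic probe: time a single HTTP request end to end and treat a
// network-level failure as an availability miss.
async function probe(url: string): Promise<void> {
  const start = performance.now();
  try {
    const response = await fetch(url, { redirect: "follow" });
    await response.arrayBuffer();              // force the full body to download
    const elapsed = performance.now() - start;
    console.log(`${url}: HTTP ${response.status} in ${elapsed.toFixed(0)} ms`);
  } catch (err) {
    console.error(`${url}: unreachable (${(err as Error).message})`);
  }
}

void probe("https://www.example.com/");        // placeholder URL
```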

The vendors who built the first end-to-end monitoring solutions moved quickly to extend their monitoring operations to use multiple sites, distributing these monitoring locations around the globe in order to incorporate more representative network latencies in the end-to-end measurements they could gather. Once issuing web application requests from multiple geographic locations is factored into the equation, measuring the end-to-end response of these synthetic requests gains the advantage of incorporating network latencies that are more representative of actual customer experience than performance tests run from inside the data center. The vendors’ synthetic testing packages typically offer service level reporting and exception reports detailing requests that did not meet their service level goals for both availability and responsiveness.
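The sketch below illustrates the kind of per-location exception report such a service might produce, flagging samples that miss either the availability or the responsiveness goal; the record shape, locations, and 3,000 ms threshold are invented for illustration.

```typescript
// Flag synthetic samples that violate the availability or response-time goal.
interface SyntheticSample {
  location: string;    // monitoring location that issued the request
  url: string;
  ok: boolean;         // availability: did the request complete successfully?
  elapsedMs: number;   // measured end-to-end response time
}

function exceptionReport(samples: SyntheticSample[], goalMs: number): void {
  for (const s of samples) {
    if (!s.ok) {
      console.log(`AVAILABILITY: ${s.url} failed from ${s.location}`);
    } else if (s.elapsedMs > goalMs) {
      console.log(
        `RESPONSE TIME: ${s.url} took ${s.elapsedMs} ms from ${s.location} (goal ${goalMs} ms)`
      );
    }
  }
}

exceptionReport(
  [
    { location: "Frankfurt", url: "https://www.example.com/", ok: true, elapsedMs: 4100 },
    { location: "Singapore", url: "https://www.example.com/", ok: false, elapsedMs: 0 },
  ],
  3000
);
```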

An obvious concern when you rely on this approach is that the synthetic workload must be representative of the actual workload for the captured measurements to be useful, the exact same issue that performance engineers who design and run automated web site stress tests struggle to address. There is also the related problem of what experienced QA professionals call test coverage: if the range of synthetic requests issued does not encompass enough of the surface area of the application, the data center is left in the dark about too many important “edge cases” that remain un-instrumented, while the developers remain just as much in the dark as ever about which specific scenarios lead to long-running requests.

The third and most recent approach gathers measurement data on Page Load time from inside the web browser. This approach is known as Real User Measurements, or RUM, to distinguish it from the synthetic request approach. With RUM, you are assured of complete coverage since all customer requests can be measured, keeping in mind that “all” can be a very large number. The RUM approach also faces some substantial technical hurdles. One serious technical issue is how to get measurement data from the web browser session on a customer’s remote computer or mobile device somewhere in the world back to the data center for analysis. In the Google Analytics approach to RUM, the measurements taken by the browser are sent via web beacons to a ginormous Google data center, where they are analyzed and reported.
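A minimal browser-side sketch of that reporting path might look like the following, assuming a hypothetical collection endpoint at /rum-collector; the one-tick delay is needed because loadEventEnd is not filled in until the page’s load handlers finish.

```typescript
// Send a single RUM measurement back to a collector once the page has loaded.
window.addEventListener("load", () => {
  setTimeout(() => {
    const [nav] = performance.getEntriesByType(
      "navigation"
    ) as PerformanceNavigationTiming[];
    if (!nav) return;
    const payload = JSON.stringify({
      page: location.pathname,
      loadTimeMs: nav.loadEventEnd,    // milliseconds since navigation start
    });
    // sendBeacon queues a small POST that survives the page being unloaded.
    navigator.sendBeacon("/rum-collector", payload);
  }, 0);
});
```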

Another obstacle in the RUM approach is the volume of measurement data that can result. While the amount of RUM measurement data you need to collect is far less than the volume of network packet trace records that must be sifted through to understand HTTP application response times, it is still potentially quite a large amount of measurement data, given an active web site. Sampling is one alternative for high-volume web sites to consider. By default, Google Analytics samples the response time measurements, which helps considerably with the volume of network traffic and back-end processing. And, since Google provides the resources at its data centers to process all the web beacon measurements, it is Google shouldering that burden, not your data center, assuming your organization approves of Google having access to all this measurement data about your web site in the first place. Naturally, third-party vendors like New Relic have moved into this space, gathering and analyzing this measurement data for you and reporting back directly to you, guaranteeing that this web site tracking data will never reach unfriendly eyes.
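Client-side sampling is simple to sketch; the 10 percent rate below is an arbitrary example, not any particular vendor’s default.

```typescript
// Only a configurable fraction of page views report their measurements,
// cutting beacon traffic and back-end processing proportionally.
const SAMPLE_RATE = 0.1;   // report roughly 1 in 10 page views (example value)

function shouldReport(): boolean {
  return Math.random() < SAMPLE_RATE;
}

if (shouldReport()) {
  // ...gather the timings and send the beacon as in the earlier sketch...
}
```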

A final aspect of the RUM approach that I want to focus some attention on is the Navigation/Timing standard adopted by the World Wide Web Consortium (W3C), the standards body responsible for many of the core web standards. The Navigation/Timing specification that the major web browsers have all adopted provides a standard method for gathering RUM measurements independent of which web browser or device your web site visitor is using. Prior to the Navigation/Timing API, gathering RUM measurements was a little complicated because of differences among the individual web browsers. Once the Navigation/Timing API was adopted by the major web browsers, however, most of the complications involved in gathering RUM data from your customers’ sessions with your web site were eliminated.
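As an illustration, here is a short sketch that reads the browser’s navigation timing entry (shown here in its current Level 2 form) and decomposes page load time into the components discussed earlier; run it after the page’s load event has completed.

```typescript
// Decompose page load time using the browser's Navigation Timing entry.
const [nav] = performance.getEntriesByType(
  "navigation"
) as PerformanceNavigationTiming[];

if (nav) {
  console.table({
    dnsLookupMs: nav.domainLookupEnd - nav.domainLookupStart,
    tcpConnectMs: nav.connectEnd - nav.connectStart,
    timeToFirstByteMs: nav.responseStart - nav.requestStart,
    contentDownloadMs: nav.responseEnd - nav.responseStart,
    domProcessingMs: nav.domComplete - nav.responseEnd,
    totalPageLoadMs: nav.loadEventEnd,   // milliseconds since navigation start
  });
}
```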

In the next post, I will drill deeper into some of the performance tools that take the first approach, namely, measuring web application performance by analyzing HTTP network traffic.

