Measuring Web Page Load Time: Why is this web app running slowly, Part 6

This is a continuation of a series of blog entries on this topic. The series starts here.

In this post, I will discuss three approaches to measuring actual web page load times, something that is quite important for a variety of reasons, some of which I have already discussed. Web page load time is a service level measurement taken from the standpoint of the customer. Service level measurements also enable performance analysts to use decomposition techniques, breaking page load time down into its components: DNS lookup, TCP connection establishment, network transmission delays, browser render time, and so on.
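
To make the decomposition concrete, here is a minimal sketch using the standard Performance API that modern browsers expose (browser-side measurement is the third approach discussed below); the labels for the intervals are my own, not anything mandated by the API.

```typescript
// A minimal sketch of page load time decomposition, using the browser's
// built-in Performance API. Run from the browser console, or from page
// script after the load event has fired.
const [nav] = performance.getEntriesByType(
  "navigation"
) as PerformanceNavigationTiming[];

// Every figure below is in milliseconds.
console.table({
  dnsLookup: nav.domainLookupEnd - nav.domainLookupStart,
  tcpConnect: nav.connectEnd - nav.connectStart,
  timeToFirstByte: nav.responseStart - nav.requestStart,
  contentDownload: nav.responseEnd - nav.responseStart,
  browserRender: nav.loadEventStart - nav.responseEnd,
  totalPageLoad: nav.loadEventStart - nav.startTime,
});
```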

The first approach to measuring web page load times was built on top of network packet capture technology, which was already capable of capturing the network packets associated with HTTP GET and POST requests and their associated response messages. Packet tracing is the province of network sniffers like Wireshark and Netmon, which play a huge role in the data center in diagnosing network connectivity and performance problems. By adding the ability to understand and interpret requests made using the application-level HTTP protocol, packet-tracing tools like Wireshark could be extended to report HTTP application-oriented response time measurements.

One obvious limitation of this approach is that the packet capture needs to occur somewhere inside the data center, and that data-center vantage point misses content served from outside it, including content cached in a CDN or requests to a third-party ad server. A second limitation is that network packet tracing captures a huge volume of trace data, which must then be filtered extensively down to just the relevant HTTP traffic. A third limitation stems from the browser's ability to open and transmit requests over multiple TCP sessions in parallel, which makes it difficult to stitch all the HTTP requests associated with a single page load back together into a coherent view of the web application. These limitations are not serious when measurement tools based on network packet tracing are used in development and testing, but they are serious constraints when you need to monitor a large-scale production environment.

A second approach was pioneered by vendors who saw an opportunity to address the limitations of network packet tracing by measuring web application response times at their source, namely, from the point of view of the web client. Measurements taken at the web client are known as end-to-end (often abbreviated ETE) response time measurements. A simple, low-tech way to measure web application response times from the vantage point of the web client is to gather them manually, using a stopwatch, for example, to mark the request begin and end times. Now if you can imagine a hardware and software solution that automates that measurement process, you have the necessary ingredients for building an end-to-end measurement tool. Such a solution simulates customer activity by generating synthetic requests that issue from the vendor's data centers to your web site and measuring the end-to-end response times that result, a form of automated performance testing. In the process, these performance tools can also assess web site availability, including notification in the event of an outage.
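
As a sketch of the idea, the probe below issues a single synthetic request and records its end-to-end time and availability. The target URL and timeout are placeholders, and a real monitoring product would run something like this on a schedule from dedicated infrastructure.

```typescript
// A minimal sketch of a synthetic end-to-end probe: issue one request,
// time it, and record availability. Assumes Node.js 18+ for the
// built-in fetch; the target URL and timeout are placeholders.
const TARGET_URL = "https://www.example.com/";
const TIMEOUT_MS = 10_000;

async function probe(): Promise<{ ok: boolean; elapsedMs: number }> {
  const start = performance.now();
  try {
    const res = await fetch(TARGET_URL, {
      signal: AbortSignal.timeout(TIMEOUT_MS),
    });
    await res.text(); // read the whole body so the timing covers the download
    return { ok: res.ok, elapsedMs: performance.now() - start };
  } catch {
    // A timeout or network failure counts as an availability miss.
    return { ok: false, elapsedMs: performance.now() - start };
  }
}

probe().then((r) =>
  console.log(`${TARGET_URL} ${r.ok ? "up" : "DOWN"} in ${r.elapsedMs.toFixed(0)} ms`)
);
```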

The vendors who built the first end-to-end monitoring solutions moved quickly to extend their operations to multiple monitoring sites, distributed around the globe in order to incorporate more representative network latencies into the end-to-end measurements they gathered. Once requests issued from multiple geographic locations are factored into the equation, the end-to-end response times of these synthetic requests incorporate network latencies that are far more representative of actual customer experience than performance tests run from inside the data center. The vendors' synthetic testing packages typically offer service level reporting, plus exception reports detailing requests that did not meet their service level goals for either availability or responsiveness.
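
The reporting layer on top of probes like the one sketched above can itself be sketched quite simply; the service level goals below are hypothetical numbers, not anyone's actual defaults.

```typescript
// A sketch of service level reporting: given probe samples gathered
// from several monitoring locations, report availability and flag
// responsiveness exceptions. Both goals are hypothetical examples.
interface Sample {
  location: string;
  ok: boolean;
  elapsedMs: number;
}

const AVAILABILITY_GOAL = 0.999; // 99.9% of probe requests succeed
const RESPONSE_GOAL_MS = 2000;   // every request completes under 2 seconds

function report(samples: Sample[]): void {
  const availability = samples.filter((s) => s.ok).length / samples.length;
  console.log(
    `availability: ${(availability * 100).toFixed(2)}%` +
      (availability < AVAILABILITY_GOAL ? " (below goal)" : "")
  );
  // Exception report: successful requests that missed the response goal.
  for (const s of samples.filter((s) => s.ok && s.elapsedMs > RESPONSE_GOAL_MS)) {
    console.log(`slow response from ${s.location}: ${s.elapsedMs.toFixed(0)} ms`);
  }
}
```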

An obvious concern when you rely on this approach is that the synthetic workload must be representative of the actual workload for the measurements to be useful, the exact same issue that performance engineers who design and run automated web site stress tests struggle to address. There is also the related problem of what experienced QA professionals call test coverage: if the range of synthetic requests issued does not encompass enough of the surface area of the application, the data center is left in the dark about too many important “edge cases” that remain un-instrumented, while the developers remain just as much in the dark as ever about which specific scenarios lead to long-running requests.

The third and most recent approach gathers page load time measurements from inside the web browser itself. This approach is known as Real User Measurements, or RUM, to distinguish it from the synthetic request approach. With RUM, you are assured of complete coverage, since all customer requests can be measured, keeping in mind that “all” can be a very large number. The RUM approach also faces some substantial technical hurdles. One serious issue is how to get measurement data from the web browser session on a customer’s remote computer or mobile device somewhere in the world back to the data center for analysis. In the Google Analytics approach to RUM, the measurements taken by the browser are sent via web beacons to a ginormous Google data center, where they are analyzed and reported.
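
A minimal sketch of the beacon technique follows. The collection endpoint is hypothetical, and this shows the general pattern rather than Google Analytics’ actual implementation (which historically fetched a tiny image with the measurements encoded in its query string).

```typescript
// A sketch of the web beacon technique: a small script in the page
// packages the browser's timing data and posts it to a collection
// endpoint. The endpoint URL is a placeholder.
const BEACON_URL = "https://collector.example.com/rum";

window.addEventListener("load", () => {
  // Defer one tick so the load event timestamps have been recorded.
  setTimeout(() => {
    const payload = JSON.stringify({
      page: location.pathname,
      timing: performance.timing, // serializes via its standard toJSON()
    });
    // sendBeacon queues the POST without delaying the page; fall back
    // to fetch for browsers that lack it or when the queue is full.
    if (!navigator.sendBeacon?.(BEACON_URL, payload)) {
      fetch(BEACON_URL, { method: "POST", body: payload, keepalive: true });
    }
  }, 0);
});
```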

Another obstacle in the RUM approach is the volume of measurement data that can result. While the amount of RUM measurement data you need to collect is far less than the volume of network packet trace records that must be sifted through to understand HTTP application response times, it is still potentially quite a large amount of measurement data, given an active web site. Sampling is one alternative for high-volume web sites to consider. By default, Google Analytics samples the response time measurements, which helps considerably with the volume of network traffic and back-end processing. And since Google provides the resources at its data center to process all the web beacon measurements, it is Google shouldering that burden, not your data center, assuming your organization approves of Google having access to all this measurement data about your web site in the first place. Naturally, third-party vendors like New Relic have moved into this space: they gather and analyze this measurement data for you and report back directly to you, promising that this web site tracking data will never reach unfriendly eyes.
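
Returning to sampling for a moment, the client side of it can be sketched in a few lines; the 10 percent rate is an arbitrary example, and reported counts have to be scaled back up by the inverse of the rate.

```typescript
// A sketch of client-side sampling: only a fraction of page views send
// a beacon at all, cutting collection volume proportionally.
const SAMPLE_RATE = 0.1; // arbitrary example: keep 10% of page views

// Decide once per page view, before doing any measurement work.
if (Math.random() < SAMPLE_RATE) {
  // ...gather the timings and send the beacon as sketched above...
}
// When reporting, scale sampled counts up by 1 / SAMPLE_RATE to
// estimate site-wide totals.
```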

A final aspect of the RUM approach that I want to focus some attention on is the Navigation Timing standard adopted by the World Wide Web Consortium (W3C), the standards body responsible for HTML and most other core web standards. The Navigation Timing specification, which the major web browsers have all implemented, provides a standard method for gathering RUM measurements independent of which web browser or device your web site visitor is using. Prior to the Navigation Timing API, gathering RUM measurements was complicated by differences among the individual web browsers. Adoption of the Navigation Timing API by the major browsers eliminated most of those complications in gathering RUM data from your customers’ sessions with your web site.
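
Here is a brief sketch against the original Navigation Timing interface, window.performance.timing, whose attributes are all millisecond timestamps; the feature detection covers the older browsers just mentioned.

```typescript
// A sketch using the original W3C Navigation Timing interface. Every
// attribute is a timestamp in milliseconds since the epoch, so the
// component times are simple differences. Run after the load event,
// so that loadEventEnd has been populated.
const t = window.performance && window.performance.timing;
if (t) {
  // The same decomposition shown earlier, via the original interface:
  console.log("dns lookup:    ", t.domainLookupEnd - t.domainLookupStart, "ms");
  console.log("full page load:", t.loadEventEnd - t.navigationStart, "ms");
} else {
  // A pre-Navigation Timing browser: fall back to hand-instrumenting
  // the page with Date.now() markers, with all the per-browser
  // complications that entails.
}
```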

In the next post, I will drill deeper into some of the performance tools that take the first approach, namely, measuring web application performance by analyzing HTTP network traffic.

