
Analyzing HTTP network traffic: Why is this web app running slowly, Part 7.

This is a continuation of a series of blog entries on this topic. The series starts here.

Since HTTP is a wire protocol built on top of TCP/IP, the network packet sniffer technology that is widely used in network diagnostics and performance optimization is readily adapted to measuring web browser Page Load time. Network sniffers like Wireshark can intercept and capture all the HTTP traffic and are typically configured to gather related network events, such as DNS look-ups. It is easy to get overwhelmed by all the information these network diagnostic tools provide. Often software developers prefer network tools that are more focused on the HTTP protocol and the page composition process associated with assembling the DOM and rendering it in the browser. The Developer Tools that ship with the major web browsers include performance tools that measure Page Load time and help you diagnose why your page is slow to load. These tools work by analyzing the network packets the web client sends and the replies it receives.
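
As a point of reference, the browser also exposes much of the same timing data to the page itself through the W3C Navigation Timing API, so you can compute a Page Load time figure without any external tool. The sketch below is purely illustrative and assumes a browser that implements Navigation Timing Level 2; it is not how the Developer Tools gather their measurements.

    // Minimal sketch: compute Page Load time from inside the page using the
    // Navigation Timing API (assumes Navigation Timing Level 2 support).
    window.addEventListener('load', function () {
      // Defer one tick so loadEventEnd has been recorded.
      setTimeout(function () {
        var nav = performance.getEntriesByType('navigation')[0];
        if (nav) {
          // For the navigation entry, startTime is 0, so loadEventEnd is the
          // elapsed time in milliseconds from the start of navigation until
          // the load event finished.
          console.log('Page Load time: ' + nav.loadEventEnd.toFixed(0) + ' ms');
        }
      }, 0);
    });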

Many web application developers prefer the developer tools found in Google’s Chrome for diagnosing performance problems in web applications, even though many authorities think recent versions of Microsoft’s Internet Explorer are keeping pace. Now that Steve Souders is working on the team that develops the Chrome developer tools, Chrome has also added a tool called PageSpeed, which is functionally very similar to the original YSlow application. If you are using Chrome as a browser, you can navigate to the Developer Tools by clicking the “Customize and control Chrome” button on the right-hand edge of the Chrome menu bar. Then select the “Tools” menu and click “Developer Tools.” PageSpeed is one of the menu options on Chrome's Developer Tools menu.

Let’s take a look at both PageSpeed and the network-oriented performance tool from the suite of Developer Tools that come with Chrome, using the US Amazon home page at www.amazon.com as an example. We will also get to see the degree to which Amazon.com, the most popular e-commerce site in the US (according to http://www.httparchive.org/viewsite.php?u=http%3A//www.amazon.com/&pageid=16085988#waterfall), embraces the YSlow performance rules.

Figure 6 shows a view of the Chrome PageSpeed tool after I requested an analysis of the Amazon.com home page. The first observation is that PageSpeed does not espouse the simple YSlow prime directive to “Make fewer HTTP requests.” This is a huge change in philosophy since Souders originally developed YSlow, of course, reflecting some of the concerns I mentioned earlier with how well the original YSlow scalability model reflects the reality of modern web applications.



Figure 6. Using Chrome’s PageSpeed tool to understand why the www.amazon.com home page takes so long to load.

The rule set might vary a bit, reflecting Souders’ wider and deeper experience in web application performance, but otherwise PageSpeed is identical in operation to YSlow. The Chrome version of the tool re-loads your page and inventories the DOM after the page has been re-built. In the example PageSpeed screenshot, I focused on one of the important PageSpeed tuning rules that has an identical counterpart in YSlow, namely “minimize the request size”. PageSpeed improves upon the usability of YSlow by clearly identifying the HTTP GET requests that triggered the rule violation. Here PageSpeed reports the four requests that generated Response messages that exceeded 1500 bytes and thus required more than one network packet from the web site in response.
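
You can approximate this kind of check yourself with the Resource Timing API, which reports the number of bytes transferred for each downloaded resource. This is only a rough sketch, assuming a browser that implements Resource Timing Level 2; note that cross-origin responses report a transferSize of zero unless the serving site sends a Timing-Allow-Origin header, so it will understate the count for third-party content.

    // Rough sketch: flag downloaded resources whose transferred size exceeds a
    // single ~1500-byte Ethernet packet (assumes Resource Timing Level 2).
    performance.getEntriesByType('resource')
      .filter(function (r) { return r.transferSize > 1500; })
      .forEach(function (r) {
        console.log(r.name + ': ' + r.transferSize +
          ' bytes transferred (more than one packet)');
      });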

You may also notice that the four Requests shown that violate the “minimum request size” rule are directed at three different web sites: one for a resource available from the main amazon.com web site, two from fls.na.amazon.com (likely an internal CDN with responsibility for serving files to customers living in North America), as well as an advertisement served up from doubleclick.net, a Google-owned company that serves display ads. It is common practice for web pages to be composed from content that needs to be assembled from multiple sources, including the advertising that is served up by third-party businesses like DoubleClick. While it is fashionable to assert that “information wants to be free,” the harsh reality is that developing and maintaining the enormously complex hardware and software environments that power Internet-based web applications is extremely expensive, and advertising revenue is the fuel that sustains those operations. Advertising revenue from web page viewing (and clicking) has also made Google one of the two or three most profitable tech industry companies in the world.

For a glimpse at a performance tool that does report Page Load time measurements directly, click on the “Network” tab on the Chrome Developer Tools menu bar, which is shown in Figure 7. The Network view of Page Load time contains an entry for each GET Request that was performed in order to render the full Amazon.com home page and shows the time to complete each operation. At the bottom of the window, Chrome shows that a total of 315 GET Requests were issued for various image files, style sheets, and JavaScript code files in order to render the Amazon Home page. In this instance, with effective use of the cache to render the Amazon Home page, the browser took only about 4.3 seconds to complete the operation. The overall Page Load time is displayed at the lower left of the window border. (When the browser cache is “cold,” loading the Amazon Home page can easily take one minute or more.)

The Timeline column at far right presents the web page composition process in time sequence, a view that has become known as a waterfall diagram. The Chrome waterfall diagram for Page Load time features a pop-up window that breaks out the time it took to load each individual component of the page. We can see that the initial GET Request to www.amazon.com returns a Response message that is about 78 KB, a payload that has to be broken into more than 50 individual packets. In the pop-up window, we see that the browser waited for 142 milliseconds before the first packet of the HTTP Response message appeared. It then took 1.55 seconds for the remaining 50 or so packets associated with that one Response message to be received. These are measurements derived from monitoring the network traffic that HTTP GET Requests and Response messages initiate.
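
The same two phases can be recovered programmatically from the navigation timing entry for the initial document. The field names below come from the Navigation Timing API; mapping them onto the “Waiting” and “Receiving” labels in the pop-up is my own illustrative reading, not Chrome's implementation.

    // Sketch: derive the "Waiting" and "Receiving" phases for the initial
    // document from the Navigation Timing API (illustrative mapping only).
    var nav = performance.getEntriesByType('navigation')[0];
    if (nav) {
      var waiting = nav.responseStart - nav.requestStart;   // time until the first Response packet
      var receiving = nav.responseEnd - nav.responseStart;  // time to receive the remaining packets
      console.log('Waiting: ' + waiting.toFixed(0) + ' ms, ' +
        'Receiving: ' + receiving.toFixed(0) + ' ms');
    }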


Figure 7. A waterfall diagram in Chrome PageSpeed showing the page composition process in time sequence for the Amazon home page. The browser issued a total of 315 GET Requests to build the page. In this instance, the page composition process, aided by a warm cache and parallel download sessions, took just 2.35 seconds.


The initial HTTP Response message from the Amazon Home page serves as a kind of exoskeleton that the browser gradually fills in from the Response messages generated by the remaining 314 subsequent GET Requests that are referenced. The HTML standard permits page loading to proceed in parallel, and, as noted above, browsers can generate parallel TCP sessions for loading static content concurrently. In this instance, of course, many of the HTTP objects are available from the cache because I had just loaded the same page immediately prior to running the PageSpeed tool. 

About ten lines down in the example, there is a GET Request for a 23.6 KB image file that required 164 ms to complete. The Timeline column pop-up that breaks out the component load time indicates a separate DNS lookup that took 36 ms and a TCP session connection sequence that took 25 ms, which indicates an embedded Request for a URL that was directed to a different Amazon site. The browser then waited 32 ms for the initial Response message. Finally, the pop-up shows a 27.5 ms delay spent in the Receiving state, since the 23.6 KB Response message would require multiple packets. Because the browser supports parallel TCP sessions, however, this Request does not prevent the browser from initiating other Requests concurrently.
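
The Resource Timing API exposes a comparable breakdown for each embedded request, which the sketch below logs. Again this is only illustrative: for cross-origin resources, such as the image discussed above, the detailed phases come back as zero unless the serving site supplies a Timing-Allow-Origin header.

    // Sketch: per-request phase breakdown from the Resource Timing API.
    // Cross-origin entries report zero for these phases unless the server
    // sends a Timing-Allow-Origin header.
    performance.getEntriesByType('resource').forEach(function (r) {
      console.log(r.name, {
        dns: r.domainLookupEnd - r.domainLookupStart,   // DNS lookup
        connect: r.connectEnd - r.connectStart,         // TCP connection sequence
        waiting: r.responseStart - r.requestStart,      // waiting for the first Response packet
        receiving: r.responseEnd - r.responseStart      // receiving the remaining packets
      });
    });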

Page composition that requires content stored on multiple web sites, parallel browser sessions, JavaScript blocking, and the potential for extensive DOM manipulation when the JavaScript executes are features of web applications that complicate the simple YSlow scalability model derived earlier. Incorporating these additional features into the model yields a more realistic formula:

Page Load time = Browser Render time +
Script execution time +
((Round trips * RTT)/Sessions)
[equation 5]
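
To make the shape of equation 5 concrete, here is a worked example with purely illustrative numbers; none of these values are taken from the measurements discussed above.

    // Worked example of equation 5 (all inputs are illustrative; times are in
    // milliseconds, the round trip and session inputs are counts).
    function pageLoadTime(renderTimeMs, scriptTimeMs, roundTrips, rttMs, sessions) {
      return renderTimeMs + scriptTimeMs + (roundTrips * rttMs) / sessions;
    }

    // 300 ms of render time, 400 ms of script execution, 120 round trips at an
    // average RTT of 50 ms, spread across 6 parallel TCP sessions:
    console.log(pageLoadTime(300, 400, 120, 50, 6) + ' ms');  // 1700 ms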

While equation 5 models the web page composition process better, it is still not definitive. We have already discussed some of the added complications, including:
  • RTT is not a constant factor when GET Requests to multiple web sites are issued to compose the complete page and when the impact of the browser cache and a CDN are factored into the equation, 
  • JavaScript download requests cannot proceed in parallel, and
  • JavaScript code executing inside a handler attached to the DOM’s window.load event may further modify the DOM and effectively defer the usability of the page until the script completes, but none of that script execution time is included in the Page Load Time measurements (see the sketch following this list).
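
To illustrate that last point, the sketch below attaches a handler to the window.load event that performs some DOM manipulation. The element id and the amount of work are hypothetical; the point is simply that this post-load work delays when the page is actually usable without showing up in the Page Load time measurement discussed above.

    // Sketch: DOM manipulation performed in a window.load handler. The element
    // id ('content') and the loop are hypothetical placeholders.
    window.addEventListener('load', function () {
      var start = performance.now();
      var container = document.getElementById('content');
      if (container) {
        for (var i = 0; i < 500; i++) {
          var row = document.createElement('div');
          row.textContent = 'row ' + i;
          container.appendChild(row);  // the page is not really usable until this completes
        }
      }
      // Per the discussion above, this time is not reflected in the browser's
      // Page Load time measurement.
      console.log('Post-load DOM work: ' +
        (performance.now() - start).toFixed(0) + ' ms');
    });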

Before wrapping up this discussion of the limitations of the YSlow scalability model, let’s return full circle to the original web application (illustrated back in Figure 1 in the very first post in this series) that was running slowly, motivating me to explore using a tool like YSlow to figure out why. Figure 8 shows a Chrome PageSpeed waterfall diagram for the page composition process for the data-rich ASP.NET web application in the case study. (Figure 9 tells a similar story using the comparable Internet Explorer performance tool.)


Figure 8. Waterfall diagram from Chrome PageSpeed showing individual GET Requests and their delays, each of which contributes to the Page composition time. In this example for the case study app, the delay associated with resolving the initial ASP.NET GET Request accounts for about 75% of the overall page load time.

Chrome’s PageSpeed tool indicates that the DailyCharts.aspx web page took about 1.5 seconds to load, requiring a total of 20 GET Requests. These measurements were captured in a single-system test environment where the web browser, IIS web server, and the SQL Server back-end database were all available on the same machine, so network latency was minimal. Crucially, generating the Response message for the original GET Request on the web server, a message that runs to 1.5 MB, and then transferring it to the web client accounted for about ¾ of the overall web application’s Page Load Time. In addition, GET Requests for the two high-resolution charts, which are rendered on the server as .jpg files and then referenced in the original Response message, yielded Response messages of about 275 KB and 100 KB in size.

Moreover, since Chrome PageSpeed re-executed the web page, the database queries that were executed on the web server benefitted substantially from a warm start in the SQL Server cache. When I subsequently investigated how long these database queries could take, I noted that a query like the one illustrated, which interrogates voluminous Process-level counter data stored in the repository, accessed several hundred thousand rows of data that were then sorted to select and return a result set containing the five busiest processes during a 24-hour period. Without the benefit of a SQL Server cache warm start, database query execution time alone could be on the order of 15-30 seconds.

Figure 9. A similar view of Page Load time using Internet Explorer’s developer tools. In this instance, I also installed an ASP.NET-oriented performance tool called Glimpse on the web site in order to help diagnose the performance issues. The additional requests associated with Glimpse delay page load time by another 150 ms.

Figure 9 represents another view of the page composition process, this time using Internet Explorer's version of the waterfall timing diagram, which again shows a 1.44 MB Response message generated by the ASP.NET app in response to the initial ASP.NET GET Request. Internet Explorer reports that it required 1.38 seconds to generate this Response message and transmit it to the web client. (Note that in this test environment the web client, the IIS server, and the SQL Server back-end database all reside on the same machine, so network latency is minimal; I measured it at less than 10 microseconds.)
The initial GET Request's Response message contains href links to the high-resolution charts that were rendered on the web server as .jpgs. Resolving these links for a 235 KB main chart and an 84 KB secondary chart also impacts page load time, but these file requests are at least able to proceed in parallel.

In both Figures 8 and 9, resolving the initial ASP.NET GET Request clearly dominates the page composition process. These waterfall views of the web page composition process for this web application place the YSlow recommendations for improving page load time, illustrated back in Figure 2, in a radically different perspective. Instead of worrying about making fewer HTTP GET Requests, I needed to focus on why the server-side processing to generate a Response message was taking so long. In addition, I also wanted to understand why the Response message associated with the GET Request was so large, requiring almost 1.5 MB of content to be transferred from the web server to the web client, and what steps could be taken to trim it down in size. Unfortunately, web application performance tools like YSlow are basically silent on the subject of the scalability of your server-side components. These need to be investigated using performance tools that run on the web server.

Ultimately, for the case study, I instrumented the ASP.NET application using Scenario.Begin() and Scenario.End() method calls, which allowed me to measure how much time was being spent calling the back-end database to resolve the GET Request query. I wound up re-writing the SQL to generate the same result set in a fraction of the time. Since the database access logic was isolated in a Business Objects layer, that was a relatively straightforward fix that I was able to slip into the next maintenance release. But that quick fix still left me wondering why the initial Response messages were so large, which I investigated by using the ASP.NET page trace diagnostics facility to examine the amount of ViewState data being passed to build the various Machine and Chart selection menus. One of the menu items referenced “All Machines” for every Windows machine defined in the repository, which was a red flag right there. Addressing those aspects of the server-side application required a significant re-engineering of the application, however, which was work that I completed last year.

To conclude my discussion of the case study app, I found that the YSlow-oriented performance tools did highlight the need for me to understand why the server-side processing associated with generating the initial Response message was taking so long and also spurred me to investigate why such large Response messages were being generated. The specific grades received from applying the YSlow performance rules to the DOM were not particularly helpful, however. Resolving the performance issues that I found required using traditional server-side application performance tools, including the SQL Explain facility for understanding the database queries and the ASP.NET diagnostic trace that showed me the bloated contents of the ViewState data that is transmitted with the page to persist HTML controls between post back requests. (A post back request is a subsequent request to the web server for the same dynamic HTML web page.) It turned out that much of the ViewState data embedded in the initial Response message generated by the ASP.NET app was supporting the page's menu controls.

In practice, any web application that generates dynamic HTML and requires substantial server-side resources to build those Response messages faces scalability issues with its server-side components: the database back-end, the business logic layer, or the web server front end. It should be evident from the discussion so far that performance tools like YSlow, which execute on the web client and focus on the page composition process associated with the DOM, are silent on any scalability concerns that may arise on the web server.

