
Why is this web app running slowly? -- Part 3.

This is a continuation of a series of blog entries on this topic. The series starts here.

In this post, I will begin to probe a little deeper into the model of web page response time that is implicit in the YSlow approach to web application performance. 

The YSlow performance rules, along with Souders’ authoritative book that describes the YSlow corpus of recommendations and remedies in more detail, have proved remarkably influential among the current generation of web application developers and performance engineers. The scalability model implicit in the YSlow performance rules is essentially a prediction of the time it takes to compose a page, given the underlying mechanics of web page composition, which entails gathering and assembling all of the files associated with that page. One benefit of the model is that it yields an optimization strategy for constructing web pages that load more efficiently. The YSlow performance rules embody the tactics recommended for achieving that goal, encapsulating knowledge of both the HTTP protocol, which allows a page to be assembled from multiple parts, and the underlying TCP/IP networking protocol, which governs the mechanism by which file transfers are broken into individual packets for transmission across the network.

In this post, I will explore the model, the optimization strategy it yields, and some of the more important tactical recommendations the tool makes for designing web pages that load fast. Any analytic method that relies on an underlying predictive model runs the risk of generating inaccurate predictions when the model fails to correspond closely enough to the real behavior it attempts to capture. With that in mind, I will also describe some of the deficiencies in the YSlow model.

One notable concern whenever you attempt to apply the YSlow optimization rules to your web app is that the YSlow tool does not measure the actual time it takes to load the page being analyzed. Without those measurements, very specific YSlow recommendations for improving page load time exist in a vague context: you have no way to assess how much of an improvement to expect from following any one of them. This lack of measurement data is a serious, though not always fatal, shortcoming of the YSlow approach.

Today, a very popular alternative to YSlow, available at http://www.WebPageTest.org, does provide page load time measurements. Since WebPageTest adopts a very similar scalability model, I will begin with a discussion of the original YSlow tool.


As noted above, YSlow does not attempt to measure the time it takes to transmit the requested HTTP data files over the network. Such measurements are problematic because network latency is mainly a function of the distance between the web client and the web server host. Network delay generally varies with the geographical location of the client relative to the web server, although the specific route IP packets take from the client to the server endpoint is also significant.

Note: in the IP protocol, the route between a source IP address and a destination IP address is determined dynamically for each packet. Routers forward each packet one step closer to its destination until it ultimately arrives. Each leg between intermediate endpoints represents one hop, and the physical distance a hop covers is the primary determinant of its latency. (A command-line utility called tracert, which is included with Windows, attempts to discover and document the path a packet travels from your computer to a specific destination IP address. Its output shows each network hop and the latency associated with that hop. There are also visually oriented versions of the tool (see, for example, http://www.visualiptrace.com/), some of which even attempt to locate each network hop on a map to show the physical distance it spans.)
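As a back-of-the-envelope illustration of why distance dominates: light in optical fiber travels at roughly two-thirds of its speed in a vacuum, about 200 km per millisecond, so propagation delay alone puts a floor under the RTT. The sketch below assumes that figure; the function name and the hop distances are hypothetical, and real paths add router queuing and processing delay on top of this floor.

```python
from typing import Iterable

# Assumption: signals in optical fiber cover roughly 200 km per millisecond
# (about two-thirds of the speed of light in a vacuum).
FIBER_KM_PER_MS = 200.0


def propagation_rtt_ms(hop_distances_km: Iterable[float]) -> float:
    """Lower bound on round-trip time from propagation delay alone.

    hop_distances_km holds the (hypothetical) physical length of each hop.
    Router queuing, processing, and serialization delays are ignored, so a
    measured RTT (from ping or tracert) will be higher than this floor.
    """
    one_way_km = sum(hop_distances_km)
    return 2.0 * one_way_km / FIBER_KM_PER_MS


# Three hypothetical hops totaling 5,000 km: at least 50 ms per round trip.
print(propagation_rtt_ms([1200.0, 2500.0, 1300.0]))  # 50.0
```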

Without going into too many details regarding IP packet routing and TCP session management here, it is generally safe to assume, as YSlow does, that network latency has a significant impact on overall page load times for any public-facing web site. When you run a tool like YSlow, however, you experience network latency from only one location, and that single instance can hardly be representative of what is in reality a very varied landscape of network connectivity across this interconnected world of ours. Naturally, measurement tools have evolved to fill this important gap. For example, WebPageTest is hosted at a dozen or so locations around the world, so it is possible to at least compare the performance of your web application across several geographically distributed locations.

Even though YSlow’s prediction about page load time is qualitative (a letter grade, since YSlow does not measure actual page load times), it helps to express the model of web application performance underlying the YSlow recommendation engine in quantitative terms.

In formal terms, the YSlow model is

$$\text{PageLoadTime} \approx \text{Web Page Composition and Render Time} \tag{1}$$

In effect, the YSlow model assumes that the network latency of moving request and response messages back and forth across the public Internet to compose web pages dominates the cost of computation on the local device where the web client performs that composition. In reality, assembling all the files the page references and rendering the page are distinct phases,

$$\text{PageLoadTime} \approx \text{Web Page Composition} + \text{Render Time} \tag{2}$$

but page render time is even more problematic to measure, given the widespread use of JavaScript code that responds to customer input gestures. YSlow assumes that the processing time the web client spends assembling the different components of a page is minimal compared to the cost of gathering those elements across the network from a web server on some distant host machine. As YSlow assumes, it is often quite reasonable to ignore the time spent inside the web client performing the actual rendering. Each HTTP GET Request needed to compose the page requires a round trip to the web server and back, and because computers are fast while data transmission over long distances is relatively slow, the cost of computation inside the web browser can often be ignored, except in extreme cases.

To be sure, this simplifying assumption is not always safe. For example, powerful as they are becoming, the microprocessors inside mobile devices such as phones and tablets are still significantly slower than those in desktop PCs and laptops. When any significant amount of compute-bound logic is offloaded from the web server to JavaScript code executing inside the web client, the computing capacity available to the web client can easily become a critical factor in page load time.

Nevertheless, by ignoring processing time inside the web client, YSlow can assume that page composition and render time is largely a function of the network delays associated with fetching the necessary HTTP objects:

$$\text{PageLoadTime} \approx \text{RoundTrips} \times \text{RTT} \tag{3}$$

namely, the number of round trips to the web server and back multiplied by the Round Trip Time (RTT) for each individual network packet transmission associated with those HTTP objects. 
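Equations 2 and 3 are simple enough to write as one function: with the render term set to zero it reduces to the YSlow approximation. This is a minimal sketch; the function name and the example inputs (20 round trips, a 150 ms RTT, 200 ms of render time) are hypothetical, not measurements.

```python
def page_load_time_ms(round_trips: int, rtt_ms: float, render_ms: float = 0.0) -> float:
    """Estimate page load time from the model's terms.

    Composition time is approximated as round_trips * rtt_ms (Equation 3).
    Passing a nonzero render_ms adds back the client-side term that
    Equation 2 keeps and Equation 3 drops.
    """
    composition_ms = round_trips * rtt_ms
    return composition_ms + render_ms


print(page_load_time_ms(20, 150.0))                   # 3000.0 (Equation 3)
print(page_load_time_ms(20, 150.0, render_ms=200.0))  # 3200.0 (Equation 2)
```

Comparing render_ms with the total is a quick way to judge whether ignoring the client, as Equation 3 does, is reasonable for a given device; on a slower mobile CPU the render term grows and the approximation weakens.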

A simple way to measure the RTT for a web server connection is to use the ping command, which sends a message to the IP address registered to the domain name and receives a reply. (ping resolves the domain name to an IP address for you; nslookup will show you that address directly.) Using ping or a network packet sniffer like Wireshark, it is easy to determine that the network round trip time from your client PC to some commercial web site is 100-200 ms or more. On a typical desktop or portable PC, which contains a powerful graphics co-processor, the CPU time needed to render a moderately complex web page is likely to be far less than 200 ms.
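When ping is not an option (many sites filter the ICMP messages it relies on), a rough RTT estimate can come from timing the TCP handshake: connect() returns once the SYN and SYN-ACK have been exchanged, which costs about one round trip. The sketch below is an illustration under that assumption; tcp_connect_rtt_ms is a hypothetical helper, not part of ping, Wireshark, or YSlow.

```python
import socket
import time


def tcp_connect_rtt_ms(host: str, port: int = 443, samples: int = 5,
                       timeout: float = 5.0) -> float:
    """Estimate RTT as the fastest of several TCP connection handshakes.

    The name is resolved once up front so DNS lookup time is excluded.
    Taking the minimum filters out samples inflated by transient queuing.
    """
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    best = float("inf")
    for _ in range(samples):
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            start = time.perf_counter()
            sock.connect(sockaddr)  # returns after SYN / SYN-ACK: about 1 RTT
            elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best
```

For example, tcp_connect_rtt_ms("www.example.com") times five handshakes against port 443 on that host. The result includes a little kernel processing time on both ends, so treat it as a slight overestimate of the pure network RTT.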

We have seen that large HTTP Request and Response messages require multiple packets to be transmitted. Therefore, the total number of round trips to the web server that are required can be calculated using the following formula:

$$\text{RoundTrips} = \sum_{i} \left\lceil \frac{\text{HttpObjectSize}_i}{\text{PacketSize}} \right\rceil \tag{4}$$

since each HTTP object requested requires one or more packets to be transmitted, depending on its size. Ignoring for a moment some web page composition wrinkles that complicate this simple picture considerably, the model does yield a reasonable first-order approximation of page load time for many web applications. By emphasizing the assembly process, which involves retrieving over a TCP/IP network all the HTTP objects referenced in the DOM, the YSlow model clarifies several of the most important factors that determine the performance of web applications. There are excellent reasons to turn to a tool like YSlow for expert advice about why your web application is running slowly.
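Equation 4 translates directly into code. This is a sketch under stated assumptions: a packet carries about 1,460 bytes of payload (a common TCP maximum segment size on Ethernet), every object costs at least one round trip, and, as in the model, every packet is charged a full round trip. The function name, the object sizes, and the 100 ms RTT in the example are hypothetical.

```python
import math
from typing import Iterable

# Assumption: ~1,460 bytes of payload per packet (typical Ethernet TCP MSS).
PACKET_SIZE_BYTES = 1460


def round_trips(object_sizes_bytes: Iterable[int],
                packet_size: int = PACKET_SIZE_BYTES) -> int:
    """Equation 4: sum over objects of ceil(size / packet size), minimum 1."""
    return sum(max(1, math.ceil(size / packet_size))
               for size in object_sizes_bytes)


# Hypothetical page: HTML, CSS, a small icon, and a larger script.
sizes = [15_000, 2_000, 900, 40_000]
trips = round_trips(sizes)
print(trips)          # 42 = 11 + 2 + 1 + 28 packets
print(trips * 100.0)  # 4200.0 ms with an assumed 100 ms RTT (Equation 3)
```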

If your web application is actually performing badly, whether in a stress test or in production, there are also good reasons to be wary of some of the YSlow recommendations. For one, in the interest of simplification, the YSlow performance rules ignore a number of other complicating factors that, depending on the particular web application, could have a major impact on web page responsiveness.

When it comes to making changes to your web application based on its YSlow grade, the other important issue is that the YSlow recommendations do not reflect actual page load time measurements for your web page. Using the YSlow tool alone, you cannot quantify how much improvement in page load time to expect from implementing any of the recommended changes. Not every recommended change is easy to implement, and YSlow cannot tell you the benefit of a change so that you can weigh it against the cost of implementing it.

For instance, you could combine all your scripts into one minified script, but the current packaging of individual blocks of script code may provide benefits in terms of application maintainability, stability, and flexibility. Your web page may use scripts from multiple sources: different teams within your own development organization may contribute some, while others are pulled from external sources, including third parties that build and maintain popular open source JavaScript libraries that complement jQuery or AngularJS. From a software engineering quality perspective, packing all these scripts from different sources into one may not be the right build option.

Knowing how much improvement can be gained from implementing the YSlow rule about packing some or all of your script files together helps you decide whether this packaging trade-off is worthwhile, given that the change will complicate software maintenance and possibly compromise software quality. For the record, a web page that is flat out broken is always categorically worse than one that is merely slow, so any engineering change that reduces the application’s quality or makes it harder to maintain carries a cost. Because of trade-offs like this, actual page load time measurements are very important in weighing the benefits of any performance optimization. This need has sparked the development of performance tools to complement YSlow, which we will take a closer look at in a moment.
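The model itself can give a first, purely predictive answer to the script-packing question. Taken literally, Equations 3 and 4 credit concatenation only with the packets saved by no longer rounding up each file separately, so the predicted gain is at most one round trip per script beyond the first. This minimal sketch assumes the same 1,460-byte packet size, ignores any size reduction from minification, and uses hypothetical function names, script sizes, and RTT; a measured before-and-after page load time is what should settle the trade-off.

```python
import math
from typing import Sequence

# Assumption: same ~1,460-byte packet payload used for Equation 4.
PACKET_SIZE_BYTES = 1460


def packets(size_bytes: int, packet_size: int = PACKET_SIZE_BYTES) -> int:
    """Packets (and so round trips, in this model) needed for one object."""
    return max(1, math.ceil(size_bytes / packet_size))


def bundling_savings_ms(script_sizes: Sequence[int], rtt_ms: float,
                        packet_size: int = PACKET_SIZE_BYTES) -> float:
    """Predicted time saved by concatenating scripts into one file.

    Compares the round trips for the separate files with those for a single
    file of the combined size, then converts the difference with Equation 3.
    """
    if not script_sizes:
        return 0.0
    separate = sum(packets(s, packet_size) for s in script_sizes)
    bundled = packets(sum(script_sizes), packet_size)
    return (separate - bundled) * rtt_ms


# Four hypothetical scripts at an assumed 100 ms RTT: 35 vs 33 round trips.
print(bundling_savings_ms([30_000, 12_000, 5_000, 800], 100.0))  # 200.0
```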

Something else to consider when you are thinking of changing your web application to conform to the YSlow performance recommendations is why so many apparently successful web applications appear to ignore these recommendations almost completely. The Google search engine landing page at www.google.com, for example, has historically drunk the YSlow Kool-Aid, serving an extremely simple web page that requires only 2 or 3 round trips to load. At the other end of the complexity spectrum are the landing pages for Amazon.com, the largest and most successful e-commerce web site in the world. The last time I ran YSlow against the www.amazon.com landing page, I found it required retrieving over 300 different HTTP objects to compose the Amazon web page DOM.

Although Amazon’s page violates the basic YSlow performance rules, it is difficult to argue against Amazon’s approach, given Amazon’s experience and acumen in the world of e-commerce. In fact, many, many commercial web sites that participate in e-commerce or serve as portals serve up very complex web pages, earning YSlow grades closer to Amazon’s than to the Google Search page’s. Companies with web-based retailing operations like Amazon take an experimental approach to web page design, evaluating design alternatives with live customers to see which yields the best results. If Amazon and other e-commerce retailers wind up serving complex web pages that flout the YSlow performance rules, it is safe to conclude that Amazon customers not only tolerate page load times that routinely exceed 5-10 seconds but in some sense prefer these complex pages, perhaps because such a page conveniently and neatly gathers all the relevant information they require before making a purchasing decision.

On the other hand, in a future post we will look at some of the evidence that improving web page response times is correlated with increased customer satisfaction (which is admittedly difficult to measure) and improved fulfillment rates (which is often much easier to measure). 

The YSlow model of web application performance, expressed in Equations 3 and 4, leads directly to an optimization strategy: minimize the number of round trips, decrease the round trip time, or both. More on that topic next time.
