When Pre-Loading Beats Streaming: The Caching Advantage
Introduction
Web pages are often composed of:
- semi-static page parts which change infrequently and which are the same for all users,
- dynamic page parts which change frequently and whose content may depend on the user.
Recently released JavaScript frameworks, and the pioneering JS framework Marko, optimize page loading by progressively streaming different parts of the page to the client. Although this kind of streaming is an effective optimization, I argue in this article that we should consider it only after taking care of a more important optimization: Caching.
In my article, How to make fast web frontends, I classified optimization techniques broadly into two categories:
- Techniques which speed up page loading by reducing the work necessary, for the server and for the network, to deliver the content to the client. Of these techniques, Caching is a prime example.
- Techniques which do not necessarily reduce work but instead reduce user wait time by scheduling resource loading intelligently. Both HTTP-streaming and Pre-Loading fall into this category.
- With HTTP Response Streaming, the server can start fetching all page parts in parallel as soon as it receives the client’s request.
- With Pre-Loading, the client can start fetching page parts as soon as it receives the page’s headers or its head element.
Although HTTP response streaming lets the server start fetching resources earlier than pre-loading does, it is not a perfect solution. Streaming all page parts together is a form of bundling, and bundling hinders caching: the highly cacheable semi-static page parts cannot benefit from the HTTP cache because they are bundled with the dynamic parts under a single URL.
In this article, I compare the performance of page loading when the full page is streamed to the client versus when dynamic page parts are pre-loaded as separate resources. Using diagrams generated by simulation, I show that Full-Page Streaming and Split-Page Pre-Loading can achieve similar performance, with the latter reducing overall work more effectively thanks to its better compatibility with caching.
Table of contents
- Introduction
- Table of contents
- Simulation settings
- The baseline pages to compare: Full-Page Streaming vs Split-Page with Pre-Loading
- The effect of server-side and edge caching
- Assembling the Full-Page version on the edge for better caching
- Page loading for a returning user with a warm client cache
- Conclusion
Simulation settings
For the remainder of the article, I’ll be showing timeline (Gantt) charts of page loading for multiple versions of the same web page. These charts were generated by simulating the client, the server, the network, and the database holding the page data.
The page of interest is composed of two parts: a semi-static part, which is cacheable, and a dynamic part, which is not. The page also loads a script file. The page is considered completely loaded once both page parts are loaded, the script is loaded and executed, and both page parts are hydrated (if needed).
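The completion condition above can be sketched as a tiny helper; the milestone names here are hypothetical, chosen only for this illustration:

```javascript
// The page counts as fully loaded at the latest of these milestones.
// All values are timestamps in milliseconds since the initial request.
function fullyLoadedAt(m) {
  return Math.max(
    m.semiStaticLoaded, // semi-static HTML part received
    m.dynamicLoaded,    // dynamic part received
    m.scriptExecuted,   // script downloaded and run
    m.hydrated          // both parts hydrated (if needed)
  );
}
```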
The simulation uses the following parameters:
| Parameter | Value |
|---|---|
| Database Query Duration | 50 milliseconds |
| Database Query Response Size | 25 KB |
| Render Data To HTML Duration | 50 milliseconds |
| Render From HTML Duration | 50 milliseconds |
| Render From JSON Duration | 100 milliseconds |
| Execute Script Duration | 250 milliseconds |
| Hydration Duration | 50 milliseconds |
| Request Size | 250 Bytes |
| Head Size | 1 KB |
| Semi-Static HTML Part Size | 25 KB |
| Dynamic HTML Part Size | 25 KB |
| Script Size | 250 KB |
| Dynamic JSON Data Size | 25 KB |
| Client To Server Network Latency | 200 milliseconds |
| Client To Server Network Bandwidth | 2.5 MB/s |
| Client To Edge Network Latency | 50 milliseconds |
| Client To Edge Network Bandwidth | 2.5 MB/s |
| Edge To Server Network Latency | 150 milliseconds |
| Edge To Server Network Bandwidth | 10 MB/s |
Additionally, the simulation assumes that:
- The server and the database are completely idle when the request arrives.
- The server and the database are single-threaded (processing requests one at a time).
- The network is completely un-congested.
- The full network bandwidth is instantly available (no slow starting).
- No additional latency is created by HTTPS handshakes.
- The page’s script is async and non-render-blocking.
You can generate timeline charts with different parameters by visiting the simulation playground.
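A few quantities in the timelines below follow directly from these parameters. Here is a quick sanity-check sketch, assuming 1 MB = 1000 KB as is conventional for network bandwidth:

```javascript
// Derived quantities from the parameter table (ms, KB, KB/s).
const clientServerLatencyMs = 200;
const clientServerBandwidthKBps = 2.5 * 1000; // 2.5 MB/s
const scriptSizeKB = 250;

// Time for the 250 KB script to cross the client-to-server link,
// once latency has been paid
const scriptTransferMs = (scriptSizeKB / clientServerBandwidthKBps) * 1000;

// Earliest moment the client can see the page's head element, and thus
// start pre-loading: one full client-server round trip (the 1 KB head's
// transfer time is negligible at this bandwidth)
const preloadStartMs = 2 * clientServerLatencyMs;
```

These two values (100ms and 400ms) reappear throughout the timelines that follow.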
The baseline pages to compare: Full-Page Streaming vs Split-Page with Pre-Loading
Let’s see the page loading timeline diagrams for two versions of our web page:
- The first version streams all page content in response to a single URL, `full-page`.
- The second version delivers the semi-static page part in response to the URL `split-page`, and then the dynamic part as a response to `dynamic-page-data.json`.
In the first round of simulation, which omits caching, both full-page and split-page achieved identical First Contentful Paint latency, with the former fully loading 60ms earlier than the latter.
Full-Page Streaming version: Notice how the server sends requests to get both the semi-static and the dynamic page parts as soon as it receives the full-page request (at T=200ms). The page is fully loaded at T=1249ms.
Split-Page with Pre-loading version: When the server receives the request for the page, it only fetches the semi-static page part at T=200ms. As for the dynamic page part, it is fetched by a separate client request which the server starts processing at T=600ms (400ms later than in the streamed full-page version). That said, the full page loading finishes only 60ms later in this particular example (at T=1309ms).
Thanks to pre-loading, the client requests the dynamic page part as soon as it receives the page's head element (at T=400ms, i.e. twice the client-to-server network latency). Without pre-loading, the dynamic page part wouldn't be requested by the client until the script is loaded and executed, which delays full page loading until T=1718ms.
The effect of server-side and edge caching
Now let’s add caching at two levels:
- The server caches the semi-static page part, so it does not need to reach the database to get this part.
- An edge node, or a CDN point of presence, is placed between the client and the server, serving cacheable resources without reaching the origin server.
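Standard `Cache-Control` directives are one way the two cache layers can be told apart. A minimal sketch follows; the URL names match this article's simulation, but the specific directive values are assumptions, not a recommendation:

```javascript
// Hypothetical per-resource caching policy for this article's page.
// s-maxage targets shared caches (the edge); max-age targets the browser.
const cachePolicy = {
  "/split-page": "public, max-age=0, s-maxage=3600",   // semi-static: edge-cacheable
  "/dynamic-page-data.json": "private, no-store",      // dynamic: never cached
  "/script.js": "public, max-age=31536000, immutable", // versioned static asset
};

function cacheControlFor(url) {
  // Default to no-store for anything without an explicit policy
  return cachePolicy[url] ?? "private, no-store";
}
```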
Both the full-page and the split-page versions benefit from caching on the server and the edge. Their page load times improved by 290ms and 610ms respectively.
The split-page version benefited more from caching. Compared to the full-page version, it achieved First Contentful Paint roughly 300ms earlier and full page load 260ms earlier.
Full-Page Streaming with server and edge caching: Thanks to server-side caching of the semi-static page part, the First Contentful Paint arrives earlier than without caching (at T=461ms instead of T=569ms). And thanks to the edge, the script file is loaded with reduced latency. The page loads fully at T=959ms.
Preloaded Split-Page with server and edge caching: The semi-static page part is delivered from the edge with much lower latency, leading to a First Contentful Paint as early as T=160ms (roughly 300ms earlier than the full-page version) and a full page load at T=699ms (260ms faster than the full-page version, and 610ms faster than the uncached split-page version).
Pre-loading significantly impacts performance. Without it, the split-page takes 421ms longer to fully load (T=1120ms) even with caching enabled.
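The gains quoted in this section follow directly from the absolute load times; a quick arithmetic check:

```javascript
// Full-page-load times quoted in this section (ms)
const fullPage  = { noCache: 1249, cached: 959 };
const splitPage = { noCache: 1309, cached: 699 };

// Improvement each version gets from server + edge caching
const fullPageGain  = fullPage.noCache - fullPage.cached;   // 290 ms
const splitPageGain = splitPage.noCache - splitPage.cached; // 610 ms

// With caching enabled, how much sooner the split-page version finishes
const cachedGap = fullPage.cached - splitPage.cached;       // 260 ms
```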
Assembling the Full-Page version on the edge for better caching
As we saw in the previous section, the streamed full-page version cannot take full advantage of edge caching: each page request must reach the origin server, which re-sends the otherwise cacheable semi-static page part to the client every time.
It is possible to address this problem with edge-side page assembly, which involves caching semi-static parts at the edge and streaming them to the client as dynamic parts are fetched from the origin server.
- This approach has existed since the early 2000s with Edge Side Includes (ESI).
- Facebook implemented a similar approach, called BigPipe, in 2010.
- More recently, Next.js implemented this pattern with Partial Pre-rendering (PPR) available on Vercel.
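The core idea these systems share can be sketched as a streaming splice. In the sketch below, the placeholder token and function names are mine; real ESI uses `<esi:include>` markup rather than an HTML comment:

```javascript
// Sketch of edge-side page assembly: the edge holds the semi-static shell
// in its cache and splices in the dynamic part fetched from the origin.
// The shell's opening chunk is flushed immediately, so the client pays only
// edge-to-client latency before it can render and start pre-loading.
async function* assemblePage(cachedShell, fetchDynamicPart) {
  const [head, tail] = cachedShell.split("<!--DYNAMIC-->");
  yield head;                     // flushed at once from the edge cache
  yield await fetchDynamicPart(); // the only chunk that waits on the origin
  yield tail;
}
```

A caller would pipe these chunks into the HTTP response as they become available, so the semi-static content is never blocked behind the origin round trip.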
Edge-side page assembly has some drawbacks:
- It requires framework support and often vendor-specific code.
- It requires edge processing, which may incur additional costs.
- The origin server no longer repeatedly sends semi-static parts to the edge. However, the edge still sends them to returning clients on each request.
Full-Page Streaming with edge page assembly: Thanks to edge-side page assembly, the semi-static page part is now delivered from the edge, leading to a First Contentful Paint at T=160ms and a full page load at T=699ms (identical to the split-page version with edge caching).
Page loading for a returning user with a warm client cache
Lastly, let’s examine a less representative but still interesting scenario: how fast each page loads for a returning user whose browser cache already holds fresh copies of the cacheable resources.
Full-Page Streaming for a returning user: The client requests full-page which is not cacheable, and receives it very rapidly (because of edge-side page assembly from the previous section). First Contentful Paint is delayed until T=400ms because the script started executing before the page's semi-static part arrived. The page is fully loaded at T=621ms.
Pre-loaded Split-Page for a returning user: The semi-static page part is immediately available from the browser cache. Thanks to this, the First Contentful Paint arrives at T=50ms (350ms earlier than the full-page version) and the page is fully loaded at T=571ms (50ms earlier).
Conclusion
In this article, we explored the performance trade-offs between two approaches for loading mixed semi-static and dynamic web pages:
- Full-Page Streaming: Delivering the entire page (both semi-static and dynamic parts) as a single streamed HTTP response.
- Split-Page with Pre-loading: Delivering page parts as separate resources, with the client fetching dynamic parts early via pre-loading.
In our simulation, we observed that:
- With caching, Split-Page with Pre-Loading achieves faster load times than Full-Page Streaming, because it separates cacheable content from dynamic content, allowing each to be cached independently.
- Full-page Streaming needs edge-side processing to achieve similar caching benefits, adding cost and complexity.
Recent JavaScript frameworks make Full-Page Streaming easy; it works well for dynamic content but hinders edge and browser caching of semi-static content. Split-Page with Pre-loading avoids this problem and can be implemented without framework support. That said, framework support is needed to combine this pattern with DX-enhancing features such as server functions/actions, which abstract away endpoint creation and invocation. Among mainstream JS frameworks, Astro’s server islands implement the Split-Page approach with arguably the best developer experience.