
How to make fast web frontends

Introduction

In this article, I will discuss techniques for optimizing the performance of web frontends. There are many reasons to focus on this:

The article is structured as follows:

Although performance analysis should be based on measurements, I do not cover how to use DevTools to measure performance or which metrics to choose. You can find that information in other articles. Instead, I focus on explaining how various optimizations work and how different optimizations can help or interfere with each other.

Table of contents

1. The Web as a physical system

The World Wide Web is a major user of the Internet: a gigantic distributed system of machines and network infrastructure. Like any other physical system, the Internet competes for resources with other human activities and with the natural world more broadly.

The hardware components that make up the Internet require physical resources and generate waste throughout their entire lifecycle: during mining and manufacturing, during use, and finally when they are disposed of.

One measure of the environmental footprint of the Internet is its carbon emissions: estimated at around 4% of global carbon emissions, which is comparable to the entire aviation industry (see: Introduction to web sustainability).

Web infrastructure

The Web as a system with physical components: This figure shows the components of the Web's infrastructure: servers, users' devices and network relay devices. It also shows that the Web is embedded in a natural system: space and resources are taken from nature to build and run the infrastructure of the Web. This is represented here by a mine, a factory and solar panels.

This figure also highlights that not everybody has access to the same level of service on the Web: a wealthy user has a more powerful machine and is connected to the Web via a faster network connection than less wealthy users.

Frontends' contribution to the web's environmental footprint

The studies Environmental footprint of the digital world (2019) and Estimating Digital Emissions (2023) estimate that user devices have a larger environmental impact than both networks and data centers, and that networks have a greater impact than data centers. This makes sense given the sheer number of user devices, the size of the network infrastructure, and the fact that manufacturing the devices is a big contributor to their environmental impact.

Based on these estimations, it appears that frontend developers have both the power and the responsibility to reduce the environmental impact of the web.

Emissions breakdown of the web's infrastructure

The breakdown of emissions per tier and per lifecycle phase: This figure showcases data from the article Estimating Digital Emissions (2023).

It is estimated that data centers consume 22% of the energy of the web infrastructure, networks consume 24% and user devices dominate the mix consuming 54%.

It is also estimated that data centers and networks each emit 18% of their greenhouse gases during manufacturing and 82% during operation. User devices, by contrast, emit 51% of their greenhouse gases during manufacturing and 49% during operation.

Improving performance by using more powerful hardware

One way to make web applications run faster is by using more powerful hardware:

Relying on hardware upgrades to solve performance issues should be considered only after software optimizations, as it can be costly financially and/or from an environmental point of view.

Bigger Web Infrastructure

Expanding the Web infrastructure: This figure shows an infrastructure similar to the one in the previous figure. Here, the servers and the network relay devices are more powerful, depicted using bigger sizes, and the network connections have a bigger capacity. Users' devices and connection speeds have increased a little, and the rich user's have increased even more.

In order to support the growth of the infrastructure of the Web, more space is taken from nature. This is depicted here with a larger mine, more solar panels and factories, and a reduced presence of wildlife.

The facial expression drawn on the faces of the 3 characters in this figure is the same as in the previous one, to point to hedonic adaptation (the shifting of the baseline of human expectations) and to the Jevons paradox (the nullification of efficiency gains by increased consumption).

Having brought attention to the physical components that make up the web, let’s explore some frontend optimization techniques that reduce their workload.


2. Optimizing performance by doing less work

In this chapter, I present techniques that reduce the workload for server machines, user devices, and the network, by shifting work within the system so as to:

2.1. Performance through minimalism

Before diving into the technical side of things, it is worth mentioning minimalism as a non-technical, or at least less technical, way to make frontends fast.

Bloat is a very well known phenomenon in software in general and on the web in particular. According to the HTTP Archive, as of June 2025, the median desktop web page loads 2.8 MB of data (5.8 times the size of the median web page from early 2011).

On this subject, I recommend Maciej Cegłowski’s hilarious talk: The Website Obesity Crisis:

Maciej’s modest proposal: your website should not exceed in file size the major works of Russian literature. Anna Karenina, for example, is 1.8 MB

Minimalism is one way to approach this issue of web bloat. It is encouraged in the Sustainable Web design and eco-sufficiency spheres, where people question the usefulness (to the site owner and to the users) of the content delivered by websites and applications. They ask questions like:

Sustainable Web design goes beyond minimalism. It encompasses user experience design, carbon intensity and also the technical solutions addressed in the rest of this article.


2.2. Caching

Caching is a powerful tool for optimizing performance. Instead of repeatedly performing the same task, such as sending identical data to the client over and over or regenerating the same page on the server multiple times, server responses can be stored in caches on both the client and server sides for reuse when requested again.

This approach sacrifices some memory on the client and server to reduce CPU workload on the server and decrease network bandwidth usage.

Caching can be implemented to some extent without users noticing. However, the best performance gains can only be realized by accepting that some users may not see the most recent version of certain data immediately after it is published: Clients are instructed to store and reuse a response until a certain expiration time without contacting the server. During that period, users may not see the latest content on the server.

As a result, caching decisions should not be made by developers alone; input from the product owner is crucial. Both developers and product owners need to understand the level of cache control that web technologies offer and what is acceptable for different types of resources on their websites and applications to implement effective caching.

HTTP caching

To support caching requirements, the HTTP protocol provides a range of standard headers. For example:

Caching can be done both by clients and by intermediary servers. Cache-Control headers can mark responses as private, making them cacheable only by the end user's browser, or as public, making them cacheable by intermediary servers too. This allows both public content and private and/or user-customized content to benefit from caching.
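As an illustration, here is a minimal sketch, assuming a Node.js server built on the standard node:http module (the URL paths and durations are made up for the example), of how responses can be marked as publicly or privately cacheable:

import { createServer } from "node:http";

const server = createServer((req, res) => {
  if (req.url === "/products/42") {
    // Public product page: cacheable by browsers and intermediary caches for 10 minutes
    res.setHeader("Cache-Control", "public, max-age=600");
    res.end("<html>...product details...</html>");
  } else if (req.url === "/basket") {
    // User-specific content: cacheable by the user's browser only, for 1 minute
    res.setHeader("Cache-Control", "private, max-age=60");
    res.end("<html>...basket contents...</html>");
  } else {
    res.statusCode = 404;
    res.end();
  }
});

server.listen(8080);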

Caching with shared cache
Shared and private caches: In this example, when the server responds to Client 1's request, both the shared cache and Client 1's private cache save the response. When Client 2 requests the same data, the shared cache responds without contacting the origin server, and Client 2's private cache saves the response as well. When both Clients 1 and 2 need the same data again, they can reuse the version saved in their private caches without requiring any network traffic. The origin server generates this piece of data only once.
Fresh and stale cached data

When an HTTP response is stored in a cache, it remains fresh for a certain duration. Once that duration has elapsed, the response becomes stale. If the freshness duration is not specified by a Cache-Control: max-age header or an Expires header, caches choose it heuristically. Additionally, the freshness duration can be explicitly set to 0 (using Cache-Control: max-age=0), which makes the cached response stale immediately.

Cached responses that are still fresh can be reused on subsequent requests without soliciting the server. This eliminates the following steps:

Stale responses can also be reused, but the cache has to revalidate them first by sending a conditional request to the server. The server will either reply with a new response or with an empty response and a 304 Not Modified status code, instructing the cache to reuse its stored response.

When making a conditional request, we still have to:

Cache with revalidation

Cache revalidation: In this example, the user requests a page and receives a 50KB response containing version 1 of the page, which stays fresh in the cache for 10 minutes. The user requests the page a second time after 5 minutes, and since the response stored in the cache is still fresh, the cache sends it to the user. After another 5 minutes, the user requests the page again, but the cached version is now stale. Therefore, the cache sends a conditional request (If-None-Match: "version 1") to the server to verify that version 1 of the page is still the currently published version. The server sends back a 304 Not Modified header with no body. Seeing this, the cache marks the response it already has as fresh again and uses it to respond to the user.

After that, the site editor publishes version 2 of the page. The user requests the page again. They get a version 1 response from the cache instead of version 2 because the cached data is still considered fresh. Finally, the user requests the page once more, after it has become stale in the cache. The cache sends another conditional request to the server, the server responds with a new 50KB response containing version 2 of the page, and the cache stores this new version (replacing the old one) and responds to the user with it.

Revalidating stale data in the background

Stale-while-revalidate (also referred to as SWR) is another cache control option that the server can provide alongside max-age. It defines a period during which the cache can respond with stale data while revalidating the content in the background using a conditional request, thereby hiding the delays associated with the revalidation of cached data.
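Concretely, this is just one more directive in the Cache-Control header. A minimal sketch, reusing the hypothetical Node.js handler style from the earlier example:

// Fresh for 10 minutes; for a further 5 minutes, serve stale data and revalidate in the background
res.setHeader("Cache-Control", "max-age=600, stale-while-revalidate=300");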

Stale While Revalidate caching mechanism

Stale-while-revalidate: In this example, the user requests a page, and the server responds with the currently published version (version 1), which is then stored in the cache. The server indicates that the response can be considered fresh for 10 minutes, and once it becomes stale, it can still be reused and revalidated in the background for an additional 5 minutes. Later, when the site editor publishes a new version, the user requests the same page again. At this point, the cached version has been stale for less than 5 minutes, so the cache immediately responds to the user with version 1. Meanwhile, it sends a conditional request to the server and subsequently retrieves version 2 of the page.

Later, when the user requests the page again, the cache responds with version 2, which is still fresh, making the user perceive the new response as loading instantly. The stale-while-revalidate mechanism has effectively concealed the latency involved in retrieving version 2 of the page from the server.

Fifteen minutes later, the user requests the page again. This time, the cached version is stale, and the stale-while-revalidate duration has also expired. As a result, the cache cannot respond to the user until it revalidates the cached version with the server.

Client-requested cache revalidation

Clients can ask intermediary caches to disregard cached responses and to reach out to the origin server:

Client-requested cache revalidation

Client-requested cache revalidation: In this example, the user visits a page containing an image. Both the page and the image are received from a shared cache and are stored in the user's browser local cache.

Later when the user navigates back to the page, it is served directly from the browser's cache.

The user then presses Ctrl+Shift+R to trigger a forced reload. Although the page is still fresh in the local cache, the browser ignores its stored version and sends a request with the Cache-Control: no-cache header to the server. Seeing this header, the shared cache ignores its stored version and forwards the request to the server, which generates the page and sends the response. Once the browser receives the page, it requests the image referenced by the page, again using the Cache-Control: no-cache header.

Cache busting

Cache busting is a caching strategy commonly used to address the dilemma of having to choose between short and long cache freshness durations (max-age):

Cache busting works as follows:

Cache busting

Cache busting: In this example, the client requests a page and receives a response that expires in 10 minutes. The page requests a script file /navbar.js?version=1 and receives a response that can be reused for a whole year.

The site editor publishes a new version of the page. The client requests the same page again, but since its cached version has expired, it reaches out to the server and gets the new page. The page requests the same script file again, which is delivered from the cache, as it remains fresh for a year.

The site editor publishes a newer version of the page. The client requests the page again and reaches out to the server to retrieve it. The page requests a new script file this time (/navbar.js?version=2), which is downloaded from the server and cached to be reused for up to one year.
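To make the mechanism more concrete, here is a minimal sketch of the two halves of cache busting, assuming a Node.js build step that derives a versioned file name from the file's content (the file names and helper code are illustrative):

import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Build step: derive a versioned file name from the file's content
const content = readFileSync("src/navbar.js");
const hash = createHash("sha256").update(content).digest("hex").slice(0, 8);
const versionedName = `navbar.${hash}.js`; // e.g. navbar.3f2a9c1b.js

// The HTML pages (served with a short max-age) reference the versioned file name,
// while the versioned file itself is served with a long freshness duration, e.g.:
// Cache-Control: public, max-age=31536000, immutable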

Cache busting, as explained so far, can be further optimized:

Caching the static portions of webpages

Web pages often contain both static elements, which are the same for all users, and dynamic elements that vary based on individual user sessions. For example, on a product page of an e-commerce site, the static elements would be the product details, while the dynamic elements would consist of the contents of the user’s shopping basket, along with action forms that are secured by session-specific CSRF tokens.

We want to leverage caching for the static elements while still being able to provide personalized dynamic content. One approach to achieve this is to include only the static elements in the main HTML document, which can then be cached on the client side and in intermediary caches. The dynamic parts of the page are retrieved with additional requests, which introduces some added loading latency.

For examples of how to do this, check out:

Fetching dynamic page parts with a separate request

Fetching dynamic page parts with a separate request: In this example, the client requests a page and receives a response from a shared cache. The page includes a script that fetches the dynamic parts of the page using a second request. This second request reaches the server, which responds with a non-publicly-cacheable response.

Later on, the client requests the page again. This time, the page is loaded directly from the client's cache, and a request is sent to retrieve the dynamic parts from the server.

Service Workers

Since 2018, all major browsers have supported the Service Worker and Cache Storage APIs. The former allows websites to register a JavaScript worker in the user’s browser. This worker acts like a proxy server, intercepting and responding to the client's network requests. The Cache Storage API, for its part, allows web applications to programmatically manage a cache. Together, these APIs make it possible to write offline web applications and to implement caching rules that go beyond what is possible in standard HTTP.
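As a minimal sketch (assuming a worker file registered by the page via navigator.serviceWorker.register("/sw.js") and a simple cache-first strategy), a service worker's fetch handler could look like this:

// sw.js - a cache-first fetch handler (a sketch, not production-ready)
self.addEventListener("fetch", (event) => {
  if (event.request.method !== "GET") return; // let non-GET requests pass through
  event.respondWith(
    caches.open("v1").then(async (cache) => {
      const cached = await cache.match(event.request);
      if (cached) return cached; // reuse the stored response
      const response = await fetch(event.request);
      cache.put(event.request, response.clone()); // store a copy for next time
      return response;
    })
  );
});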

Caching in interactive WebApps

In interactive, AJAX-heavy web applications, it is up to the client-side JavaScript code to decide when to load and reload data from the server. For example, it can automatically reload data after a certain time interval or after the user performs an action that writes to the database on the server.

This data loading and reloading by client-side code can be seen as a form of cache management, where the data loaded on the client is a cached version of the server’s data. Many developers in the JavaScript community reach for the TanStack Query library, which provides APIs for cache management, as well as DevTools and integrations with various web frameworks.
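For illustration, a data fetch managed by TanStack Query might look roughly like the following sketch (the endpoint is made up, and the exact options should be checked against the library's documentation for your version):

import { useQuery } from "@tanstack/react-query";

function Products() {
  const { data, isPending } = useQuery({
    queryKey: ["products"], // the cache key for this piece of server data
    queryFn: () => fetch("/api/products").then((response) => response.json()),
    staleTime: 60_000, // consider the cached data fresh for one minute
  });
  if (isPending) return "Loading…";
  return JSON.stringify(data);
}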

Caching compiled code

Browsers do not only cache server responses; they can also cache compiled JavaScript code to optimize the startup time of frequently used web applications. For more information on this, check out the following articles by the Chromium and Firefox teams.


2.3. Compression

HTTP responses compression

Compressing HTTP responses can substantially reduce network usage. This feature is implemented by practically all web browsers, as well as most popular web server stacks and web hosting services, making its activation a matter of proper server configuration.

HTTP response compression works through content negotiation between clients and servers:

HTTP response compression

HTTP response compression: In this example, Client 1, which supports gzip compression, requests a web page. The server sends a gzip-compressed response, which the shared cache saves for later reuse. Client 2, which also supports gzip compression, requests the same page and receives a response directly from the cache.

Client 3, which supports Brotli compression (br), requests the same page. This time, the shared cache cannot reuse the gzip-compressed version, so it forwards the request to the server. The server sends a new response compressed with Brotli, which the shared cache saves for later reuse without overwriting the gzip response.

When Clients 2 and 3 request the page again, the cache is able to provide a response to both of them without having to reach out to the server.
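Real web servers and hosting services implement this negotiation for you, but as a rough sketch of what it involves (assuming Node.js and its built-in zlib module), the server side looks roughly like this:

import { createServer } from "node:http";
import { gzipSync, brotliCompressSync } from "node:zlib";

createServer((req, res) => {
  const body = Buffer.from("<html>...a large page...</html>");
  const accepted = req.headers["accept-encoding"] ?? "";
  res.setHeader("Vary", "Accept-Encoding"); // let caches store one variant per encoding

  if (accepted.includes("br")) {
    res.setHeader("Content-Encoding", "br");
    res.end(brotliCompressSync(body));
  } else if (accepted.includes("gzip")) {
    res.setHeader("Content-Encoding", "gzip");
    res.end(gzipSync(body));
  } else {
    res.end(body); // the client advertised no compression we can provide
  }
}).listen(8080);

A production implementation would parse the Accept-Encoding header and its quality values properly instead of using simple substring checks.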

Note that not all resources need HTTP compression: some file formats, such as image and video files, are already compressed. Recompressing them would waste CPU cycles and can even slightly increase their size.

HTTP headers compression

In addition to response body compression, the HTTP protocol introduced header compression via the HPACK format in HTTP/2 and later via the QPACK format in HTTP/3. Header compression is implemented by browsers and the HTTP stack of web servers, requiring no effort from web developers. That said, knowing that it exists and how it works can inform some optimization decisions.

Header compression uses:

Request and response header fields can be encoded either literally or, when possible, using indices that reference entries in the static or dynamic dictionaries.

Thanks to dynamic dictionaries, repeatedly sent headers such as cookies can be sent only once during the lifetime of an HTTP connection, reducing network usage compared to HTTP/1.1, where they need to be sent with every request. This makes it possible to forgo the practice of hosting static content on cookie-less domains to avoid the cost of cookie retransmission.

HPack HTTP header compression

Header compression using HPACK: In this example, the client requests two resources (the index page "/" and "/favicon.ico") from the server, sending the same large cookie header each time.

In the first request (to "/"), the client encodes the cookie value literally in HPACK format tagging it with the HPACK store command so that both it and the server store the cookie value in the HTTP session's dynamic table. The HPACK store command adds an overhead of 3 bytes here to encode the length of the cookie string.

In the second request (to /favicon.ico), the client encodes the cookie header value as a reference to the entry in the dynamic HPACK table. Instead of sending the whole cookie string, only a single byte is sent to the server. Notice also that in this second request, the string /favicon.ico was itself saved in the dynamic HPACK table, so that later requests to the same resource can replace it with a single byte. Since the string /favicon.ico is short, the HPACK store command adds only 2 bytes of overhead.


2.4. Content Delivery Networks

The longer the distance between the client and the server, the more time data packets take to travel between the two. The time data takes to get from point A to point B is called network latency. Technological advancement can reduce latency only to a certain point, due to physical limits such as the speed of light: it takes a beam of light approximately 130ms to travel around the circumference of the Earth.

Content Delivery Networks (CDNs) aim to address the issue of latency caused by the physical distance between servers and clients. A CDN is a group of geographically distributed proxy servers (called Points of Presence, or PoPs) that sit between the servers (called origin servers) and the clients. CDNs cache origin server responses when possible and deliver them to clients from a nearby PoP.

Typical CDN features include TLS termination, HTTP caching, and compression, in addition to other features that vary from one provider to another.

Without CDN illustration
With no CDN: In this example, all requests from users around the world are handled by the origin server, resulting in high latency for users who are far away.
With CDN illustration
With a CDN: In this example, most requests are handled by PoP servers located close to the users, resulting in low latency for all users. However, a small proportion of requests can only be processed by the origin server, resulting in high latency for distant users.

2.5. Bundling resources

Some network latency is introduced each time an HTML document or one of its resources loads other resources. This latency and network overhead can be avoided by embedding resources inline instead of referencing them by URL. Depending on the context, this is referred to as inlining, embedding, concatenation, or bundling. For example:

Without resource bundling
Without resource bundling: In this example, the client requests a page. Receiving the HTML file, it discovers that it needs to load icons 1, 2, and 3, as well as a script file. The client fetches these resources via additional HTTP requests. Once the script is loaded, the client finds that the script depends on another JavaScript module, necessitating yet another request to the server to load this module. The page finishes loading once all six resources are fully loaded.
With resource bundling
With resource bundling: In this example, the client requests a page. It discovers when it receives the HTML file that it needs to load the icons-sprite.svg file, which contains icons 1, 2, and 3, as well as a script file that includes both the page's script and its dependencies. The page finishes loading once the HTML, the sprite, and the script are fully loaded.

As stated previously, the main benefit of bundling is reducing network overhead. In addition to that:

However, these gains come at a cost:

Because of the performance drawbacks of bundling, some optimization tools implement the opposite optimization: outlining, i.e., extracting inlined elements into their own files.

Cache without inlining
Caching without inlining: In this example, both non-cacheable dynamic pages, page1.html and page2.html, use the same CSS file, which is properly cached and reused.
Cache with inlining
Caching with inlining: In this example, non-cacheable dynamic pages, page1.html and page2.html, each inline the same CSS file into their content, requiring clients to download the styles again with each page visit.

Bundling in the HTTP/2+ era

With the advent of HTTP/2, many web articles declared the death of resource bundling, because:

Smashing Magazine’s article series on HTTP/3 explains in detail why bundling is still relevant in HTTP/2 and HTTP/3. Some of the reasons are that:


2.6. Reducing content size

By reducing the size of webpages’ sub-resources, we reduce network bandwidth usage and the amount of data that the client has to process.

Image optimization

Images make up around 50% of the bandwidth of the average website, making them a good candidate for optimization. Image file sizes can be reduced by:

The HTML <img>, <picture> and <video> elements allow webpages to provide multiple sources for the same multimedia item and let the browser pick the version with the appropriate format and size. This allows websites to provide alternative versions of the same image: different resolutions for different screen sizes, and images both in modern, well-optimized file formats for new browsers and in older formats for legacy browsers.

As setting up a system that provides multiple sources for each image can be quite complex, many web frameworks and hosting services include image optimization tools to automate this task.

Subsetting web font files

Web pages can use text fonts installed on users’ systems, and they can also load custom web font files. These files can be large, as they may include the glyphs of a very large set of Unicode characters to support many languages. To avoid loading glyphs that are never used on the web page, font files can be split into separate files that each define the glyphs of a subset of Unicode characters (for example, only Latin characters or only Arabic characters). This process is called subsetting.

When a web font is defined by multiple subset files, the browser ensures it downloads only the files containing glyphs that actually appear on the page. Check out the MDN articles on unicode-range and the section on loading only the glyphs you need.


2.7. Reducing client-side code size

2.7.1. Using optimizing bundlers

Developers do not typically ship their source code unchanged to clients. Instead, they use bundling tools or frameworks that include such tools to transform the website’s source code and its dependencies (such as libraries and assets) into bundle files that are ultimately served to the clients.

As we discussed in Bundling resources, bundling reduces network overhead. Additionally, bundling tools implement features like Minification and Tree-Shaking, which help reduce code size.

Minification

Text files such as HTML, CSS, JavaScript and SVG files contain elements that are useful for developers but not for end users: whitespace formatting, comments and intuitive variable names. Minification is the process of removing those elements.

Here is an example of source code:

// This is a comment
function add(first, second) {
  return first + second;
}

// This is another comment
function multiply(first, second) {
  return first * second;
}

And here is the same source code after minification:

function add(n,t){return n+t}
function multiply(n,t){return n*t}
Tree-shaking

Bundlers can also reduce code size with Tree-Shaking: removing any code that static analysis shows to be unreachable from the bundle’s entry points. Outside web circles, this concept is more generally known as Dead-Code Elimination.

Here is an example of source code:

// utils.js
function add(first, second) {
  return first + second;
}
function multiply(first, second) {
  return first * second;
}

// main.js (entry point)
import { add } from "./utils.js";
console.log(add(0.1, 0.2));

And here is the same source code after tree-shaking and concatenation: The multiply function is removed because it’s not used in main.js.

function add(first, second) {
  return first + second;
}
console.log(add(0.1, 0.2));

2.7.2. Using small libraries and small third-party scripts

When choosing libraries and third-party services, code size should be one of the selection criteria.

As a general rule, we should avoid using kitchen sink libraries (i.e., libraries that integrate all sorts of features) and instead choose libraries that meet the specific needs of our applications. The appropriate libraries will differ when rendering, for instance, read-only tables versus editable ones. In the former case, a small and simple library may suffice, whereas in the latter case, a larger and more feature-rich library may be necessary.

Many tools can be used to determine the size of JavaScript libraries:

In addition to library size, the ability to tree-shake should also be considered when choosing libraries.

For example, check out the You-Dont-Need-Momentjs page comparing Moment.js, a utility library for handling date objects, with alternative libraries that are smaller and in the case of date-fns also tree-shakable.

It is important to note that tree-shaking has its limits as libraries typically contain a set of core modules that cannot be eliminated. For example, even though the MUI components library supports tree-shaking, using a single component from the library also loads the library’s core modules, which include a style engine and various other utilities. Therefore, instead of reaching for MUI to use just one of its components, it is better to look for a specialized library.

Comparing library sizes

Large vs lightweight libraries: This figure compares the gzipped client-side bundle size of two versions of the same application. In both versions, the application code weighs 50KB.

The first app uses very popular but quite large libraries: React v19.0 (53.7KB including React-DOM), Next.js App router v15.3 (~46.5KB), MUI Date picker v8.5 and its dependencies (133.7KB) and Recaptcha (~225KB). In total the app weighs 508.9KB.

The second app uses SolidJS v1.9 (7.5KB), SolidRouter v0.15.3 (7.9KB), @corvu/calendar v0.1.2 (4.4KB) and ALTCHA v2.0 (23.9KB). In total the app weighs 93.7KB (18.4% the size of the large libraries version)

2.7.3. Keeping code in the server

Sometimes, when some portions of code are very large, it makes sense to keep them on the server and to execute them there on client requests, to avoid burdening the client with downloading and executing them.

Before adopting such a strategy, some concerns have to be considered:

Running code on the client
Running a lot of code on the client: In this example, a large script a-lot-of-code.js is requested, downloaded and loaded by the client. When the user triggers events needing this script, the client executes the script locally without having to involve the server.
Running code on the server
Keeping large code on the server: In this example, the script a-lot-of-code.js is kept on the server and never transferred to the client. Each time the user triggers events needing this script, the client sends a request to the server, which runs a-lot-of-code.js and sends back the results. Each time, the script's inputs and outputs are serialized and sent over the network.
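A hand-written version of the setup in this second figure might look roughly like the following sketch (the /api/compute route and the heavyComputation export are hypothetical; a-lot-of-code.js is the large module from the figure):

// server.js - the heavy code never leaves the server
import { createServer } from "node:http";
import { heavyComputation } from "./a-lot-of-code.js"; // hypothetical large module

createServer(async (req, res) => {
  if (req.url === "/api/compute" && req.method === "POST") {
    let raw = "";
    for await (const chunk of req) raw += chunk; // read the serialized input
    const result = heavyComputation(JSON.parse(raw)); // run the large code on the server
    res.setHeader("Content-Type", "application/json");
    res.end(JSON.stringify(result)); // serialize the output
  }
}).listen(8080);

// client.js - the client only ships a small fetch call
async function compute(input) {
  const response = await fetch("/api/compute", {
    method: "POST",
    body: JSON.stringify(input),
  });
  return response.json();
}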

Offloading code to the server can negatively impact developer experience (DX), as it can require creating API routes, modifying client code to call them, and managing serialization of inputs and outputs. However, this is not always an issue:

Some frameworks tackle this DX issue by making it simpler to send requests to the server:

Server-side and client-side rendering

In this section, and in web framework circles generally, the word rendering refers to the transformation of data from a structured format (like JSON) into the HTML that is displayed to the user. This is not to be confused with rendering to the screen, which is performed by browser engines.

Rendering can be done on the client, on the server, or on both. Where it is done has an effect on the amount of data that is sent to the client. Let’s look first at client-side and server-side rendering in their purest forms:

Hybrid approaches to rendering are possible too:

An application can render some parts of the page on the server and other parts on the client, for example by rendering non-interactive page sections on the server and interactive ones on the client. This approach, which is quite easy to implement, can lead to better code quality compared to doing pure SSR with imperative JavaScript for interactivity, as both the SSR and CSR code can be written in a declarative style. For those familiar with the Astro framework, you can achieve this using the client:only directive.

Many modern JavaScript frameworks implement a more elaborate hybrid approach to rendering, mixing SSR and CSR: the page is rendered a first time on the server. Then it is made interactive on the client by a process called hydration, which attaches event handlers to the server-rendered HTML. The page, or sections of it, are rerendered on the client when the user interacts with it. Such frameworks have some nice properties for users and developers:

As such frameworks do both SSR and CSR, using them comes at a cost:

Pure SSR

Pure SSR: In this example, the server gets the page data, renders the page and responds to the client with rendered HTML. Upon receiving the response, the client renders the page to the user. Later, when the client receives the page's client-side code, it loads it, making the page interactive.

Pure CSR

Pure CSR: In this example, the server responds with an empty page. The client downloads the page's code, loads it, figures out which data it needs to get from the backend, requests it, and only upon receiving it can it render the page to the user. At that point, the page is immediately interactive.

SSR with hydration

SSR with hydration: In this example, the server gets the page data, renders the page and responds to the client with the rendered HTML and the page data. Upon receiving the response, the client renders the page to the user. Later, when the client receives the page's client-side code, it loads it and applies hydration to make the page interactive.

Pure SSR vs Hydration vs Pure CSR

Pure SSR vs Hydration vs Pure CSR: This figure compares 3 approaches for implementing the same page. The page renders some data using 2 HTML templates, both of which are interactive requiring some client-side JavaScript code.

The first approach is pure server side rendering. The client downloads the rendered HTML and the code that makes the page interactive. Notice how template 2 is repeated 3 times in the downloaded HTML.

The second approach is to do server-side rendering, hydration and client-side rendering. The client downloads the rendered HTML, the code to render both templates 1 and 2, and the code that makes the page interactive. As with the first approach, template 2 is repeated 3 times in the downloaded HTML. Notice also that the page loads more code to support hydration and client-side routing (we will talk about the routing part in the section Client-side navigation). The page also loads the non-rendered data in order to support hydration and rerendering on the client when necessary. The page's data is thus downloaded twice: both in the rendered HTML and in raw format. This is sometimes called the double data problem.

The third approach is pure client-side rendering. The client downloads no rendered HTML. It downloads the code to render both templates 1 and 2, and the code that makes the page interactive. Unlike with the previous approaches, there is no repetition of template 2 because no HTML is downloaded. And like the second approach, the page loads the code for client-side routing and the non-rendered data, which is rendered to HTML on the client.

Partial hydration

To help reduce code size, some frameworks allow developers to control which parts of the application are rendered exclusively on the server (server-only components) and which parts can be rendered on both the server and the client. The code of server-only components does not need to be sent to the client, reducing client-side code size. Additionally, server-only components do not need to be hydrated on page load, reducing initialization work. This feature is commonly referred to as partial or selective hydration.

Right now, the most popular frameworks supporting partial hydration are Astro via Islands, and Next.js via Server Components.

Full Hydration vs Partial Hydration

Full Hydration vs Partial Hydration: This figure compares 2 approaches for implementing the same page. The page renders some data using 2 HTML templates. The first template is interactive requiring some client-side JavaScript code while the second template is not interactive.

The first approach is full hydration: both of the page's templates are rendered on the server and hydrated on the client.

The second approach is partial hydration: Template 2 is only rendered on the server. Notice that the client doesn't need to download template 2's code or data.

Note that partial hydration gains can be nullified in small applications if the framework is too large. For instance, in the demo Movies App, the fully hydrated SolidStart version is smaller than the partially hydrated Next.js version. This is due to SolidStart being more lightweight compared to Next.js, as well as the fact that the app itself isn’t large enough for the code size savings from partial hydration to outweigh the overhead of the larger framework.

Comparison of the client-side code size of two implementations of the Movies App demo
Comparison of the client-side code size of two implementations of the Movies App demo: This figure shows the client-side code size of 2 implementations of the Movies App demo. While the Next.js (94KB) version partially hydrates the page, the SolidJS version fully hydrates it but still manages to load less JavaScript code (15.5KB).

2.8. Reducing CPU work in the client

In order to display the page to the user, the browser has to construct the DOM (Document Object Model) and the CSSOM (CSS Object Model). It also has to calculate the positions and sizes of the elements on the page - a process commonly referred to as layout - and finally paint the result on the screen. The browser has to repeat some of this work each time the page is manipulated by the user or by JavaScript code. Recalculating page layout is known as reflow.

To avoid overwhelming users’ devices with too much CPU work, the DOM should be kept small, CSS rules should be simple, and JavaScript code should avoid running for too long or inducing unnecessary layouts and paints.

2.8.1. Optimizing layout and reflow

If the browser DevTools show that the app is spending too much time doing layout, that may be caused by one of the following problems:

Layout thrashing

When JavaScript code manipulates the DOM incorrectly, it can trigger unnecessary layout recalculations, a problem known as layout thrashing (showcased in the figures Layout thrashing and No layout thrashing below). When JavaScript code writes to the DOM, the browser has to recalculate the layout to update the UI for the user, but it doesn't do that immediately: it waits to batch multiple DOM updates and then recalculates the layout once, repainting the page in its new state for the user. However, when JavaScript code reads certain properties from the DOM, it forces the browser to calculate the layout immediately. So when JavaScript code reads and writes to the DOM repeatedly in a loop, the browser is forced to recalculate the layout again and again, and that is what we call layout thrashing.
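In code, the difference between the two figures that follow looks roughly like this sketch (the .item selector and the height property are illustrative):

const items = document.querySelectorAll(".item");

// Layout thrashing: each iteration writes then reads, forcing a reflow on every pass
items.forEach((item) => {
  item.style.height = "50px"; // DOM write invalidates the current layout
  console.log(item.offsetHeight); // DOM read forces an immediate layout recalculation
});

// No thrashing: batch all the reads first, then all the writes
const heights = [...items].map((item) => item.offsetHeight); // reads only
items.forEach((item, i) => {
  item.style.height = `${heights[i] + 10}px`; // writes only; layout is recalculated once afterwards
});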

Layout thrashing
Layout thrashing: In this example, the click event handler writes to the DOM and then reads the state of the DOM repeatedly in a loop. Each DOM-write invalidates the current layout calculations, so each subsequent DOM-read requires the browser to recalculate layout to read the correct current state of the DOM. The result is that it takes the browser 1.2 seconds to process the click event handler, leaving it unresponsive to user events during this time.
No layout thrashing
No Layout thrashing: In this example, the click event handler performs all the necessary DOM reads first. Then, it does all the DOM writes, invalidating the current layout calculations. Once the event handler has finished executing, the browser recalculates the layout once to render the final state to the user. The entire process takes 200 milliseconds to complete. Contrast it with the 1.2 seconds from the previous example.
Overreacting to user inputs

Certain UI widgets can trigger work while the user is interacting with them. A searchbox, for instance, may trigger a search while the user is typing. With such widgets, it is important to ensure that we do not overload the client's CPU, the network, or the server with work that is likely to be discarded as the user continues editing. This can be achieved using techniques like debouncing (waiting for user edits to stop for a specific time before triggering work) and throttling (limiting the rate at which user edits trigger work).
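A minimal debounce helper, as a sketch (searchBox, runSearch and the 300ms delay are illustrative):

function debounce(fn, delayMs) {
  let timer;
  return (...args) => {
    clearTimeout(timer); // cancel the previously scheduled call
    timer = setTimeout(() => fn(...args), delayMs); // schedule a new one
  };
}

// Trigger the search only after the user has stopped typing for 300ms
searchBox.addEventListener(
  "input",
  debounce((event) => runSearch(event.target.value), 300)
);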

Animating the wrong kind of CSS properties

Excessive layout recalculation can be triggered by using CSS animations or transitions on properties that cause reflow. For more information on this, check out Choosing properties to animate on MDN.

Complex CSS and big DOM

Finally, layout and reflow take longer as CSS rules become more complex and as the DOM grows in size. To address CSS complexity, I refer you to the MDN section on CSS performance. The size of the DOM can be reduced by employing techniques such as pagination or virtualization (also known as windowing). A newer solution to the problem of large DOMs is CSS containment, which has been widely available since September 2024. It allows developers to mark DOM sections that can be rendered independently from each other. This enables the browser to skip painting and calculating layout for offscreen sub-trees of the DOM.

2.8.2. Client-side navigation

In web applications that load a lot of JavaScript code, it is inefficient to re-execute the app’s code on every page navigation. Client-side navigation can help address this issue.

Let’s first examine how browsers handle navigation by default. When a user clicks on a link:

An alternative way to handle navigation is through client-side navigation (also referred to as soft navigation or client-side routing):

Client-side navigation requires the application to load additional code to simulate the default browser behavior and to manage routing on the client, rather than relying on the server for that as in traditional navigation. This increases the client-side code size and adds complexity, which can be excessive for many websites and applications.
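A bare-bones sketch of what that extra code does (heavily simplified, and assuming a hypothetical renderRoute function that swaps the page content in place):

// Intercept link clicks instead of letting the browser perform a full navigation
document.addEventListener("click", (event) => {
  const link = event.target instanceof Element ? event.target.closest("a") : null;
  if (!link || link.origin !== location.origin) return; // only handle same-origin links
  event.preventDefault();
  history.pushState({}, "", link.href); // update the URL without reloading the page
  renderRoute(location.pathname); // hypothetical: render the new page content in place
});

// Handle the back and forward buttons
window.addEventListener("popstate", () => renderRoute(location.pathname));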

However, client-side navigation is often a performance requirement in JavaScript-heavy single-page applications (SPAs). It is also essential for applications that want to support URL-based navigation while maintaining the state of the page. For instance, in an audio streaming website, users expect the audio to continue playing seamlessly as they navigate between different pages.

Classic navigation
Default browser navigation: In this example, the browser creates a new JavaScript context when navigating between Page A and Page B. Page A's context is suspended and stored in the bfcache. When the user clicks the back button, this context is resumed, and Page B's context is suspended and cached. Note that the shared resources of Page A and Page B are loaded twice.
Soft navigation
Client-side navigation: In this example, a SPA handles navigation. To do so, it loads a client-side router script. Notice that the shared resources of Page A and Page B are only loaded once.

2.8.3. Using WebAssembly

CPU load can also be reduced by using a faster programming language. On the web, only two client-side programming languages are supported: JavaScript and WebAssembly (or Wasm for short). JavaScript code can load and instantiate Wasm modules, providing them with the functions they require as dependencies, and it can subsequently call the functions they export. While JavaScript can access web APIs directly, Wasm code can only access them by calling back into JavaScript. For more information about how all of this works, I highly recommend Lin Clark’s article series: A cartoon intro to WebAssembly.
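Loading and calling a Wasm module from JavaScript looks roughly like this sketch (the module URL, the import object shape and the export name depend on how the module was compiled and are hypothetical here):

const imports = {
  env: {
    log: (value) => console.log(value), // a JavaScript function the module can import
  },
};

const { instance } = await WebAssembly.instantiateStreaming(
  fetch("/image-filter.wasm"), // hypothetical module
  imports
);

// Call a function exported by the Wasm module
const result = instance.exports.applyFilter(42);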

There are limits to how much performance can be derived from Wasm. Even if web apps used Wasm as much as possible, many (if not most) would not realize any significant performance gains because they are not usually CPU-bound.

The remaining web apps, the CPU-bound ones, can achieve good performance using either Wasm or JavaScript code. It is surprising how far JavaScript code can be optimized for performance, but for certain tasks, the optimal JavaScript code can be hard to read and to maintain. In those situations, it makes sense to write the code in a better-suited language and to compile it to Wasm.

Now, let’s look at the features of Wasm that give it an edge over JavaScript from a performance point of view:

Feel free to skip to the end of the chapter, given that these subjects are quite technical and that only a limited number of apps can derive performance gains from Wasm.

Control over Memory layout

The term memory layout is often used to describe the way objects are represented in memory. That is, each object’s size, alignment, and the relative offsets of its fields.

Low-level programming languages give programmers control over objects' memory layout. They also give them the ability to work with objects as values and as references (i.e., pointers storing the addresses of objects in memory). With this level of control, programmers can write code that maximizes the use of the CPU cache and of CPU data prefetching for best performance. For more on this, check out Data-oriented design.

High-level languages, on the other hand, try to be easy to use. To achieve this:

JavaScript is one such high-level language, imposing both of these limitations on programmers. In spite of that, researchers came up with asm.js, a subset of JavaScript that controls memory layout by never using JS objects and by instead only reading and writing numbers from and into a typed array. C++ programs can be compiled to asm.js code and run in the browser. The Unreal 3D game engine, for instance, was compiled to JavaScript back in 2013 and ran at near-native speed. Techniques from asm.js are still used today by polywasm to run Wasm modules in browsers that don’t support Wasm or when it is disabled.

Wasm, on the other hand, is a low-level language. Here is an overview of some of its memory-related features:

The following 3 figures, Memory layout in JavaScript, Memory layout in Wasm GC MVP, and Memory layout in a low level language, show the memory layout of the objects needed to represent essentially the same object in JavaScript, in Java compiled to Wasm with Wasm GC, and in Rust compiled to Wasm.

The TypeScript code for constructing our object of interest is:

type Key = [number, number];
type Value = { id: number; name: string; tags: string[] };

// Since JavaScript Maps hash object references instead of their values,
// we need a mechanism to get a stable reference for Key objects
function getStableReference(key: Key): Key {
  // Get a stable reference
  // (via interning for example, which may require another map object)
}

// Our map object of interest
const map = new Map<Key, Value>();
map.set(getStableReference([1, 2]), {
  id: 1,
  name: "name 1",
  tags: ["tag 1.1", "tag 1.2"],
});
map.set(getStableReference([3, 4]), {
  id: 2,
  name: "name 2",
  tags: ["tag 2.1", "tag 2.2"],
});
// ...

The equivalent Java code is:

record Key (int x, int y) {}
class Value {
  Value(int id, String name, String[] tags) { /* ... */ }
  int id;
  String name;
  String[] tags;
}

// Our map object of interest
HashMap<Key, Value> map = new HashMap<>();
map.put(new Key(1, 2), new Value(
  1,
  "name1",
  new String[] {"tag 1.1", "tag 1.2"}
));
map.put(new Key(3, 4), new Value(
  2,
  "name2",
  new String[] {"tag 2.1", "tag 2.2"}
));

The Rust code for creating the equivalent object is:

use std::collections::HashMap;

#[derive(Hash, PartialEq, Eq)]
struct Key {
    x: i32,
    y: i32,
}
struct Value {
    id: u32,
    name: String,
    tags: Vec<String>,
}
type Map = HashMap<Key, Box<Value>>;

// Our map object of interest
let mut map = Map::new();
map.insert(Key {x: 1, y: 2}, Box::new(Value {
  id: 1,
  name: String::from("name 1"),
  tags: vec![String::from("tag 1.1"), String::from("tag 1.2")],
}));
map.insert(Key {x: 3, y: 4}, Box::new(Value {
  id: 2,
  name: String::from("name 2"),
  tags: vec![String::from("tag 2.1"), String::from("tag 2.2")],
}));
Memory layout in JavaScript

Memory layout in JavaScript: This figure shows the memory layout of the map object from the previous TypeScript snippet.

Note all the indirections needed at each level to represent this data structure. This problem is common in high level languages.

Note also the shape objects, without which the JavaScript VM wouldn't know how to read the properties of the dynamically typed objects.

Finally, note the extra props (properties) pointers added in each object to support any code that would extend our objects with more properties. We will get back to this in the section Dynamic vs static typing.

Memory layout in Wasm GC MVP

Memory layout in Wasm GC MVP: This figure shows the memory layout of the objects constructed by the Java example that is compiled to use Wasm GC.

Like in the JavaScript version, there is a lot of pointer indirection. In addition, this version uses a Java-compatible hash table that is compiled to Wasm, which brings two penalties: 1. it has to load the hash table code, and 2. this hash table is most likely less compact in memory than the native JavaScript Map.

On the positive side and thanks to static typing, objects allocated by Wasm code take less space than JavaScript objects. There is no space wasted to account for dynamically added properties. And shape objects (called Runtime Types or RTTs in Wasm) are only needed to validate subtype-casting, meaning that they have to contain less data compared to the JS equivalent which we'll explore in Dynamic vs static typing.

Memory layout in a low level language

Memory layout in a low level language: This figure shows the memory layout of the objects constructed by the Rust example.

There is little indirection compared to the previous versions. Many child objects are stored inline inside their parents: the keys are stored directly inside the hash table, and the tags array's metadata is stored inline inside the Value object. There is also no runtime information to store about the shapes of objects.

Unlike the JavaScript version which uses a native hash table implementation provided by the browser, and just like the Java version, the Rust hash table code must be bundled with the application code.

Dynamic vs Static Typing

JavaScript is a dynamically typed programming language where object shapes are determined at runtime:

To learn more about how JavaScript engines represent objects shapes information, I refer you to the article JavaScript engine fundamentals: Shapes and Inline Caches presented by Mathias Bynens and Benedikt Meurer at JSConf EU 2018.

Figure Accessing JS objects’ properties shows the representation of the object myObject from the following code example, and lists the steps required to execute the function callMethod0.

class ParentClass {
  property0 = 123;
  method0() {
    return this.property0;
  }
}

class ChildClass extends ParentClass {
  property1 = 456;
  method1() {
    return this.property0 + 1;
  }
}

const myObject = new ChildClass();

function callMethod0(object) {
  object.method0();
}

/* This function can only work thanks to dynamic typing */
function addNewProperty(object) {
  object.newProperty = 2;
}
Accessing JS objects' properties

Accessing JS objects' properties: This figure shows the JS objects and the Shape objects needed to represent myObject from the previous code example.

Steps 1 to 11 represent the memory accesses needed for the function callMethod0 to get the address of the code implementing method0. Steps A to D represent the memory accesses needed for method0 to read property0 from myObject.

All objects in this example have their extra properties fields set to null. This field is needed in case a function like addNewProperty is called with one of our objects and if this object has no space in it to store the added property.

If JavaScript engines went through all the steps explained above on every object property access, they would be very slow. Thankfully, they are heavily optimized: they analyze hot code paths (that is, frequently-run code sections) at runtime to detect which shapes of objects they operate on, and they generate optimized code for those shapes. For more on this, check out inline caching and Just-in-Time (JIT) compilation. Code specialized for an object shape doesn’t have to look up object property names in the shape dictionary on each access. Instead, it directly accesses the object property by its offset inside the object.

Unlike JavaScript, Wasm is a statically typed language. Object properties can be accessed using a single memory read at a relative offset from the object’s address.

Accessing a statically typed object's properties

Accessing a statically typed object's properties: This figure shows the objects needed to represent myObject from the previous code example in a statically typed language such as Wasm.

Steps 1 to 3 represent the memory accesses needed for the function callMethod0 to get the address of method0. And steps A and B represent the memory accesses needed for method0 to read property0 from the object.

No extra properties fields need to be reserved inside our objects because functions like addNewProperty are simply invalid.

Human-readable vs Binary code

Before the browser can execute a JavaScript file, it has to download it fully and then parse it (which is quite complex, given the language's rich syntax optimized for human readability).

Wasm’s binary format was designed to address this issue. To optimize loading time, many criteria were taken into account:

Nowadays, browsers can even parallelize Wasm code compilation.


So far in chapter 2, we have looked at optimization techniques that reduce the amount of work that has to be done, or in other words, techniques that minimize wasted resources. In the next chapter, we look at optimization techniques that minimize wasted time.


3. Scheduling work to make users wait less

In this chapter, I present techniques that allow websites and applications to reduce the time users have to wait for a response, not by reducing the amount of work the sites/apps have to do, but by scheduling that work smartly.

To visualize the effect of different scheduling strategies, I generated waterfall charts with fixed parameters such as file sizes, execution times, and network bandwidth and latency. I set the network parameters to simulate regular 3G performance, and I divided the network bandwidth equally between simultaneously sent resources. Feel free to download the code and generate charts with different parameters.

3.1. Do not block the UI thread

The browser runs JavaScript code in an event loop. When the user manipulates the page and when input/output operations progress, events are generated and JavaScript event handlers are run. The browser has to wait for any currently running JavaScript code to finish before it can respond to new events. So if JavaScript code runs for too long without yielding control to the event loop, the page becomes unresponsive to the user - a condition called jank.

Therefore, JavaScript code should execute in brief bursts to keep the UI responsive. Two strategies can be used to handle long JavaScript tasks:

Long task blocking the event loop
Long task blocking the event loop: In this example, the user clicks on the UI twice: Once at t=0 and a second time at t=500ms. The first click's event handler runs for 50ms and then calls a function Long task which runs for 800ms. When the second click event arrives, Long task is still running, so the browser waits for it to finish (waiting 350ms). At t=850ms the main thread is free again, the second click handler is run taking 50ms and the browser runs layout (taking 100ms) to show the updated UI to the user. The user sees the result of the second click at 1000ms (A latency of 500ms).
Long task split into short ones to not block the event loop
Long task split into short ones to not block the event loop: In this example, the user clicks on the UI twice: Once at t=0 and a second time at t=500ms. The first click's event handler runs for 50ms and then calls a function Long task. This function takes 800ms in total but it splits its work to chunks of 100ms yielding control to the main thread after each chunk. When the second click event arrives, chunk number 5 of Long task is running, so the browser waits for it to finish (waiting 70ms). At t=571ms the main thread is free again, the second click handler is run taking 50ms and the browser runs layout (taking 100ms) to show the updated UI to the user. The user sees the result of the second click at 721ms (A latency of 221ms).
Long task running in a Web Worker
Long task running in a Web Worker to not block the event loop: In this example, the user clicks on the UI twice: Once at t=0 and a second time at t=500ms. The first click's event handler runs for 50ms and triggers the execution of the function Long task in a Web Worker. When the second click event arrives, the function Long task is still running, but since it is running in a separate thread, the browser runs the second click's event handler immediately taking 50ms and then runs layout (taking 100ms) to show the updated UI to the user. The user sees the result of the second click at 650ms (A latency of 150ms).
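As a rough sketch of these two strategies (longTask, processItem, updateUI and the worker script are hypothetical names; scheduler.yield() is used where available, falling back to setTimeout):

// Strategy 1: split a long task into chunks and yield to the event loop between them.
async function longTask(items) {
  const CHUNK_SIZE = 100;
  for (let i = 0; i < items.length; i += CHUNK_SIZE) {
    for (const item of items.slice(i, i + CHUNK_SIZE)) {
      processItem(item); // hypothetical per-item work
    }
    // Yield so that pending events (clicks, keystrokes...) can be handled.
    if (globalThis.scheduler?.yield) {
      await scheduler.yield();
    } else {
      await new Promise((resolve) => setTimeout(resolve, 0));
    }
  }
}

// Strategy 2: move the work off the main thread into a Web Worker.
const worker = new Worker("/long-task-worker.js"); // hypothetical worker script
worker.onmessage = (event) => updateUI(event.data); // hypothetical UI update
worker.postMessage({ items: [] });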

The Partytown library uses web workers in an interesting way: it allows third-party scripts (analytics scripts, for example) to run off the main thread, using proxy objects to simulate the DOM inside a web worker and synchronous XHR to hide the asynchronicity of the communication between the worker and the main thread. This can help reduce jank coming from third-party scripts.


3.2. Streaming

Gradual content delivery with streaming

Dynamically generated web pages are often composed of parts that are fast to generate and other parts that take longer to generate. It’s desirable to deliver the parts that are ready to the user while the slower parts are still in the making. This way:

- The user starts seeing (and can start interacting with) the available content sooner.
- The browser can start fetching sub-resources and rendering earlier, instead of sitting idle while waiting for the complete response.

It is possible to do exactly that thanks to the streaming capability of HTTP (since version 1.1 of the protocol) and thanks to the HTML format being streaming-friendly (Browsers can process and show HTML documents progressively as they are received).

Unlocking Parallelism with Streaming

When the server receives a request for a page, it can:

- Immediately generate and send the page’s head, which references the page’s stylesheets and scripts.
- Keep generating the rest of the page (the body) and stream it to the client as it becomes ready.

This way, the client can start loading the page’s sub-resources in parallel with the server generating and sending the rest of the page.
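A minimal sketch of this idea with a plain Node.js HTTP server (the body-rendering delay and the markup are made up for illustration):

// Minimal sketch: flush the <head> immediately, then stream the <body> when ready.
import { createServer } from "node:http";
import { setTimeout as sleep } from "node:timers/promises";

// Stand-in for the slow part of the page (e.g. data fetching + templating).
async function renderBody(req) {
  await sleep(150);
  return "<main>…page content…</main>";
}

createServer(async (req, res) => {
  res.writeHead(200, { "Content-Type": "text/html" });
  // Send the head right away so the browser can start fetching style.css and script.js.
  res.write(`<!doctype html><html><head>
  <link rel="stylesheet" href="/style.css">
  <script src="/script.js" defer></script>
</head><body>`);
  // The slow body is generated while the client is already downloading sub-resources.
  res.write(await renderBody(req));
  res.end("</body></html>");
}).listen(3000);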

Not streaming HTML diagram
Not streaming HTML: In this example, the server waits for the whole page, head and body, to be generated (t=310ms) before it sends it to the client. The client receives the head of the page at t=362ms, at which point it starts loading the files style.css and script.js and continues loading the body of the page. When style.css is loaded (at t=473ms), the browser starts constructing the CSSOM. The browser executes script.js after it is loaded (t=672ms) and after the CSSOM is constructed (t=706ms). As a result, the page is rendered and interactive at t=1006ms - Simulation numbers.
Streaming HTML diagram
Streaming HTML: In this example, the server streams the page parts, the head and the body, as soon as they are ready. It starts by streaming the page head to the client, allowing it to request style.css and script.js at t=112ms (instead of t=362ms in the previous, non-streaming example). As a result, the page is rendered and interactive at t=785ms (221ms earlier than the non-streaming example) - Simulation numbers.

Out-Of-Order Streaming

Sometimes, webpages are composed of sections that can load concurrently and that may not finish loading in the right order to stream them directly to the client. For those situations, some frameworks implement a feature commonly called out-of-order streaming: the framework loads page sections concurrently, streams them to the client as they become available, in whichever order that is, and ensures they are rendered in the correct position in the page.

Since MarkoJS pioneered out-of-order streaming in 2014, several more popular JavaScript frameworks have rediscovered and implemented the technique. An interesting example is SolidStart, which supports out-of-order streaming in both server-side rendering (SSR) and client-side rendering (CSR) modes (modes that you can switch between by changing a configuration flag). In SSR mode, the server streams both HTML and data to the client: the framework inserts the HTML into the page and passes the data to client-side code. In CSR mode, the server streams only the data to the client, where client-side code renders it into HTML.
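Under the hood, out-of-order streaming is typically implemented by streaming a placeholder for each slow section up front and, later in the same response, streaming the section's content inside a template element together with a small inline script that swaps it into place. A hand-written sketch of the mechanism, inside the same kind of streaming handler as above (ids and markup are made up; frameworks generate this plumbing for you):

// Stream placeholders for the slow sections immediately.
res.write('<div id="slot-comments">Loading comments…</div>');
res.write('<div id="slot-related">Loading related articles…</div>');

// Later, whenever a section finishes rendering (in any order), stream its
// content plus a small script that moves it into the right position.
res.write(`
  <template id="content-comments"><ul><li>First comment</li></ul></template>
  <script>
    document.getElementById("slot-comments")
      .replaceWith(document.getElementById("content-comments").content);
  </script>`);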

No Out-Of-Order Streaming diagram
In-Order Streaming: In this example, the server streams the head element of the page and loads the 3 sections of the page in parallel before streaming them to the client. The second and third sections finish loading early but they are not streamed to the client until the first section is ready. At t=1036ms, the client receives section 1, rendering it at t=1136ms (before that, all the user sees is an empty shell). The page is completely rendered and interactive only at t=1236ms - Simulation numbers.
Out-Of-Order Streaming diagram
Out-Of-Order Streaming: In this example, the server streams the head element of the page, loads the 3 sections of the page in parallel and streams them to the client as soon as they are ready. The client receives section 2 at t=536ms and renders it at t=756ms (480ms earlier than with in-order streaming). It receives section 3 at t=936ms and renders it at t=1036ms (200ms earlier than with in-order streaming). And finally, it receives section 1 at t=1036ms and renders it at t=1136ms (100ms earlier than with in-order streaming) - Simulation numbers.

Beyond HTTP response streaming

In addition to HTTP response streaming, the Web platform provides APIs to stream data between the client and the server:

- Server-Sent Events (the EventSource API), for a long-lived server-to-client stream of events.
- WebSockets, for bidirectional streaming over a single connection.

More recently, newer APIs arrived like:

- The Streams API, which among other things lets fetch consume response bodies incrementally as they arrive (see the sketch below).
- WebTransport, which provides low-latency, bidirectional streaming on top of HTTP/3.
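For example, a client can consume an HTTP response incrementally with the Streams API instead of waiting for the full body (the /events URL is hypothetical):

// Read a streamed HTTP response chunk by chunk as it arrives.
const response = await fetch("/events");
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  console.log("received chunk:", decoder.decode(value, { stream: true }));
}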


3.3. Preloading

In order to optimize the loading of webpages, browsers try to schedule the loading of resources, assigning a sensible priority to each sub-resource. As the browser parses the page, it discovers sub-resources and loads them or schedules them to be loaded. Browsers try to discover and load high-priority resources as soon as possible - even before the page parser gets to see them. To do so, they use a preload scanner, a process which runs concurrently with the main thread and which identifies and initiates the loading of sub-resources in the yet-to-be-processed HTML.

There remains a limit, though, to what browsers can do automatically for us. After all, browsers cannot know that a page needs a sub-resource until they see it in the server response. For example, consider a page that loads a CSS file /style.css, which itself imports another CSS file /style-dependency.css: the browser may be able to discover the link to the first CSS file early by scanning the HTML document, but it has to fetch and parse that file before it discovers and fetches the second one.

<!-- page1.html -->
<!doctype html>
<head>
  <link rel="stylesheet" href="/style.css" />
</head>
/* /style.css */
@import "/style-dependency.css";

With preloading, web pages can declare the intent to use sub-resources without inserting them immediately into the page. This way the browser can start loading the preloaded sub-resources early, so that when they are ultimately needed, they load fast.

Preloading can be done using <link rel="preload"> tags, supported by all major browsers since January 2021. Using preload link tags in the previous example would look like the following:

<!-- page1.html -->
<!doctype html>
<head>
  <link rel="preload" as="style" href="/style-dependency.css" />
  <link rel="stylesheet" href="/style.css" />
</head>

Another tool for preloading is the Link HTTP response header, supported by all major browsers since late 2023. Link headers with rel="preconnect" or rel="preload" have the same semantics as the equivalent HTML <link> elements, but they have the advantage that servers can send them before they start generating the page’s HTML, helping browsers discover page sub-resources earlier. If we use Link headers in the previous example page, we get a first HTTP response chunk containing the headers:

200 OK
Link: </style.css>; rel="preload"; as="style", </style-dependency.css>; rel="preload"; as="style"

followed by other response chunks with the rest of the page.

Since servers have to determine the page’s status code before they can send any HTTP headers, there can be a delay before the Link HTTP headers are sent. That’s why a third tool for preloading was created: the HTTP 103 Early Hints informational response, available with preloading capability in all major browsers except Safari since late 2023. Servers can send a 103 Early Hints response including preload Link headers, and then send the actual response (including the final status code) when it is ready. This allows browsers to discover and start fetching page sub-resources even earlier.

Using early hints, the previous example would look like the following:

103 Early Hints
Link: </style.css>; rel="preload"; as="style", </style-dependency.css>; rel="preload"; as="style"

200 OK
Content-Type: text/html

<!doctype html>
...
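If the server runs on Node.js (18.11 or later), it can send such a response directly; a minimal sketch:

// Send a 103 Early Hints response, then the final 200 response (Node.js 18.11+).
import { createServer } from "node:http";

createServer((req, res) => {
  res.writeEarlyHints({
    link: [
      "</style.css>; rel=preload; as=style",
      "</style-dependency.css>; rel=preload; as=style",
    ],
  });
  // …generate the page, then send the actual response:
  res.writeHead(200, { "Content-Type": "text/html" });
  res.end('<!doctype html><head><link rel="stylesheet" href="/style.css"></head>…');
}).listen(3000);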

Note that you don’t have to choose between preload link tags and preload link headers (with or without Early Hints). You can use both at the same time to support older browsers and to preload more things if they are discovered late during page loading.

The following 4 diagrams show the simulation of the loading of 4 versions of the same page page.html, which loads a style.css stylesheet that itself requires another style-dependency.css stylesheet. The simulation uses the same parameters for the time it takes the server to determine the status code (250ms), to generate the page’s head (100ms) and to generate the body (150ms). The 4 versions of the page differ in how (and whether) style-dependency.css is preloaded:

- No preloading at all.
- Preloading with a <link rel="preload"> tag in the page’s head.
- Preloading with a Link header on the final response.
- Preloading with a Link header sent in a 103 Early Hints response.

No preloading diagram
No preloading: In this example, after the client requests the page, it receives the page's head element at t=452ms, at which point it requests style.css. Upon receiving this file (t=657ms), the client requests style-dependency.css. Only after receiving the two style files and the page's body does the client render the page, finishing at t=1033ms.
Preloading with early hints diagram
Preloading with early hints: In this example, after the client requests the page, it receives a 103 Early Hints response containing preload headers at t=101ms, at which point it requests both the style.css and style-dependency.css files. After receiving the two style files and the page's body, the client renders the page, finishing at t=796ms.

Now, let’s explore some use cases of preloading. The examples that follow show loading speed improvements from preloading even though they only use the widely supported <link rel="preload"> tags.

Preloading web fonts

By default, browsers load web fonts only when needed. That is, the browser doesn’t load the web font files declared in the page’s CSS until an element in the page uses a font-family and font-style that require them. Because of this, the browser may render parts of the page a first time using a system font and then rerender them when the web fonts are loaded, causing layout shifts which can disturb users.

Preloading web font files helps address this problem. Layout shifts can be avoided altogether if the web fonts are loaded early enough, or at least they occur early during the loading of the page so as not to disturb the user too much. It is also good practice to self-host web fonts, i.e. to host them on one’s own server rather than relying on a third-party server, to avoid the latency of establishing a secure connection to a different host.
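A rough sketch of preloading a font from JavaScript (a <link rel="preload" as="font"> tag placed directly in the HTML head achieves the same thing and is discovered even earlier; the font URL here is made up). Note that font preloads must be marked crossorigin, even when the font is served from the same origin:

// Preload a web font so it is already downloaded when the CSS requests it.
const fontPreload = document.createElement("link");
fontPreload.rel = "preload";
fontPreload.as = "font";
fontPreload.type = "font/woff2";
fontPreload.href = "/fonts/body-regular.woff2"; // hypothetical font file
fontPreload.crossOrigin = "anonymous"; // required for font preloads, even same-origin
document.head.appendChild(fontPreload);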

No web fonts preloading diagram
No web fonts preloading: In this example, the client determines that it needs the two web font files only after it constructs the CSSOM (at t=388ms). It starts fetching them at that point, renders the page a first time with a system font, and later rerenders the page using the web fonts, finishing at t=665ms.
Web fonts preloading diagram

Web fonts preloading: In this example, two web fonts are preloaded in the page's head element. As soon as the client receives the head element (t=112ms), it starts fetching the style file and the preloaded web fonts. Once the page's body is loaded and the CSSOM is created, the client renders the page only once, using the already loaded web fonts, finishing at t=556ms (109ms earlier than the example without preloading).

Notice that style.css takes longer to load in this example than in the previous one. That is because the simulation is taking into account that the client is simultaneously downloading this file and the web font files.

Speeding up SPAs startup

In single-page applications that rely on client-side code for data retrieval, data fetching takes place only after the code is fully loaded and executed. To optimize this, we can use a preload tag to instruct the browser to start loading the page’s data as soon as it retrieves the head of the page. This allows the data fetching to occur concurrently with the loading of the client-side code.
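A sketch of what this can look like, assuming a hypothetical /api/page-data endpoint: a preload tag for the data goes into the HTML head, and the application code later issues a matching fetch that picks up the preloaded response:

// In the page's head (markup): <link rel="preload" as="fetch" href="/api/page-data" crossorigin>
// Later, once the SPA code has loaded and executed, the matching request
// reuses the response that the browser started fetching much earlier.
const response = await fetch("/api/page-data"); // request options must match the preload's crossorigin mode
const data = await response.json();
renderApp(data); // hypothetical rendering entry point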

Preloading page data can enable performance that closely approximates what can be achieved using the streaming solutions discussed in the previous section, in which the server can start fetching page data as soon as it receives the request for the page’s URL. Compared to that approach, preloading page data on the client adds latency because the browser has to (1) load the page’s head and (2) make a second request before the actual data fetching can start on the server. However, this latency is significantly reduced when the webapp is served via a CDN, because the page’s head is received very fast. Furthermore, when the page is loaded from the client cache, this latency disappears entirely, leaving no additional latency compared to the streaming solutions.

SPA without preloading diagram
SPA without preloading: In this example, the client downloads the single page app's code which starts fetching page data at t=556ms. The data is received at t=972ms and finally rendered to the screen at t=1172ms.
SPA with preloading diagram
SPA with preloading: In this example, as soon as the client receives the page's head element (t=112ms) it starts preloading the page's data. The data is received at t=529ms and is ultimately rendered to the screen at t=756ms (416ms earlier than the example without preloading).

Speeding up client-side navigation

Preloading can also improve the speed of client-side navigation. When the user clicks a link, the client-side router has to load the next page’s JavaScript code and its data, ideally in parallel. A further optimization is to prefetch the next page’s code and data when the user hovers over its link.
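A simplified sketch of the hover-triggered variant (the module path, the API endpoint and the render function are made up; many client-side routers offer this as a built-in option):

// Prefetch the next page's code and data when the user hovers over its link.
const link = document.querySelector('a[href="/products"]');
let prefetched = null;
const prefetch = () =>
  (prefetched ??= Promise.all([
    import("/js/products-page.js"),               // next page's code
    fetch("/api/products").then((r) => r.json()), // next page's data
  ]));

link.addEventListener("mouseenter", prefetch);
link.addEventListener("click", async (event) => {
  event.preventDefault();
  const [pageModule, data] = await prefetch(); // already in flight if the user hovered first
  pageModule.render(data);
  history.pushState({}, "", "/products");
});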

Client-side navigation without preloading diagram
Client-side navigation without preloading: In this example, when the user clicks the link at t=500ms (500ms after they hover over it), the client-side router downloads the code for the next page. Once the code is loaded (t=976ms), it fetches the data for that page. The data is received at t=1339ms and is rendered to the user at t=1539ms (1 second after the click).
Waterfall diagram
Client-side navigation with data preloading: In this example, when the user clicks the link at t=500ms (500ms after they hover over it), the client-side router downloads the code for the next page while simultaneously fetching its data. The page is ultimately rendered to the user at t=1190ms (690ms after the click).
Waterfall diagram

Client-side navigation with code and data preloading: In this example, when the user hovers over the link, the client-side router preloads the next page's code and data. When the link is eventually clicked (t=500ms), both code and data are available. As a result, the page is rendered to the user at t=700ms (200ms after the click).

In fact, if we started rendering the preloaded next page offscreen before the click, it could be shown to the user even faster.


3.4. Deferring non-critical resources

CSS and JavaScript resources are sometimes render-blocking, meaning that the browser has to wait for them to load before rendering the page to the user. This is necessary to prevent the Flash Of Unstyled Content (FOUC), where users briefly see unstyled elements before the styles are applied.

Any CSS and JavaScript resources that are not needed for the initial rendering of the page should ideally be loaded asynchronously in order to unclutter the critical rendering path. This can be achieved using:

- The defer and async attributes on script tags (module scripts are deferred by default).
- Loading code on demand with dynamic import() or by injecting script tags from JavaScript (see the sketch after the figures).
- Loading non-critical stylesheets asynchronously, for example with a media attribute that is switched once the file has loaded, or with rel="preload" plus an onload handler.

Using a single render-blocking script diagram
Using a single render-blocking script: In this example, the user requests an HTML page which loads render blocking CSS and JavaScript files. The browser processes these files constructing CSSOM and executing JavaScript code before rendering the page, finishing at t=785ms.
Asynchronously loading non-critical JavaScript diagram

Asynchronously loading non-critical JavaScript: In this example, the page's script is split into a render-blocking one and a second, asynchronously loaded script. As soon as the render-blocking CSS and JavaScript resources are processed, the browser renders the page at t=572ms (213ms earlier than the previous example). As for the async script, the browser loads it with low priority, executes it and rerenders the page, finishing completely at t=885ms (100ms later than the previous example).

Notice that style.css loads in this example faster than the same file in the previous example. That's because the simulation takes into account that once the smaller script file from this example is done loading, style.css continues loading using the whole network bandwidth.
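As a sketch of the dynamic-loading option from the list above, non-critical code can be pulled in only once the browser is idle (the ./analytics.js module is made up):

// Load non-critical code after the critical rendering path is done, when the browser is idle.
const loadAnalytics = () => import("./analytics.js").then((module) => module.init());
if ("requestIdleCallback" in window) {
  requestIdleCallback(loadAnalytics, { timeout: 5000 });
} else {
  setTimeout(loadAnalytics, 2000); // fallback for browsers without requestIdleCallback
}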


3.5. Lazy loading

Loading content only when necessary, a technique known as lazy loading, can enhance performance by allowing high-priority resources to load without the interference of low-priority ones. This can be achieved in HTML by marking images and iframes with the attribute loading="lazy" to delay their loading until they need to be rendered on the screen.

Beyond what is natively possible on the web, web frameworks offer means to lazily load code and data:

- Route-based code splitting, where each route's code is loaded with a dynamic import() only when the route is visited.
- Lazy component wrappers (such as React.lazy) that defer loading a component's code until it is first rendered.
- Deferring data fetching until the component that needs the data is actually displayed.

Keep in mind that lazy loading can cause users to wait when the resources are finally needed. To mitigate this, applications should start loading resources as they become likely to be needed.
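A rough sketch of lazily loading a below-the-fold widget when its placeholder approaches the viewport, which also illustrates loading slightly ahead of need (the element id and module path are made up):

// Lazily load a widget's code shortly before its placeholder scrolls into view.
const placeholder = document.querySelector("#comments-placeholder");
const observer = new IntersectionObserver(async (entries) => {
  if (entries.some((entry) => entry.isIntersecting)) {
    observer.disconnect();
    const { mountComments } = await import("./comments-widget.js"); // hypothetical module
    mountComments(placeholder);
  }
}, { rootMargin: "200px" }); // start loading ~200px before the widget becomes visible
observer.observe(placeholder);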


Conclusion

To summarize this article:

Due to the multiplicity of factors affecting performance, and due to the unique needs of each website and application, there is no one-size-fits-all solution or framework. In fact, websites and apps can perform well even when using subpar technologies, as long as the overall architecture fits their needs. That said, I would say that the best frameworks are those that enable developers to choose the high-performance options with minimal friction while keeping the code clear and maintainable.