Core Web Vitals and How Best to Supplement Them
Measure what matters and keep your application reliable and interactive
In last week’s post, I talked mostly about measurement and how to get started measuring the performance of the website you’re working on. Some of the easiest metrics to come by are the Core Web Vitals (CWV) — they’re front and center in the Lighthouse report, which is often used in online discourse to compare the relative performance of web applications (or, just as often, to show how fast (insert favorite web framework)’s empty starter project is).
Because of their ubiquity in performance discussions and the ease with which they can be measured, it’s worth taking a look at what the Core Web Vitals are measuring, what the Google Chrome team calls the “other” web vitals, and where you should supplement these vitals with what’s uniquely vital to your application.
The Core Web Vitals
Largest Contentful Paint (LCP)
Definition: The Largest Contentful Paint (LCP) metric reports the render time of the largest image or text block visible within the viewport, relative to when the page first started loading.
Given the definition above, you can consider LCP to be a proxy measurement for when the bulk of the page rendered for the user. One interesting thing is that LCP can change depending on when you stop the measurement — if you stop after just rendering a page title, the time that the title rendered would be the LCP value. However, if you measure for the entire duration of the page load (as is recommended) and a large image loads after the heading, that time would be the LCP value.
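If you want to watch those candidates arrive yourself, here’s a minimal sketch using the standard PerformanceObserver API (the variable names are mine). Each new, larger candidate produces a fresh entry, which is why the reported LCP value can change over the page’s lifetime:

```js
// Observe LCP candidates as the page loads; the last entry reported
// is the current LCP value.
const lcpObserver = new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries()) {
    console.log('LCP candidate at', entry.startTime, 'ms:', entry.element);
  }
});

// `buffered: true` replays candidates that occurred before this code ran.
lcpObserver.observe({ type: 'largest-contentful-paint', buffered: true });
```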
Cumulative Layout Shift (CLS)
Definition: CLS is a measure of the largest burst of layout shift scores for every unexpected layout shift that occurs during the entire lifespan of a page.
The calculation that goes into the CLS score is pretty involved (I’d recommend reading more in the article linked above), but at a high level, CLS measures the impact of elements rendering to the screen and then changing position during the page load. Note that for it to count as a shift, the content has to render and then move — the initial render to the screen doesn’t count as a shift. If you render, say, the heading for an article, and then an advertisement pops in and pushes that heading down the page, that’s a layout shift.
This is an extremely annoying issue for users of the site, and the reason the calculation for the metric is so involved is that the annoyance is hard to quantify. Buttons shifting around, especially when they’re displaced by a late-loading advertisement, are a frequent complaint about page loads. Newer HTML and CSS features like aspect-ratio let you define the dimensions of content that will load later and reserve the space for it up front, preventing layout shifts.
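If you want to see these shifts as they happen, here’s a minimal sketch using the standard layout-shift entry type. Note that the real CLS score groups shifts into session windows and reports the largest burst, so treat this running total as an approximation:

```js
// Tally layout shifts that weren't triggered by user input.
let shiftTotal = 0;

const clsObserver = new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries()) {
    // Shifts within 500ms of user input are expected, so they're excluded.
    if (!entry.hadRecentInput) {
      shiftTotal += entry.value;
      console.log('Layout shift:', entry.value, 'running total:', shiftTotal);
    }
  }
});

clsObserver.observe({ type: 'layout-shift', buffered: true });
```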
First Input Delay (FID) (leaving core web vitals soon!)
Definition: FID measures the time from when a user first interacts with a page (that is, when they click a link, tap on a button, or use a custom, JavaScript-powered control) to the time when the browser is actually able to begin processing event handlers in response to that interaction.
Interaction to Next Paint (INP) (joining core web vitals soon!)
Definition: INP is a metric that assesses a page's overall responsiveness to user interactions by observing the latency of all click, tap, and keyboard interactions that occur throughout the lifespan of a user's visit to a page. The final INP value is the longest interaction observed, ignoring outliers.
I’ll talk about FID and INP together given that the latter is replacing the former as a core web vital in 2024. FID looks at the delay between the first interaction and the browser’s response to it, whereas INP looks at a similar metric but over the session duration and reports the worst value that it saw.
To keep your app interactive and responsive, you want to both limit the amount of JavaScript running on the main thread and keep the tasks that do run short-lived, so the browser can react to user interaction as quickly as possible. These metrics tell you how long users are waiting for their input to take effect. High values here cause perceived lag, or an uncanny valley effect where reactions to input are delayed just enough that users lose the sense of cause and effect they expect when interacting with something on the page.
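To see the raw data behind both metrics, here’s a minimal sketch using the standard first-input and event entry types. In practice you’d likely reach for Google’s web-vitals library, which handles the full aggregation logic; this just surfaces the underlying entries:

```js
// FID's raw ingredient: the gap between the first interaction and when the
// browser could start processing its event handlers.
const fidObserver = new PerformanceObserver((entryList) => {
  const [firstInput] = entryList.getEntries();
  if (firstInput) {
    const delay = firstInput.processingStart - firstInput.startTime;
    console.log('First input delay:', delay, 'ms for a', firstInput.name, 'event');
  }
});
fidObserver.observe({ type: 'first-input', buffered: true });

// INP-style data: every interaction slower than `durationThreshold` ms.
const slowInteractionObserver = new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries()) {
    console.log('Slow interaction:', entry.name, 'took', entry.duration, 'ms');
  }
});
slowInteractionObserver.observe({ type: 'event', durationThreshold: 40, buffered: true });
```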
Other Web Vitals
While not “core” metrics (according to the Google team; they have a lot of data and I very much trust their opinions!), the following four metrics can tell an important story about how your website’s performing.
Time to First Byte (TTFB)
Definition: TTFB is a metric that measures the time between the request for a resource and when the first byte of a response begins to arrive.
In contrast to all of the core web vitals, TTFB is mainly concerned with how you’re getting your content to the browser rather than what you’re telling the browser to do. A high TTFB tells you that the browser was waiting on a server for longer than it should have. This could be because the server is doing too much work, because you could leverage caching instead of generating HTML for each individual user, because your server is geographically distant from the user requesting the data, or, if you hit the jackpot, all of these and more.
This is an important metric to measure because, ultimately, none of the other metrics can complete while the browser is waiting for the initial content. The browser is, for all intents and purposes, sitting completely idle and unable to make any progress on rendering content for the user, so bringing this down should have an impact on all other user-facing metrics.
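Here’s a minimal sketch of pulling TTFB, and the phases that feed into it, out of the standard Navigation Timing API; breaking out the phases can help you attribute where the wait came from:

```js
// TTFB is `responseStart` on the navigation entry: the time from the start
// of navigation to the first byte of the response arriving.
const [nav] = performance.getEntriesByType('navigation');
if (nav) {
  console.log('TTFB:', nav.responseStart, 'ms');

  // The phases leading up to it show where that time went.
  console.log('DNS lookup:', nav.domainLookupEnd - nav.domainLookupStart, 'ms');
  console.log('TCP connect:', nav.connectEnd - nav.connectStart, 'ms');
  console.log('Waiting on server:', nav.responseStart - nav.requestStart, 'ms');
}
```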
First Contentful Paint (FCP)
Definition: The First Contentful Paint (FCP) metric measures the time from when the page starts loading to when any part of the page's content is rendered on the screen. For this metric, "content" refers to text, images (including background images), <svg> elements, or non-white <canvas> elements.
Where LCP measures when the largest element was painted to the screen, FCP measures the very first time anything was painted. When combined with TTFB, this can tell you the time from when the browser was first able to start doing something with the content your server sent to when the user was first able to see anything. This window can surface issues with how your application starts up, like running too much JavaScript or waiting on the network before rendering any content.
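Reading FCP is straightforward with the standard paint entry type; here’s a minimal sketch:

```js
// FCP is reported as a `paint` entry named 'first-contentful-paint'.
const fcpObserver = new PerformanceObserver((entryList) => {
  const fcpEntry = entryList.getEntriesByName('first-contentful-paint')[0];
  if (fcpEntry) {
    console.log('FCP:', fcpEntry.startTime, 'ms');
    fcpObserver.disconnect(); // FCP only happens once per page load.
  }
});

fcpObserver.observe({ type: 'paint', buffered: true });
```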
Time to Interactive (TTI)
Definition: The TTI metric measures the time from when the page starts loading to when its main sub-resources have loaded and it is capable of reliably responding to user input quickly.
To summarize how the article above explains the calculation: you take the FCP time, then look for the first five-second window after it with no long tasks (tasks that take the browser over 50ms to complete) and no more than two in-flight network requests. Once you’ve found that window, TTI is the time from when the page started loading to the start of that quiet, reliably interactive stretch.
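To make that concrete, here’s a simplified sketch of the quiet-window search, assuming you’ve already pulled FCP, long task timings, and network request timings out of a trace. The function name and data shapes are mine, and the real Lighthouse implementation handles more edge cases than this:

```js
// Assumes `fcp` in ms, and `longTasks`/`requests` as arrays of
// { start, end } timestamps in ms relative to the start of the page load.
function estimateTTI(fcp, longTasks, requests, traceEnd) {
  const QUIET_WINDOW = 5000;

  // A quiet window can begin at FCP or right after a long task ends.
  const candidateStarts = [fcp, ...longTasks.map((task) => task.end)]
    .filter((time) => time >= fcp)
    .sort((a, b) => a - b);

  for (const start of candidateStarts) {
    const end = start + QUIET_WINDOW;
    if (end > traceEnd) break; // Not enough trace left to prove quietness.

    const hasLongTask = longTasks.some(
      (task) => task.end > start && task.start < end
    );

    // Simplification: counts requests overlapping the window at all, rather
    // than checking how many are concurrently in flight at each instant.
    const busyNetwork =
      requests.filter((req) => req.end > start && req.start < end).length > 2;

    if (!hasLongTask && !busyNetwork) {
      return start; // TTI: the start of the first quiet five-second window.
    }
  }

  return null; // The page never quieted down within the trace.
}
```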
Looking at the definition and how the metric is calculated, you could squint and call this a reliability metric: it captures how reliably the application responds to user input. If you load and render the content pretty quickly, but you’re constantly fetching more data or blocking user input with long tasks, the user can’t reliably use the website yet.
One interesting consequence is that a poor TTI can teach users not to trust your website on initial load. If they constantly experience the feeling that sometimes they can interact with the page quickly after it loads, but sometimes things hang or feel janky, they’ll start to load the page and hold off on interacting with it for longer than the page actually takes to become reliably interactive, effectively making perceived performance worse than the “real” performance that you measure.
Total Blocking Time (TBT)
Definition: The Total Blocking Time (TBT) metric measures the total amount of time between First Contentful Paint (FCP) and Time to Interactive (TTI) where the main thread was blocked for long enough to prevent input responsiveness.
Specifically, TBT is the sum of the excess time spent on long tasks during the window between FCP and TTI. For example, say that in this span you had long tasks that were 55, 105, 75, and 60 milliseconds in duration. You take those tasks, subtract 50 milliseconds from each, and then add up the remainders — in this case, TBT would be 5 + 55 + 25 + 10 = 95 milliseconds.
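In code, that calculation is just a fold over the long task durations you’ve collected between FCP and TTI (the function here is mine, not a standard API):

```js
// Only the portion of each long task beyond the 50ms threshold counts.
function totalBlockingTime(longTaskDurations) {
  return longTaskDurations.reduce(
    (sum, duration) => sum + Math.max(0, duration - 50),
    0
  );
}

console.log(totalBlockingTime([55, 105, 75, 60])); // 5 + 55 + 25 + 10 = 95
```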
If TTI measures how long the website wasn’t reliably available, TBT measures just how unreliable it was during that time. One lone long task probably isn’t that bad: unless the user got unlucky and tried to interact with the page exactly while it was running, the application would’ve largely felt responsive. The higher your TBT value, the more painful the wait for TTI is for your users.
Where are the gaps?
The metrics above track quite a bit of information: you get proxies for boot performance, interactive performance, and even a little server performance! If you’re tracking all of these metrics, you’re going to have pretty good insight into how your website’s performing for users. There are two main concerns that I think you’ll still want to address: why is the site performing this way, and am I tracking what’s most important to my users?
To answer the why, you need to dig into the specifics of the application, something that the web vitals above weren’t designed for. How much JavaScript am I loading? Am I loading it at the right time, or am I loading all of it up front? Are those assets compressed properly? Have I optimized my images so that I’m serving the right quality for the devices and sizes I’m rendering them at? These are questions the measurements above can point you toward, but can’t answer on their own.
If you’re seeing quick TTFB times but long FCP times, it could be that you’re having the browser download, parse, and execute a ton of JavaScript before it can render any content to the user. If TTFB and FCP are low but LCP is high, you could be rendering content quickly but serving a huge, pixel-dense image to a mobile device with a small screen and low resolution. The web vitals will tell you what is happening, but it’s up to you to look at these measurements and start to deduce why it’s happening.
In addition to that, you might not be covering core product scenarios with the web vitals, especially if those scenarios happen well past the initial boot and render phase. If a core scenario is the time from clicking on a notification to rendering the deep-linked experience for that notification, INP and TBT might inform you that slow things are happening on the website, but they won’t be able to narrow it down to that specific scenario — you’ll have to complement the vitals with user timings from the Performance API (covered in last week’s issue).
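For a scenario like that, a minimal sketch with the User Timing API might look like the following; the selector, mark names, and render hook are hypothetical placeholders for wherever your notification handling actually lives:

```js
// Hypothetical: mark the moment the user clicks the notification.
const notificationElement = document.querySelector('.notification');
notificationElement.addEventListener('click', () => {
  performance.mark('notification-click');
});

// Hypothetical: call this once the deep-linked experience has rendered.
function onDeepLinkRendered() {
  performance.mark('deep-link-rendered');
  performance.measure('notification-to-render', 'notification-click', 'deep-link-rendered');

  const [measure] = performance.getEntriesByName('notification-to-render');
  console.log('Notification to render:', measure.duration, 'ms');
}
```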
Armed with the above vitals and performance measurements specific to the intricacies of your product, you and your users are going to be in great shape.
Next Week
Next week I’m going to talk about a case study where we intentionally chose to let a task take ~500ms instead of ~100ms at the 95th percentile (oh, the intrigue!). Thanks for reading, and thanks to everyone who’s shared this project — see you next week!

