Performance Measurement

Plenty of tools, both open source and commercial, can help us measure performance. Here I list a few that I use myself, chosen for their accuracy and ease of integration.

  1. Cloudflare Web Analytics: This supports both websites hosted on Cloudflare and integration via a JavaScript snippet. Setup is simple, and the data dimensions are comprehensive. It also supports querying via the GraphQL API, making it easy to build your own dashboards that filter across more dimensions.
  2. Sentry: Compared to Cloudflare’s generic performance data, Sentry includes framework-level details, such as Vue and React rendering details, route matching time, resource loading time, and API request time, which makes it better suited for developers debugging the causes of performance problems.
  3. Clarity: Compared to Core Web Vitals performance data, Clarity focuses more on user interactions and behaviors. It records and plays back abnormal user behavior such as Rage Clicks, Dead Clicks, Excessive Scrolling, and Quick Backs.
  4. Google Search Console: A tool every webmaster uses, it provides Google SEO-related data. Since Google incorporates Core Web Vitals as a factor in search ranking, the GSC panel includes the relevant metrics. However, for websites serving mainland China this report is of limited use, as the GSC data comes mainly from CrUX (the Chrome User Experience Report), which collects field data from eligible Chrome users.

The eligibility conditions for those users are as follows.

  1. Usage statistics reporting is enabled.
  2. Browser history sync is turned on.
  3. No sync passphrase is set.
  4. A supported platform is being used.

This means that most users in mainland China cannot report data properly, so GSC’s performance report has limited reference value there.

Key Metrics

There are many articles covering the key metrics, so I will not repeat them here; I will simply list them. (For more details, please refer to Web Vitals.)

  • LCP (Largest Contentful Paint) — Measures loading performance. The time at which the largest content element is painted, measured from when navigation starts.
  • INP (Interaction to Next Paint) — Measures interactivity. The time from a user interaction to the next paint following it.
  • CLS (Cumulative Layout Shift) — Measures visual stability. The cumulative score of unexpected layout shifts on the page.

The above metrics are the Core Web Vitals and are reflected in Google’s tools. Other metrics are also part of Web Vitals but not of the Core set; they are more of a detail that developers need to care about.

  • FCP (First Contentful Paint) — Measures loading performance. The time at which the first piece of page content is painted, measured from when navigation starts.
  • TTFB (Time to First Byte) — Measures loading performance. The time at which the first byte of the server’s response arrives, measured from when navigation starts.

Notably, there is some controversy about the definition of TTFB after the introduction of Early Hints.

  • TBT (Total Blocking Time) — Measures interactivity. The total time after FCP during which the main thread is blocked long enough to prevent input responsiveness, i.e. the sum of the portion of each long task beyond 50 ms. Note, however, that TBT is a lab metric and does not reflect a real user’s actual input delay.
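
To make that definition concrete, here is a small sketch (a hypothetical helper, not from any library) of how TBT is derived from long-task durations: only the portion of each task beyond the 50 ms threshold counts as blocking time.

```javascript
// Hypothetical illustration of the TBT formula: for each main-thread task
// longer than 50 ms, only the time beyond 50 ms is counted as "blocking".
function totalBlockingTime(taskDurationsMs) {
  return taskDurationsMs
    .filter((d) => d > 50)                   // only long tasks (> 50 ms) contribute
    .reduce((sum, d) => sum + (d - 50), 0);  // count the excess over 50 ms
}

// e.g. tasks of 30 ms, 70 ms and 250 ms contribute 0 + 20 + 200 = 220 ms
```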

To visualize the meaning of each metric, here is a picture of when each one occurs. Note that the execution details in the picture are for reference only. (All scripts in the image below refer to <script type="module" />.)

Web Application Progress
INP From web.dev

How to Optimize Performance

The idea of performance optimization is to reduce the time spent in each stage of the loading chain. I will discuss optimization in terms of the two common modes of running a modern web page, using the two common frameworks Vue and React as examples.

Optimization Ideas for CSR (Client-side Rendering)

In production, CSR is often served by an Nginx web server whose routing returns the same index.html for every path; the logic for matching paths and displaying pages is implemented by front-end JavaScript. This article will not repeat the details of how CSR works.

Next, starting from each of the Web Vitals metrics, I will offer some optimization ideas.

CSR Progress

CSR’s TTFB Optimization

From the way CSR runs, it is easy to see that CSR applications tend to have good TTFB. In CSR the server only performs simple file serving, and index.html often contains only references to the js and css, so the response size is small as well.

Therefore, optimizing TTFB in CSR may not have a high ROI.

CSR’s FCP Optimization

In the default CSR configuration, index.html contains no visible content, i.e. a white screen. This is fatal to the loading experience: the user cannot easily tell whether the page has been received and may hit the browser’s Refresh button again.

To improve the loading experience and get better FCP performance in CSR, we can consider rendering some branded landing styles directly in index.html, or animating the page while it is loading, as x.com does.

x.com first screen

CSR’s LCP Optimization

In real-world CSR applications, LCP is a tough nut to crack: the chain leading to LCP is long, since in addition to loading the necessary js and css it often includes the API server’s response time and JavaScript execution time.

Therefore, we can consider some front-end solutions; a few references follow.

  • For static-path pages, consider an SSG (Static Site Generation) solution; for example, the https://desktop.telegram.org/ page is a great fit for SSG.
    1. Consider SSR frameworks that support SSG, such as Nuxt.js and Next.js.
    2. Consider using a library such as Puppeteer to save a snapshot of the page’s HTML as index.html at build time.
  • For dynamic-path pages, we need to distinguish two cases.
    - If the LCP content depends on the API server’s response (for example, the LCP of https://x.com/elonmusk is the background image set by the user), it can only be optimized by reducing the cost of rendering the page.
    - If the LCP content is relatively fixed, such as a brand logo or a fixed portion of HTML, we can apply the FCP optimization and output that content directly into the HTML.

Once we want to output different HTML for different paths, we need to think about how to convert a SPA project into an MPA project. The community has plenty of experience with this, so I won’t go into it here.

So if we just want to optimize the LCP itself, what ideas do we have?

Reduce the js before LCP rendering

  1. The big entry, e.g. main.js, the entry point for all SPA code: globally importing some plugins may cause main.js to become too large.
  2. Lazy-load routes: dynamically importing route components avoids loading other pages’ js up front.
    - Optimization point: for other pages’ javascript, you can use Resource prefetching (https://developer.mozilla.org/en-US/docs/Glossary/Prefetch#resource_prefetching).
    - webpack: import(/* webpackPrefetch: true */ './navbar.tsx') // prefetch.
    - React: can be implemented via React.lazy; see The Ultimate Guide to React Lazy Loading, or use react-loadable for more functionality.
    - Vue: can be implemented with vue-router-prefetch.
  3. Dynamic loading
    - For *.js not on the critical rendering path, such as content areas not visible on the first screen, switch to dynamic loading.
    - Dynamic loading based on events, e.g. triggered by Click/Hover.
  4. Code Splitting
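
As a minimal sketch of the caching behavior behind route-level lazy loading (the helper name here is made up for illustration), the first call triggers the dynamic import and later calls reuse the same promise, which is roughly what bundlers do for import():

```javascript
// Hypothetical helper mirroring how bundlers cache dynamic imports:
// the loader runs once, and every later call reuses the same promise.
function createLazyLoader(load) {
  let cached = null;
  return () => {
    if (cached === null) cached = load(); // first call kicks off the fetch
    return cached;                        // subsequent calls are free
  };
}

// Usage sketch: const loadNavbar = createLazyLoader(() => import('./navbar.tsx'));
```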

Optimize the API interface that LCP relies on

  1. Trim the API response size to contain only the necessary information.
  2. Change serial requests to parallel requests, which effectively reduces the request waterfall.
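
To illustrate the serial-to-parallel change, here is a sketch with two hypothetical API calls; in the parallel version both requests start immediately, so total latency is roughly the slower of the two rather than their sum.

```javascript
// Serial: the second request cannot start until the first resolves.
async function fetchSerial(getUser, getPosts) {
  const user = await getUser();
  const posts = await getPosts();
  return { user, posts };
}

// Parallel: both requests are issued up front and awaited together.
async function fetchParallel(getUser, getPosts) {
  const [user, posts] = await Promise.all([getUser(), getPosts()]);
  return { user, posts };
}
```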

Reduce the time it takes to pull static resources like js, css, etc

  1. Hosting resources on a suitable CDN can greatly reduce response time.
  2. Enable Brotli/Gzip compression; by reducing the response size you can also greatly reduce response time.
  3. Use HTTP/2, especially when there are many requests to the same domain: thanks to HPACK header compression, headers that take about 400 bytes over HTTP/1.1 may shrink to about 10 bytes over HTTP/2.
  4. Mark asynchronous js and css as preload in the HTML <head> (e.g. with preload-webpack-plugin); this effectively flattens the resource-fetch waterfall (by default, all other js/css is pulled only after main.js executes).

Optimize the loading of image resources: in many cases the LCP element is an image, so image loading is especially critical.

  1. Use responsive images, providing appropriately sized images for different screen sizes.
  2. Use modern image formats such as WebP.
  3. Use image optimization products such as Cloudflare Polish.
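
As a sketch of point 1, a small helper (hypothetical, assuming an image CDN that resizes via a `w` query parameter) can generate the srcset attribute for different widths:

```javascript
// Hypothetical helper: builds a srcset string for a resizing image CDN
// that accepts a `w` query parameter (an assumption, not a real service).
function buildSrcset(baseUrl, widths) {
  return widths.map((w) => `${baseUrl}?w=${w} ${w}w`).join(', ');
}

// Usage sketch:
// <img src="/hero.webp?w=800"
//      srcset="${buildSrcset('/hero.webp', [400, 800, 1600])}" sizes="100vw">
```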

CSR’s TBT Optimization

In React 18, React introduced a new concurrent renderer that yields back to the main thread every 5 ms via interruptible updates, keeping the page interactive.

In Vue, the baseline overhead of virtual DOM diffing is reduced by compile-time analysis of templates. Through a series of compiler optimizations, Vue achieves component-tree-level optimization: an update that might cause multiple components to re-render in a React application is likely to re-render only one component in Vue. As a result, updates in Vue are inherently lighter. See Why remove time slicing from vue3?

By the way, virtual lists are a good option when rendering large lists.
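
The core of a virtual list is just windowing math: given the scroll position, only the rows intersecting the viewport (plus a small overscan) are rendered. A minimal sketch, assuming fixed-height rows:

```javascript
// Windowing math for a fixed-row-height virtual list (sketch).
function visibleRange(scrollTop, viewportHeight, rowHeight, totalRows, overscan = 2) {
  const first = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
  const last = Math.min(
    totalRows - 1,
    Math.ceil((scrollTop + viewportHeight) / rowHeight) + overscan
  );
  return { first, last }; // render only rows in [first, last]
}
```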

But no matter how you optimize, the most direct and effective way is always to do less work.

Just remember that ‘scheduling’ doesn’t let you break the laws of physics. The best way to get better performance: do less work. FROM Rich Harris’ X

Optimization Ideas for SSR (Server-side Rendering)

In an SSR web application, the HTML returned by the server already contains content, so the optimization ideas and directions differ from CSR.

SSR Progress

From the figure above, we can see that in SSR, TTFB, FCP and LCP tend to occur in quick succession, with only a short gap between them, which is totally different from CSR.

This also means that once TTFB is improved, FCP and LCP will be improved as well.

SSR’s TTFB Optimization

The optimization of TTFB can be viewed from several perspectives.

  1. Server-side return as soon as possible
    - Reduce server-side computation (do less work)
    - Use full-text caching
    - Streaming rendering
  2. Reduce network time
    - Move the server closer to the user, e.g. Cloudflare Workers
    - Reduce response size, e.g. Brotli/Gzip as mentioned above.
    - CDN (Content Delivery Network)
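
As a sketch of full-page caching (the names are made up, and the clock is injectable so the behavior is easy to test), the server can serve rendered HTML from an in-memory TTL cache and only re-render on a miss:

```javascript
// Minimal full-page cache with a TTL; `now` is injectable for testing.
function createPageCache(ttlMs, now = Date.now) {
  const store = new Map();
  return {
    get(path) {
      const hit = store.get(path);
      if (!hit) return undefined;
      if (now() - hit.at > ttlMs) {  // entry expired: evict and miss
        store.delete(path);
        return undefined;
      }
      return hit.html;
    },
    set(path, html) {
      store.set(path, { html, at: now() });
    },
  };
}
```

A real deployment would also cap the cache size and invalidate on content updates; this sketch only shows the hit/expiry logic.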

In my experience, the most direct and effective way to optimize TTFB is through CDN, which can easily achieve the following goals:

  1. Reduce network time: requests can be answered at the nearest CDN node without being forwarded layer by layer to the origin.
  2. Reduce server response time: the CDN acts as a layer of cache, so a large share of requests hit the cache directly and the pressure on the origin drops.
  3. For business scenarios with large traffic swings, such as sudden traffic spikes, a CDN copes easily. By contrast, no matter how much you optimize server-side computation, high-QPS scenarios will eventually introduce some delay.

In practice, if you cannot use a CDN because the server-side computation depends on client state, such as the user’s login state or personalized recommendations, you should modify the business logic.

The goal is to let the server output stateless content as much as possible, while state-dependent logic is computed on the client. This gives the business the ability to go on a CDN (really, the ability to be cached).

Fortunately, the isomorphic code of Next.js and Nuxt.js lets us migrate computation logic from the server to the client at little cost. This is the advantage of modern SSR (compared to traditional SSR models such as PHP and JSP).

But to keep the discussion simple: what can we do to optimize TTFB without relying on a CDN or full-page caching?

  1. Reduce unnecessary or overly expensive computations (do less work).
  2. Avoid serial requests on the server side and reduce backend dependencies; serial requests are very common when CSR projects are converted into SSR projects.
  3. Partition the rendering logic sensibly to reduce server-side computation, e.g. rendering different content for different screen sizes.
    - Use the Sec-CH-Viewport-Width and Sec-CH-Viewport-Height request headers to decide how to render.
    - Use the UA to determine the device type and decide how to render.
  4. Reduce HTML size: migrate content that has no SEO value and is not visible on the first screen to client-side rendering.
  5. Partial caching: using HTTP chunked transfer, return the <head> portion of the HTML first; it has low real-time requirements, and returning it early improves FCP for a better user experience.
  6. Early Hints.

In short, anything that reduces response time also reduces TTFB.

💡 By the way, for frameworks such as Nuxt.js and Next.js that use () => import('pages/PageA') in their routes, be aware of the server-side overhead of dynamically importing the corresponding path.

In my experience, a path’s response-time delay can become very high as traffic increases, because resolving the import for that path takes a long time on first access. This is especially bad when multiple requests arrive at the server simultaneously and all hit the import-resolution delay at once.
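
One hedged workaround, assuming your framework exposes the route loaders, is to trigger every dynamic import once at server startup so the first real request doesn’t pay the resolution cost (the route shape here is hypothetical):

```javascript
// Hypothetical warm-up: resolve every route's dynamic import at startup.
// Failures are swallowed so one broken chunk doesn't block boot.
async function warmUpRoutes(routes) {
  await Promise.all(routes.map((route) => route.loader().catch(() => {})));
}

// Usage sketch:
// await warmUpRoutes([{ path: '/a', loader: () => import('pages/PageA') }]);
```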

SSR’s FCP Optimization

In SSR applications, the interval between FCP and LCP is usually very short. In practice, FCP is determined by the TTFB and the order in which the HTML is written. Therefore, to optimize FCP, we can start from the following perspectives.

  1. Optimize TTFB
  2. Reduce render blocking
    - Inline only the first screen’s critical css into the html, and put it in <head>. (Putting it in <head> avoids the page jitter caused by late css loading, which hurts CLS.) In fact, many frameworks link large amounts of css into <head>, which delays rendering.
    - Include js in a way that doesn’t block rendering, e.g. <script type="module">, <script async>, <script defer>, or <script> tags placed just before </body>.
    - Write the <html> in the correct order, with the HTML elements in the same order as the page content.
  3. Use HTTP/2 for static resources where permitted.
  4. Pull static resources early via new HTTP standards such as Early Hints.
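
To make point 2 concrete, a small (hypothetical) template helper can enforce the ordering: critical CSS inlined in a <style> tag inside <head>, and scripts emitted with defer so they never block the first paint.

```javascript
// Hypothetical helper: assembles <head> markup with critical CSS inlined
// and all scripts deferred so they don't block rendering.
function renderHead({ title, criticalCss, scripts }) {
  const styleTag = `<style>${criticalCss}</style>`;
  const scriptTags = scripts
    .map((src) => `<script src="${src}" defer></script>`)
    .join('');
  return `<head><title>${title}</title>${styleTag}${scriptTags}</head>`;
}
```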

SSR’s LCP Optimization

In SSR applications, if the FCP is good, then the LCP metrics will tend to be better. When the LCP element is an image, LCP ≈ FCP + Image Download Timing.

So we can refer to CSR’s optimization methods to optimize the loading of image resources.

  1. Use responsive images, providing appropriately sized images for different screen sizes.
  2. Use modern image formats such as WebP.
  3. Use image optimization products such as Cloudflare Polish.

SSR’s FID/INP Optimization

In frameworks such as Next.js/Nuxt.js, there is a short period after the server-rendered page reaches the browser before it is ready to be interacted with; this period corresponds to hydration, i.e. attaching js event handlers to the DOM.

To reduce input delay during this window, we can consider implementing some button interactions outside the framework, or using native HTML tags to achieve certain functions.

For example, a simple login interaction can be achieved with <dialog> and <a> without framework involvement.

Summary

That concludes this article. We have sorted out optimization approaches for two different forms of web application and listed some common metrics and analysis tools. If you work in this area, I hope this content can bring you some help.
