JavaScript, a cornerstone of web applications, relies on efficient runtime compilation for optimal performance. The V8 engine, powering many browsers, traditionally uses a one-size-fits-all approach to manage its four compilation tiers: Ignition, Sparkplug, Maglev, and Turbofan. However, this legacy strategy often falls short in maximizing performance across diverse JavaScript functions. In collaboration with Google’s V8 team, Intel has developed Profile-Guided Tiering, a groundbreaking strategy that tailors the tiering process to the specific needs of each function. This strategy enhances the V8 engine's efficiency, leading to faster, more responsive web applications and a smoother user experience.
The proof is in the execution. We’ve observed that Profile-Guided Tiering can improve Speedometer 3 (the industry's de facto web performance benchmark) by ~5% on Intel® Core™ Ultra Series 2 processors (Lunar Lake), and Google recognized it as one of three key items that helped the Chrome browser achieve the highest score on Speedometer 3 (see the Chromium blog).
What is Tiering?
V8 2023 Pipeline - Source: Google Design Documentation
Given that JavaScript functions are compiled at runtime, there is often a tradeoff between compilation time and code quality. Currently, V8 has four compilation tiers: Ignition, Sparkplug, Maglev, and Turbofan. JavaScript functions enter the first tier on startup and move through the later tiers as they are invoked. As shown in the figure above, from the leftmost tier to the rightmost, compilation time grows while the execution speed of the compiled code increases. Together, the four compilation tiers provide both a short startup time and high peak performance for JavaScript code. The heuristic for changing between tiers is called tiering.
Tiering Up
In the figure above, moving from a left tier to a right tier is called tiering up. Tiering up is triggered for a JavaScript function when the function’s invocation count reaches a threshold. For example, a JavaScript function is first compiled by Ignition on startup; once it has been invoked eight times, it is tiered up by default from Ignition to Sparkplug.
Tiering up from Sparkplug to Maglev and from Maglev to Turbofan requires feedback collection, while tiering up from Ignition to Sparkplug doesn’t. Because JavaScript is a dynamically typed language, V8 uses feedback (such as objects’ types) collected at runtime to generate better code. This is called speculative optimization.
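As a concrete illustration, the snippet below shows the kind of type feedback a call site accumulates (a sketch for intuition only; V8's feedback bookkeeping is internal and not observable from JavaScript):

```javascript
// While `add` only ever sees numbers, its feedback stays monomorphic,
// so an optimizing tier can speculate and emit a fast numeric addition.
function add(a, b) {
  return a + b;
}

// Stable feedback: every call uses numbers.
let sum = 0;
for (let i = 0; i < 1000; i++) sum += add(i, i + 1);

// A call with strings changes the feedback at this call site; the
// speculation "operands are numbers" no longer holds.
const mixed = add("1", "2"); // string concatenation, not addition
```

Once the feedback becomes polymorphic like this, the optimized code for the numeric case can no longer be trusted unconditionally, which is exactly the situation speculative optimization must guard against.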
The entire sequence of tiering up from the first tier to the last tier can be simplified as:
| Tiering | Invocation count required | Feedback collection required |
| --- | --- | --- |
| Ignition -> Sparkplug | 8 | No |
| Sparkplug -> Maglev | 500 | Yes |
| Maglev -> Turbofan | 6000 | Yes |
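The table above can be sketched as a small decision function. This is illustrative only: `THRESHOLDS`, `NEXT`, and `nextTier` are not V8 identifiers, and the thresholds are the defaults described in this article.

```javascript
// Default invocation thresholds, per the table above (illustrative).
const THRESHOLDS = { ignition: 8, sparkplug: 500, maglev: 6000 };
const NEXT = { ignition: "sparkplug", sparkplug: "maglev", maglev: "turbofan" };

function nextTier(tier, invocationCount, feedbackStable) {
  const threshold = THRESHOLDS[tier];
  if (threshold === undefined) return tier;       // turbofan: already top tier
  if (invocationCount < threshold) return tier;   // not hot enough yet
  // Ignition -> Sparkplug needs no feedback; the optimized tiers do.
  if (tier !== "ignition" && !feedbackStable) return tier;
  return NEXT[tier];
}
```

For example, `nextTier("sparkplug", 500, false)` stays in Sparkplug because the feedback is not yet stable, while `nextTier("sparkplug", 500, true)` tiers up to Maglev.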
Tiering up needs runtime compilation, which is expensive, because:
- Even though the main compilation jobs are concurrently executed in the background thread, some jobs need to be done in the main thread, which blocks execution.
- The concurrent compilation jobs in the background thread and the execution in the main thread may access the same data concurrently, which requires synchronization to avoid data races.
- Both compilation and compiled code consume memory.
Tiering up is only worthwhile when the new tier code is sufficiently utilized.
Tiering Down (Deoptimization)
Optimized tiers in the V8 engine rely on predictions made from collected feedback. When these predictions fail due to new feedback, the engine must switch from an optimized tier to an unoptimized one, a process known as tiering down or deoptimization.
Deoptimization is costly because it involves:
- Transitioning from fast, optimized code execution to slower, unoptimized execution before potentially optimizing again.
- Performing on-stack replacement and context switching.
To minimize these costs, it's crucial to avoid optimizing functions that are likely to be deoptimized soon.
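A common JavaScript pattern that triggers deoptimization looks like the sketch below (the deoptimization itself happens inside the engine and is not observable from script):

```javascript
// An optimized tier can speculate that `point` always has the same
// hidden-class shape and compile `point.x` to a fixed-offset load.
function readX(point) {
  return point.x;
}

// Warm-up with a single shape {x, y}: feedback stays monomorphic.
for (let i = 0; i < 1000; i++) readX({ x: i, y: i });

// An object with another shape ({y, x}: same properties, different
// order) invalidates the speculation; the engine must fall back to a
// generic property load — i.e., deoptimize.
const late = readX({ y: 0, x: 42 });
```

Profile-Guided Tiering aims to recognize functions like `readX` whose feedback proved unstable in a previous run, and to hold off optimizing them.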
Profile-Guided Tiering
Given that both tiering up and tiering down are expensive, the heuristic that decides when to tier up is critical to performance. Tiering up too early increases the chance of tiering down (deoptimization) and wastes CPU resources, while tiering up too late delays the benefits of optimized code.
Legacy tiering decisions were made according to the functions’ runtime invocation count. However, using profiling data, we can predict whether a function is hot or whether a function will be tiered down soon. With this information, we invented the new Profile-Guided Tiering strategy for V8. During the first run (profiling), we analyze every tiering up and tiering down and cache the decisions. In subsequent runs, we can accurately adjust the tiering according to the cached decisions.
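The record-and-replay idea can be sketched as a minimal decision cache. The names here (`profileCache`, `recordTierDecision`, `initialTierFor`) are illustrative, not part of V8:

```javascript
// Profiling run: remember the tier each function ultimately reached.
const profileCache = new Map();

function recordTierDecision(fnName, tierReached) {
  profileCache.set(fnName, tierReached);
}

// Subsequent run: consult the cache to pick a starting strategy.
// Without cached profile data, start in the interpreter as usual.
function initialTierFor(fnName) {
  return profileCache.get(fnName) ?? "ignition";
}

// Suppose the profiling run observed `render` reaching Sparkplug:
recordTierDecision("render", "sparkplug");
```

In this sketch, `initialTierFor("render")` returns `"sparkplug"` on the next run, while an unprofiled function still starts in Ignition.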
Profile-Guided Tiering delivers better performance because:
- Cross-context: experience from previous runs is saved to improve the performance of later runs.
- Case-by-case: different functions can be assigned different tiering strategies according to the profiling data.
According to feedback from Google, this not only improves benchmark scores, like Speedometer 3, but also improves the user’s experience in real-world website navigation. Currently, Profile-Guided Tiering includes policies for Early Tiering Up and Delay Tiering Up.
Early Tiering Up
The early-tiering-up policies for unoptimized and optimized tiers differ: because unoptimized tiers do not use speculative compilation, they are much less likely to be tiered down.
Early Tiering Up To Unoptimized Tier (Sparkplug)
By default, V8 tiers functions up from Ignition to Sparkplug after eight invocations. In our profiling runs, we recorded the functions that tiered up to Sparkplug, because those functions are highly likely to tier up to Sparkplug again in subsequent runs. In the subsequent runs, we tiered up the same functions on their first invocation, which improved Speedometer 2 scores by +2%.
Profile-Guided early tiering up to unoptimized tier (Sparkplug)
Early Tiering Up To Optimized Tiers (Maglev Or Turbofan)
Functions tier up to Maglev and Turbofan only when the collected feedback becomes stable. For example, tiering up from Sparkplug to Maglev requires 500 invocations; if the feedback changes during those invocations, the invocation counter is reset to zero and tiering up requires another 500 invocations. Tiering up to Turbofan requires 6000 invocations.
This makes the strategy for early tiering more complex. Our design is based on the assumption that if the collected feedback of a function becomes stable within a few invocations and there is no subsequent deoptimization, it's highly likely that the function uses stable types of objects. We can record such functions in the profiling runs and tier up these functions early in the subsequent runs. Specifically, if a function had been compiled by Maglev but not by Turbofan, we will tier up this function to Maglev earlier in the subsequent runs.
Profile-Guided early tiering up to Maglev
If a function had been compiled by both Maglev and Turbofan, we will tier up this function earlier and directly to Turbofan in the subsequent runs.
Profile-Guided early tiering up to Turbofan
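Taken together, the two early-tiering rules can be sketched as a single decision over the recorded profile (hypothetical names; `earlyTargetTier` is not a V8 function):

```javascript
// profile: what the profiling run observed for one function.
//   reachedMaglev / reachedTurbofan: highest optimized tiers compiled
//   deoptimized: whether the function was later tiered down
function earlyTargetTier(profile) {
  if (profile.deoptimized) return null;          // unstable: no early tiering
  if (profile.reachedTurbofan) return "turbofan"; // skip Maglev, go direct
  if (profile.reachedMaglev) return "maglev";     // tier up to Maglev early
  return null;                                    // never optimized: no change
}
```

So a function that reached both Maglev and Turbofan without deoptimizing is sent directly to Turbofan early, while one that only reached Maglev is tiered up to Maglev early.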
In addition, we also tuned the strategy for some situations like inlined functions and context-dependent optimizations. In total, the heuristic contributes a +2.4% improvement to Speedometer 3’s score.
Delay Tiering Up
If a function is deoptimized soon after being tiered up to Maglev, we can assume that its feedback is unstable and that it needs more invocations before tiering up. We record such functions in the profiling runs and assign delayed tiering strategies to them in subsequent runs. This avoids the deoptimizations and thus improves performance.
Profile-Guided delay tiering up
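The delay policy can be sketched as a per-function threshold adjustment. This is illustrative only: the names are invented, and the delay factor of 4 is an assumed value for the sketch, not V8's actual heuristic.

```javascript
// Default Sparkplug -> Maglev threshold, per the earlier table.
const BASE_MAGLEV_THRESHOLD = 500;

// Functions the profiling run saw deoptimize soon after reaching Maglev.
const delayedFunctions = new Set();

function markUnstable(fnName) {
  delayedFunctions.add(fnName);
}

// Unstable functions get a larger threshold, giving their feedback
// more invocations to stabilize before tiering up.
function maglevThresholdFor(fnName, delayFactor = 4) {
  return delayedFunctions.has(fnName)
    ? BASE_MAGLEV_THRESHOLD * delayFactor
    : BASE_MAGLEV_THRESHOLD;
}

markUnstable("parseConfig"); // deoptimized quickly in the profiling run
```

Here `maglevThresholdFor("parseConfig")` yields 2000 invocations instead of the default 500, trading delayed optimization for fewer deoptimizations.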
Performance Estimation
We landed Profile-Guided Tiering as a series of patches and estimated their performance impact separately. In total, they contribute a 5.2% improvement to Speedometer 3’s score on the Intel Core Ultra Series 2 processor (Lunar Lake). The details are shown in the table below.
| Optimization | Improvement to Speedometer 3 |
| --- | --- |
| Early tiering to Sparkplug | +2% |
| Early tiering to Maglev/Turbofan | +2.4% |
| Delay tiering to Maglev | +0.8% |
| Total | +5.2% |
Future Work
Intel’s Web Platform team is working on introducing more Profile-Guided Tiering optimizations to V8. Our current work saves the profiling information to a disk cache (in addition to the current in-memory cache) to support inter-process Profile-Guided Tiering. We’re collaborating closely with the Google V8 team, which has reviewed and accepted several Profile-Guided Tiering optimizations because they are simple, clean, and provide significant performance improvements.
As we continue to refine Profile-Guided Tiering in the V8 engine, we invite developers to explore how these optimization techniques can be applied to other script programming languages, such as Python, which also utilize multiple runtime compilation tiers. Your insights and collaborative efforts are invaluable in expanding the reach and effectiveness of these strategies. We welcome your feedback and ideas on extending this work, and encourage you to share your thoughts and experiences by reaching out to us at open.ecosystem.communications@intel.com.
About the Author
Tao Pan, Software Engineer, Intel
Tao Pan has been optimizing web platforms for Intel since 2018. Prior to Intel, Tao spent 18 years as an embedded system software developer (Linux, Android, RTOS, firmware, device driver, etc.). You can find him on GitHub.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.