
A Journey for Landing The V8 Heap Layout Visualization Tool

Jianxiao_Lu
Employee

JavaScript is one of the most popular programming languages today. It has grown rapidly over the past few decades and has become one of the most important parts of web applications. JavaScript itself is a standard with many implementations. Among them, V8 is the most popular; it is embedded in Chromium and Node.js.

We are the JavaScript Optimization Team, part of the Web Optimization Team. We focus on optimizing JavaScript engines, mainly V8, for the Intel platform. This blog tells the story of our efforts to land the V8 Heap Layout Visualization Tool. We believe this tool can help developers understand V8's memory management mechanism and the optimizations built on it.

If you are interested in the details of the tool, please refer to this design document.

Background

Typically, loads and stores are among the most frequent and expensive operations in a workload. It is common for up to 40% of the instructions in a workload to be loads or stores. Therefore, to obtain better performance, V8 implements a very complex mechanism to manage memory.

Learning this mechanism purely from the source code is difficult, let alone optimizing it. With a tool that visualizes the layout of the V8 heap, developers can understand the mechanism more intuitively.

Why We Decided to Develop This Tool

Our team focuses on optimizing the JavaScript engine for Web applications, and optimizing memory access performance is one of our most important tasks. Memory access is vital to the performance of almost all programs. To reduce memory access time, modern processors use components called caches and the TLB (translation lookaside buffer). If we can increase the cache and TLB hit rates, memory access time drops significantly.

We spent some time analyzing V8 on the Intel platform and found that the TLB hit rate was poor. To address this, a colleague contributed a patch that enabled THP (transparent huge pages) and achieved a significant improvement for Node.js. Unfortunately, the patch has a large impact on V8's GC (garbage collection) mechanism, increases the memory footprint, and does not work well with pointer compression (enabled by default in Chromium). As a result, the V8 team declined to enable the patch for clients (Chromium). To enable THP for Chromium, we need a more sophisticated implementation, one that does not hurt the current GC mechanism and is compatible with pointer compression.
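To see why huge pages can help the TLB hit rate, the rough sketch below counts how many page translations are needed to cover a heap. The 4 KiB and 2 MiB page sizes are the usual x86-64 base and huge page sizes; the 64 MiB heap is only an assumed example figure, not a measurement from V8.

```python
KIB = 1024
MIB = 1024 * KIB


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of pages (distinct address translations) covering a region."""
    # Integer ceiling division, so a partially used page still counts.
    return (region_bytes + page_bytes - 1) // page_bytes


heap = 64 * MIB  # assumed heap size, for illustration only
base_pages = pages_needed(heap, 4 * KIB)   # regular 4 KiB pages
huge_pages = pages_needed(heap, 2 * MIB)   # 2 MiB huge pages

print(f"4 KiB pages: {base_pages}, 2 MiB pages: {huge_pages}")
```

Since a TLB holds a limited number of entries, covering the same heap with far fewer translations makes it more likely that a given access hits in the TLB.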

Before optimizing this, we had to understand how it currently works. As the saying goes: “There are no secrets under the source code.” Reading the source code is always a good way to understand a program thoroughly. However, with something that gives us an intuitive picture first, diving into the source code becomes easier and more efficient.

V8 already has some fantastic tools, but none of them helps developers understand the heap layout. We decided to develop one and contribute it to the community. After almost a year of designing, developing, polishing, and communicating with the V8 team, the tool finally landed in the official V8 repository.

Evolution of The Tool

The tool consists of a backend and a frontend. The backend records the layout of the heap and outputs log files. Recording is usually driven in one of two ways: sampling-based or event-based. The sampling-based method records at regular intervals, while the event-based method records when specific events are triggered. We use the event-based method because the V8 heap generally grows linearly, while the layout changes greatly after GC: dead (inaccessible) objects can be anywhere, and freeing them leaves many holes in the heap. We believe this is where it is worth focusing, since there may be optimization opportunities.
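The difference between the two recording methods can be sketched as follows. This is a toy model, not the tool's actual backend: the timeline format, the event names, and both function names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical record: (time in ms, event name, heap size in bytes after it).
Timeline = List[Tuple[int, str, int]]


def event_based(timeline: Timeline, trigger: str = "gc") -> Timeline:
    """Keep one record per trigger event (here: every GC)."""
    return [entry for entry in timeline if entry[1] == trigger]


def sampling_based(timeline: Timeline, interval_ms: int) -> Timeline:
    """Keep the latest known state at each fixed interval tick."""
    records: Timeline = []
    if not timeline:
        return records
    index = 0
    latest = timeline[0]
    for tick in range(timeline[0][0], timeline[-1][0] + 1, interval_ms):
        # Advance to the most recent entry at or before this tick.
        while index < len(timeline) and timeline[index][0] <= tick:
            latest = timeline[index]
            index += 1
        records.append((tick, latest[1], latest[2]))
    return records


# Made-up timeline: the heap grows with allocations and shrinks after each GC.
timeline = [
    (0, "alloc", 1_000),
    (3, "alloc", 2_000),
    (7, "alloc", 3_000),
    (8, "gc", 1_200),
    (9, "alloc", 2_200),
    (14, "gc", 900),
    (20, "alloc", 1_900),
]

gc_records = event_based(timeline)
samples = sampling_based(timeline, interval_ms=10)
```

In this made-up timeline, the event-based recorder captures both post-GC states, while sampling every 10 ms happens to land only on allocation states and never sees the heap right after a collection.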

The frontend is the visualizer for the log files generated by the backend. We spent a lot of time designing, implementing, and updating it. The early version was built on matplotlib, a popular Python visualization library. It parsed the raw log files and generated corresponding images to visualize them. However, this version was not user-friendly. The images were static PNG and SVG files, which are not interactive, and the deployment process was complicated: users needed to generate the image files and some other statistics files, then upload all of them to a web folder.

Jianxiao_Lu_0-1647326395948.png

After investigating the existing V8 tools and listening to feedback from colleagues, we decided to scrap that version and start over. We simplified the process and made it consistent with the existing V8 tools. Users only need to drag the raw trace files onto the browser page, and the frontend parses and visualizes them automatically. The generated charts are fully dynamic and interactive. We believe the landed version offers a good user experience.

Jianxiao_Lu_1-1647326395971.png

How We Use This Tool

PoC of THP For Chromium

As mentioned in the previous section, we are working on enabling THP in Chromium and Chrome OS. With the help of the tool, we came to understand the heap layout and the impact of GC on it, and then set our strategy.

We spent months implementing an early version of THP and finished the PoC. The test results showed about a 1.5% improvement on Speedometer 2, a popular industry benchmark for browsers, on an Intel processor. However, the patch also introduces more memory fragmentation, so we needed to analyze how the fragmentation is distributed before deciding how to solve the problem. We used the tool to observe the fragmentation in huge pages:

Jianxiao_Lu_2-1647326395978.png

Jianxiao_Lu_3-1647326395982.png

 

We found that a lot of fragmentation appeared after a major GC (mark-compact). So, the key point is to design a new compaction strategy for huge pages. We have already discussed some potential solutions, and we are actively communicating with the V8 team.
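As a minimal sketch of what "fragmentation in a huge page" can mean numerically, the code below finds the free holes between live objects in one page and computes a simple fragmentation ratio. The input format, the function names, and the metric itself (the share of free space outside the largest hole) are assumptions for illustration; they are not taken from the tool or from V8.

```python
HUGE_PAGE = 2 * 1024 * 1024  # 2 MiB, the common x86-64 huge page size


def free_holes(live_ranges, page_size=HUGE_PAGE):
    """Return (offset, size) gaps between live objects inside one page.

    live_ranges: iterable of (offset, size) pairs for live objects,
    assumed to lie within the page.
    """
    holes = []
    cursor = 0
    for offset, size in sorted(live_ranges):
        if offset > cursor:
            holes.append((cursor, offset - cursor))
        cursor = max(cursor, offset + size)
    if cursor < page_size:
        holes.append((cursor, page_size - cursor))
    return holes


def fragmentation(live_ranges, page_size=HUGE_PAGE):
    """Share of free bytes outside the largest hole (0.0 means one hole)."""
    holes = free_holes(live_ranges, page_size)
    free = sum(size for _, size in holes)
    if free == 0:
        return 0.0
    return 1.0 - max(size for _, size in holes) / free


# A small 1024-byte "page" keeps the numbers easy to follow.
scattered = [(0, 100), (300, 100), (900, 124)]
compacted = [(0, 100), (100, 100), (200, 124)]

scattered_holes = free_holes(scattered, page_size=1024)
scattered_frag = fragmentation(scattered, page_size=1024)
compacted_frag = fragmentation(compacted, page_size=1024)
```

With the same 324 live bytes, the scattered layout leaves two separate holes, while the compacted layout leaves a single contiguous free region and scores 0.0.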

Abnormal GC frequency

The tool also helped us locate anomalies. A few months ago, a colleague on our team was developing a new V8 feature for the Intel platform. During the development phase, the patch could only run on an emulator. While testing, our colleague found that the GC frequency increased significantly when running on the emulator. Because we could not know every detail of V8's GC mechanism, it was difficult to locate the anomaly.

Then, with the help of the tool, we found that the new space was smaller when running on the emulator. Following this clue, we looked at the grow-and-shrink strategy of the new space and found that its dynamic size depends on the allocation throughput. The emulator is much slower than a real machine, and so is the allocation throughput. With a low throughput, the new space hardly grows. A minor GC is triggered when the new space is full, so a smaller new space causes more frequent minor GCs. To verify this, we fixed the size of the new space and tested again. This time, we got consistent results.
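The relationship between new space size and minor GC frequency can be illustrated with a deliberately simple model. It assumes that every minor GC empties the new space completely and that the allocation volume is fixed; the sizes below are illustrative and are not V8's actual defaults.

```python
MIB = 1024 * 1024


def minor_gc_count(total_allocated: int, new_space_size: int) -> int:
    """Toy model: one minor GC each time the new space fills up.

    Assumes nothing survives a minor GC, so the full new space
    becomes available again after every collection.
    """
    return total_allocated // new_space_size


workload = 256 * MIB  # assumed total allocation volume, for illustration only

gcs_small = minor_gc_count(workload, 1 * MIB)  # new space that never grew
gcs_large = minor_gc_count(workload, 8 * MIB)  # new space that grew

print(f"1 MiB new space: {gcs_small} minor GCs")
print(f"8 MiB new space: {gcs_large} minor GCs")
```

Under these assumptions, the same workload triggers eight times as many minor GCs when the new space stays at 1 MiB instead of growing to 8 MiB.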

Looking to The Future

We will continue to extend this tool. One idea is to combine the layout visualization with memory access hotspots. We expect to find memory access patterns that are tied to specific layouts. If we do, we can try to gather frequently accessed objects together and place them close to each other in the layout. That should give a better data cache (dCache) hit rate and improve performance.