If you are interested in the details about the tool, please refer to this design document.
Typically, loads and stores are the most frequent and expensive operations in a workload. It is common for up to 40% of the instructions in a workload to be loads or stores. Therefore, to obtain better performance, V8 implements a very complex mechanism to manage memory.
Learning this mechanism purely from the source code is difficult, let alone optimizing it. A tool that visualizes the layout of the V8 heap would let developers understand the mechanism more intuitively.
Why We Decided to Develop This Tool
We spent some time analyzing V8 on Intel platforms and found that the TLB hit rate was poor. To address this, a colleague contributed a patch that enabled THP (transparent huge pages) and achieved a significant improvement for Node.js. Unfortunately, the patch has a large impact on the V8 GC (garbage collection) mechanism, increases the memory footprint, and does not work well with pointer compression (enabled by default in Chromium). So the V8 team declined to enable this patch for clients (Chromium). If we want to enable THP for Chromium, we need a more sophisticated implementation, one that does not hurt the current GC mechanism and is compatible with pointer compression.
Before we start optimizing this, we must understand how it currently works. As the saying goes: “There are no secrets under the source code.” Reading the source code is always a good way to understand a program thoroughly. However, something that gives us an intuitive understanding first makes diving into the source code easier and more efficient.
V8 already has some fantastic tools, but none of them helps developers understand the heap layout. We decided to develop one and contribute it to the community. After almost a year of designing, developing, polishing, and communicating with the V8 team, the tool finally landed in the official V8 repository.
Evolution of The Tool
The tool consists of a backend and a frontend. The backend records the layout of the heap and outputs log files. Recording is usually driven in one of two ways: sampling-based or event-based. The sampling-based method records at regular intervals, while the event-based method records when specific events are triggered. We use the event-based method because the V8 heap generally grows linearly. The layout changes greatly after GC, because dead (unreachable) objects can be anywhere; once they are freed, they leave many holes in the heap. We believe this is worth focusing on, as it may offer optimization opportunities.
The frontend is the visualizer for the log files generated by the backend. We spent considerable time designing, implementing, and updating it. The early version was built on matplotlib, a popular Python visualization library. It parsed the raw log files and generated corresponding images. However, this version was not user-friendly. The images were static PNG and SVG files, which are not interactive, and the deployment process was complicated: users had to generate the image files and some other statistics files, then upload all of them to a web folder.
After investigating the existing V8 tools and listening to feedback from colleagues, we decided to scrap this version and start over. We simplified the process and made it consistent with the existing V8 tools. Users only need to drag the raw trace files onto the browser page, and the frontend parses and visualizes them automatically. The generated charts are fully dynamic and interactive. We believe that the landed version offers a good user experience.
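The difference between the two recording methods can be illustrated with a toy model. The sketch below is not the tool's backend; the `ToyHeap` class, the `run` function, and the event tuples are hypothetical names invented for illustration. It replays a list of allocation and GC events and takes a snapshot either at fixed intervals (sampling-based) or right after each GC (event-based).

```python
from dataclasses import dataclass, field


@dataclass
class HeapSnapshot:
    trigger: str      # "sample" or "gc"
    time_ms: int      # when the snapshot was taken
    used_bytes: int   # heap usage at that moment


@dataclass
class ToyHeap:
    used_bytes: int = 0
    snapshots: list = field(default_factory=list)

    def record(self, trigger, time_ms):
        self.snapshots.append(HeapSnapshot(trigger, time_ms, self.used_bytes))


def run(events, sample_every_ms=None):
    """Replay (time_ms, kind, nbytes) events; kind is 'alloc' or 'gc'.

    With sample_every_ms set, snapshots are taken at fixed intervals.
    Otherwise a snapshot is taken right after every GC event.
    """
    heap = ToyHeap()
    next_sample = sample_every_ms
    for time_ms, kind, nbytes in events:
        if sample_every_ms is not None:
            # Sampling-based: record the state at each elapsed interval.
            while next_sample <= time_ms:
                heap.record("sample", next_sample)
                next_sample += sample_every_ms
        if kind == "alloc":
            heap.used_bytes += nbytes
        elif kind == "gc":
            heap.used_bytes -= nbytes  # nbytes freed by this GC
            if sample_every_ms is None:
                # Event-based: record immediately after the GC.
                heap.record("gc", time_ms)
    return heap.snapshots


events = [
    (1, "alloc", 100),
    (2, "alloc", 100),
    (3, "gc", 150),
    (4, "alloc", 100),
    (12, "gc", 120),
]
event_based = run(events)
sampled = run(events, sample_every_ms=10)
```

In this toy run the event-based recorder captures the heap right after both GCs, while the 10 ms sampler takes a single snapshot between them and never sees a post-GC state.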
How We Use This Tool
PoC of THP For Chromium
As mentioned earlier, we are working on enabling THP in Chromium and Chrome OS. With the help of the tool, we came to understand the heap layout and the impact of GC on it, and we set our strategy accordingly.
We spent months implementing an early version of THP and finished the PoC. The test results showed about a 1.5% improvement on Speedometer 2, a popular industry benchmark for browsers, on Intel processors. At the same time, the patch introduces more memory fragmentation, so we needed to analyze the distribution of that fragmentation to decide how to solve the problem. We used the tool to observe the fragmentation within huge pages.
We found that a lot of fragmentation appeared after major GC (mark-compact). So the key point is designing a new compaction strategy for huge pages. We have already discussed some potential solutions, and we are actively communicating with the V8 team.
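To make "fragmentation within a huge page" concrete, here is a minimal sketch of one possible measure: the fraction of each huge page that is not covered by live objects. This is our own simplification for illustration, not the metric the tool reports. The function names and the `(start, size)` input format are hypothetical, and the 2 MiB page size is the common x86-64 huge page size.

```python
HUGE_PAGE = 2 * 1024 * 1024  # 2 MiB, the common x86-64 huge page size


def live_bytes_per_huge_page(live_ranges, heap_size, page_size=HUGE_PAGE):
    """Sum live bytes per huge page.

    live_ranges: iterable of (start_offset, size) for live objects,
    with offsets relative to the start of the heap region.
    heap_size is assumed to be a multiple of page_size.
    """
    n_pages = (heap_size + page_size - 1) // page_size
    live = [0] * n_pages
    for start, size in live_ranges:
        end = start + size
        # Split a range that crosses a huge page boundary.
        while start < end:
            page = start // page_size
            chunk_end = min(end, (page + 1) * page_size)
            live[page] += chunk_end - start
            start = chunk_end
    return live


def free_fraction(live, page_size=HUGE_PAGE):
    """Fraction of each huge page not occupied by live objects."""
    return [1 - b / page_size for b in live]


MIB = 1024 * 1024
# Two live ranges in a 4 MiB region; the second crosses the page boundary.
ranges = [(0, 1 * MIB), (MIB + MIB // 2, 1 * MIB)]
live = live_bytes_per_huge_page(ranges, heap_size=4 * MIB)
holes = free_fraction(live)
```

A huge page with a high free fraction still occupies a full 2 MiB of physical memory, which is why sparse pages after mark-compact matter more here than they do with regular 4 KiB pages.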
Abnormal GC frequency
The tool also helped us locate anomalies. A few months ago, a colleague on our team was developing a new feature for V8 on the Intel platform. During the development phase, the patch could only run on an emulator. While testing, our colleague found that the GC frequency increased significantly when running on the emulator. Because we could not know every detail of the GC mechanism in V8, it was difficult to locate the cause.
With the help of the tool, we found that the new space was smaller when running on the emulator. Following this clue, we looked at the grow-and-shrink strategy of the new space and found that its dynamic size depends on the allocation throughput. The emulator is much slower than real hardware, and so is the allocation throughput. With low throughput, the new space hardly grows. Minor GC is triggered when the new space is full, so a smaller new space causes more frequent minor GCs. To verify this, we fixed the size of the new space and tested again, and this time we got consistent results.
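The reasoning above can be checked with a toy model. The sketch below is not V8's actual heuristic: the doubling rule, the throughput threshold, and all parameter names and values are assumptions chosen only to show why a new space that never grows leads to more minor GCs for the same amount of allocation.

```python
def count_minor_gcs(total_alloc_mb, throughput_mb_per_s,
                    initial_mb=1, max_mb=16, grow_threshold_mb_per_s=50):
    """Toy model of new space sizing (hypothetical, not V8's policy).

    A minor GC runs every time the new space fills up. After each GC the
    new space doubles, up to max_mb, but only if the allocation throughput
    is at or above grow_threshold_mb_per_s.
    Returns (number_of_minor_gcs, final_new_space_size_mb).
    """
    size = initial_mb
    allocated = 0
    gcs = 0
    while allocated + size <= total_alloc_mb:
        allocated += size  # the new space fills up
        gcs += 1           # which triggers a minor GC
        if throughput_mb_per_s >= grow_threshold_mb_per_s and size < max_mb:
            size = min(size * 2, max_mb)
    return gcs, size


# Same 256 MB of allocation on fast hardware and on a slow emulator.
fast = count_minor_gcs(256, throughput_mb_per_s=100)
slow = count_minor_gcs(256, throughput_mb_per_s=5)
```

Under these assumed parameters, the fast run needs 19 minor GCs and ends with a 16 MB new space, while the slow run needs 256 minor GCs because its new space stays at 1 MB. Fixing the new space size, as we did in our verification, removes this dependency on throughput.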
Looking to The Future
We will continue extending this tool. Our current idea is to combine the layout visualization with memory access hotspots. We expect to find patterns of memory access within specific layouts. If we do, we can try to place frequently accessed objects close to each other in the layout, which should improve the data cache hit rate and overall performance.