Graphics Frame Analyzer
Bar Chart Tips & Tricks
Authors
Philipp Gerasimov, GPU Software Architect, Intel® GPA Product Owner, Intel Corporation
Pamela Harrison, Software Technical Consulting Engineer
Abstract
The Bar Chart is one of the major elements of the Graphics Frame Analyzer GUI. It is a powerful tool for understanding the performance of a rendered frame as a whole, finding major bottlenecks, concentrating on a specific part of the frame, and spotting specific problems that merit further optimization work.
The bar chart displays a set of bars representing
- either all individual 3D rendering and compute calls executed on the GPU, such as Clear(), DrawIndexed(), Dispatch(), and DispatchRays(),
- or groups of calls combined by rendering pass, render target, and so on.
The X and Y axes can be set to many available metrics via the dropdown menus, which lets you visualize several different aspects of a frame’s performance. Axis metric choices are covered in more detail below.
The secondary Zoom Bar Chart underneath the main Bar Chart allows you to quickly select a part of the frame to focus on.
X-Axis and Y-Axis Selections
By default, as you can see in Image 1, the bar chart has only one axis selected, with the Duration metric displaying execution duration on the GPU. This lets us quickly find the biggest (or smallest) calls or passes in the frame, that is, the parts of the frame that take the most (or the least) GPU time, by looking at the heights of the bars. This works well in frames with a small number of rendering calls. But with thousands of draw calls the chart becomes hard to read, because each bar becomes very narrow and you need to zoom into part of the frame to get a better understanding.
Note: The numbers on the X-axis are API call numbers, so the full width of the chart contains one bar for each API call.
A useful bar chart trick gives a clearer picture for any frame, regardless of the number of draw calls. Setting both axes to Duration makes the bars two-dimensional: with GPU Duration on both X and Y, each call is displayed as an area. This makes it very easy to pick out the calls that take the longest, as they are much bigger in both directions than calls that take less GPU time.
In this example with Intel® GPA’s Microsoft* DirectX 11 (DX11) sample, gpasample.exe, the difference between the default selection (Image 1) and the Duration by Duration selection (Image 2) is clearly visible. Note in particular draw call 95, which takes a significant portion of the frame. Compare the large area of call 95 in Image 2 with its representation in Image 1, where the importance of this call is less clear: the many very small GUI rendering calls are drawn with the same narrow bars, making it appear that several calls contribute significantly to frame time. Image 1 might lead you to spend as much developer time and energy on calls 91, 92 and 94 as on call 95, whereas Image 2 correctly leads you to focus more heavily on call 95.
This difference becomes even more apparent when looking at a very complex frame, like this frame from an Unreal Engine-based (UE-based) application:
In the above image of the UE sample, you may pick out 5-10 calls that take significant frame time, leading you to spend time on code that is less critical to optimization. But below, in Image 4, with both axes set to duration, it is clear that calls 4719 and 7517 are the ones that, by far, take more time than any of the other calls in this frame, and therefore deserve more developer attention for optimization.
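The effect of the Duration by Duration selection described above can be sketched with a little arithmetic. The snippet below is only an illustration: the call numbers echo the gpasample.exe discussion, but the durations are invented, and it assumes that bar width and bar height both scale linearly with duration when both axes are set to Duration.

```python
# Hypothetical GPU durations in microseconds, keyed by call number.
# These values are invented for illustration; they are not measurements.
durations_us = {91: 40.0, 92: 35.0, 94: 45.0, 95: 400.0}

total_us = sum(durations_us.values())


def default_width_share(call):
    """Default view: every call gets an equal-width bar, whatever its cost."""
    return 1.0 / len(durations_us)


def duration_width_share(call):
    """Duration on X (assumed): bar width is proportional to the call's duration."""
    return durations_us[call] / total_us


def area_share(call):
    """Duration on both axes (assumed): bar area grows with duration squared."""
    total_area = sum(d * d for d in durations_us.values())
    return durations_us[call] ** 2 / total_area


for call in sorted(durations_us):
    print(
        f"call {call}: default width {default_width_share(call):.0%}, "
        f"duration width {duration_width_share(call):.0%}, "
        f"area {area_share(call):.0%}"
    )
```

With these made-up numbers, call 95 occupies a quarter of the chart width in the default view but about 77% of the width when the X-axis is Duration, and its share of the drawn area is larger still. Because area grows with the square of duration, expensive calls stand out visually.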
Selecting Chart Regions: Microsoft PIX Markers
The bar chart allows us to visualize Microsoft PIX markers if the application is instrumented with the Microsoft PIX API. This is a very useful way to identify rendering passes: Shadows, G-Buffer, SSAO, and so on (if these passes are instrumented by the engineers who wrote the code). This feature is accessible from the Chart Regions drop-down menu as Debug Regions:
This lets us clearly see performance for different parts of the application’s scene or workload, with no guessing. Just the facts. This is even more important for very complex frames with thousands of draw calls.
Usually, engineers enable and disable PIX markers during development and, in most cases, disable them for release builds. So it’s good to check that markers are enabled before capturing frames with Intel® GPA. In the picture below you can see the PIX markers written at the top of each group of calls.
Group By: Individual Calls vs. Groups
Groups help us to better visualize performance for groups of draw calls, such as various rendering passes. The following groups are available for selection:
The Draw Calls grouping (Group by: Draw Calls) sets the bar chart to the list of individual draw calls. As an example, in gpasample.exe we can view the Stairway scene – color only pass as a list of individual draw calls, as in Image 8, or as a single group of draw calls when we choose Group by: Debug Regions, as in Image 9.
It is important to note that changing the Group By selection is not only a change in visual representation. The group selection also changes the way performance metrics are collected:
- When grouping by Draw Calls, metrics are measured for each draw call separately, and if a set of draw calls is selected, the metrics are displayed as the sum of the individual calls;
- Whereas when grouping by region (for example Debug Regions), the metrics are measured for each whole group, not for each individual call.
This makes a difference when measuring very small draw calls. Measuring precision depends on GPU execution time, and variation appears when the execution time is very small. Measuring the whole group at once therefore gives more precise values, especially when the group contains many very small calls.
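To see why a single measurement of a whole group can be more precise than a sum of per-call measurements, consider the toy model below. It is a deliberately simplified sketch, not a description of how Intel® GPA or any GPU actually collects timings: it assumes a hypothetical timer with a fixed resolution and models each measurement as rounding to the nearest tick. All names and values are invented for illustration.

```python
# Assumed timer granularity in microseconds (hypothetical value).
TIMER_RESOLUTION_US = 1.0


def measure(true_duration_us):
    """Toy measurement: round the true duration to the nearest timer tick."""
    ticks = round(true_duration_us / TIMER_RESOLUTION_US)
    return ticks * TIMER_RESOLUTION_US


# One hundred very small calls, each well below the assumed timer resolution.
true_durations_us = [0.3] * 100

# Grouping by draw calls: each call is measured separately, then summed.
sum_of_individual = sum(measure(d) for d in true_durations_us)

# Grouping by region: the whole group is measured once.
whole_group = measure(sum(true_durations_us))

print("sum of per-call measurements:", sum_of_individual)
print("single group measurement:   ", whole_group)
```

In this model every individual call rounds down to zero, so the summed value misses the real cost of the pass entirely, while the single group measurement lands on the true total. Real measurement error is noisier and less extreme than this, but the direction of the effect is the same: per-call errors accumulate across many tiny calls.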
Using Various Metrics on the Y-Axis
Sometimes it is useful to quickly find calls with particular characteristics, such as the calls with the most geometry or the highest number of rendered pixels. These are easy to find if you set the characteristic of interest as the Y-axis metric of the bar chart.
Setting the Y-axis to Primitive Count or Vertex Count, for example, provides a quick look at geometry complexity across the scene:
Setting it to Rasterized Pixels allows us to find fill-rate-heavy calls:
Setting it to Sample Bottleneck allows us to find calls that are limited by the texture sampler:
Advanced Profiling Mode
At the top-left corner of the bar chart is a flame icon that enables Advanced Profiling Mode. In this mode, metrics are calculated in a way that allows GPU bottlenecks to be detected automatically, and calls are grouped by bottleneck type. In this example from gpasample.exe, Advanced Profiling Mode shows the hottest bottlenecks in the scene from left to right. Examining each group shows which calls are affected, which may suggest ideas for optimization gains.
Note that the biggest bottleneck may not include the most time-consuming call. Instead, it might consist of a set of less time-consuming calls that, taken together, add up to a large part of the frame time. Thus, optimizing for the worst bottleneck, rather than for the most time-consuming call, is likely to improve several calls at once, which increases the value of the developer time spent on optimization.
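The point about bottleneck groups versus individual calls can be illustrated with a small sketch. The call numbers, bottleneck labels, and durations below are all hypothetical, and the grouping logic is a plain sum per label; it is not the calculation that Advanced Profiling Mode performs.

```python
from collections import defaultdict

# Hypothetical data: (call number, bottleneck label, GPU duration in microseconds).
calls = [
    (10, "Sampler", 120.0),
    (11, "Sampler", 110.0),
    (12, "Sampler", 130.0),
    (13, "Sampler", 115.0),
    (20, "Geometry", 300.0),
]

# Total GPU time per bottleneck label.
time_per_bottleneck = defaultdict(float)
for _call, bottleneck, duration_us in calls:
    time_per_bottleneck[bottleneck] += duration_us

# The bottleneck group that accounts for the most GPU time overall.
worst_bottleneck = max(time_per_bottleneck, key=time_per_bottleneck.get)

# The single most time-consuming call.
longest_call = max(calls, key=lambda c: c[2])

print("worst bottleneck:", worst_bottleneck, time_per_bottleneck[worst_bottleneck])
print("longest call:", longest_call)
```

Here the longest single call belongs to the "Geometry" group, yet the four smaller "Sampler" calls together cost more GPU time. One fix that addresses the sampler limitation could therefore pay off across all four calls.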
Summary
These are just a few examples of the Bar Chart’s flexibility, which can reduce the time needed to find and analyze performance bottlenecks in your applications. Tell us about your best Bar Chart practices and tips & tricks.
Resources for Intel® GPA
- Intel GPA Overview
- Download (free)
- Training – Use Cases and Quick Tips
- User Guide
- Video: Open and Explore a Single Frame