
Graphics Frame Analyzer Bar Chart Tips & Tricks

Pamela_H_Intel


Authors

Philipp Gerasimov, GPU Software Architect, Intel® GPA Product Owner, Intel Corporation

Pamela Harrison, Software Technical Consulting Engineer

Abstract

The Bar Chart is one of the major elements of the Graphics Frame Analyzer GUI. It is a powerful tool for understanding the performance of a rendered frame as a whole, finding major bottlenecks, concentrating on a specific part of the frame, and spotting specific problems that may be worth further optimization investigation.

The bar chart displays a set of bars representing

  • either all individual 3D Rendering and Compute calls executed on the GPU, such as Clear(), DrawIndexed(), Dispatch(), and DispatchRays(),
  • or groups of calls combined by Rendering Passes, Render Targets, etc.

The X and Y axes can each be set to one of many available metrics via the dropdown menus, offering visualizations of several different aspects of a frame’s performance. Axis metric choices are covered in more detail later in this article.

The secondary Zoom Bar Chart underneath the main Bar Chart allows you to quickly select a part of the frame to focus on.

X-Axis and Y-Axis Selections

Image 1: In this image, we show the default selections: GPU Duration for the Y-axis, where the height of each bar indicates the duration of that call, and no metric for the X-axis.

By default, as you can see in Image 1, only the Y-axis has a metric selected: Duration, which displays each call's execution duration on the GPU. This lets us quickly find the biggest (or smallest) calls or passes in the frame, that is, the parts of the frame that take the most (or least) time, by looking at the heights of the bars. This works well for frames with a small number of rendering calls. With thousands of draw calls, however, the chart becomes hard to read because each bar becomes very narrow, and you need to select part of the frame and zoom into it to get a better understanding.

Note: The numbers on the X-axis are API call numbers; the full width of the chart holds one bar for each API call.


There is a useful bar chart trick that works for frames with any number of draw calls. Setting both axes to Duration makes the bars in the bar chart two-dimensional: with GPU Duration on both the X and Y axes, each call is displayed as an area. This makes it very easy to pick out the calls that take the longest, as they are much bigger in both directions than calls that take less time on the GPU.

Image 2: In this image, we show: GPU Duration by GPU Duration, ensuring that it will be easy to find the calls that take the most GPU time.

In this example with Intel® GPA’s Microsoft* DirectX 11 (DX11) sample, gpasample.exe, the difference between the default selection (Image 1) and the duration-by-duration selection (Image 2) is clearly visible. Note in particular the visualization of draw call 95, which takes a significant portion of the frame. Compare the large area of call 95 in Image 2 with its representation in Image 1, where the importance of this call is less clear: the many very small GUI rendering calls are displayed as equally narrow bars, which makes it appear that several calls contribute significant frame time. Image 1 might lead you to spend as much developer time and energy optimizing calls 91, 92, and 94 as you spend on call 95, whereas Image 2 correctly leads you to focus on call 95.

This difference becomes even more apparent when looking at a very complex frame, like this frame from an Unreal Engine-based (UE-based) application:

Image 3: Bar chart representation of a very complex frame, using the default x-/y-axis selections.

In the above image of the UE sample, you might pick out 5 to 10 calls that appear to take significant frame time, leading you to spend time on code that is less critical to optimize. But below, in Image 4, with both axes set to Duration, it is clear that calls 4719 and 7517 take far more time than any other call in this frame and therefore deserve the most developer attention for optimization.

Image 4: Bar chart representation of the same complex frame, using GPU Duration for both the X and Y axes.
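To see why the duration-by-duration view separates the heavy calls so clearly, here is a minimal Python sketch under one stated assumption: in that view each bar's width and height are both proportional to the call's duration, so its area grows with the square of the duration, while in the default view every bar has the same width. The area_shares() helper and the durations are hypothetical and are not measurements from gpasample.exe or the UE frame.

```python
# Toy model of the two bar chart views. Assumption (not taken from Intel GPA):
# with "GPU Duration x GPU Duration", each bar's width and height are both
# proportional to the call's duration, so its area is duration squared.

def area_shares(durations_us, duration_on_x_axis):
    """Return each call's share of the total bar area, in call order."""
    if duration_on_x_axis:
        areas = [d * d for d in durations_us]  # width = duration, height = duration
    else:
        areas = [1 * d for d in durations_us]  # equal widths, height = duration
    total = sum(areas)
    return [a / total for a in areas]

# Hypothetical frame: one heavy call among several light ones (microseconds).
durations = [100, 100, 100, 100, 600]

default_view = area_shares(durations, duration_on_x_axis=False)
squared_view = area_shares(durations, duration_on_x_axis=True)

print(f"heavy call, default view:        {default_view[-1]:.0%}")
print(f"heavy call, duration x duration: {squared_view[-1]:.0%}")
```

With these numbers, the heavy call covers 60% of the bar area in the default view and 90% with Duration on both axes. Squaring the durations is what pushes the long calls forward and shrinks the crowd of short ones.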


Selecting Chart Regions: Microsoft PIX Markers

The bar chart allows us to visualize Microsoft PIX markers if the application is instrumented with the Microsoft PIX API. This is a very useful way to identify rendering passes: Shadows, G-Buffer, SSAO, and so on (if these passes are instrumented by the engineers who wrote the code). This feature is accessible from the Chart Regions drop-down menu as Debug Regions:

Image 5: Select from the Chart Regions dropdown to display region labels.

This lets us clearly see performance for different parts of the application’s scene or workload, with no guessing. Just the facts. This is even more important for very complex frames with thousands of draw calls.

Engineers usually enable and disable PIX markers during development and, in most cases, disable them for release builds. So it’s a good idea to check that markers are enabled before capturing frames with Intel® GPA. In the picture below you can see PIX marker labels written across the top of each group of calls.

Image 6: With Debug Regions selected, the PIX marker regions are labeled across the top.
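PIX regions are typically bracketed by begin/end event calls around a group of GPU calls. The following Python sketch models, conceptually, how such nested pairs partition a stream of calls into labeled regions like the ones shown with Debug Regions. The event tuples and the regions_from_events() helper are hypothetical; this is neither the PIX API nor how Intel GPA processes captures.

```python
# Conceptual sketch: nested begin/end debug markers partition a call stream
# into labeled regions. The event format below is hypothetical; it is not the
# PIX API or an Intel GPA file format.

def regions_from_events(events):
    """Map each region path (e.g. "Frame/Shadows") to the call IDs inside it."""
    stack = []    # names of the currently open regions, outermost first
    regions = {}  # region path -> list of call IDs
    for event in events:
        kind = event[0]
        if kind == "begin":
            stack.append(event[1])
            regions.setdefault("/".join(stack), [])
        elif kind == "end":
            if not stack:
                raise ValueError("end marker without matching begin")
            stack.pop()
        elif kind == "call":
            # A call belongs to every region that is open when it executes.
            for depth in range(1, len(stack) + 1):
                regions["/".join(stack[:depth])].append(event[1])
        else:
            raise ValueError(f"unknown event kind: {kind}")
    if stack:
        raise ValueError(f"unclosed regions: {stack}")
    return regions

events = [
    ("begin", "Frame"),
    ("begin", "Shadows"), ("call", 0), ("call", 1), ("end",),
    ("begin", "G-Buffer"), ("call", 2), ("call", 3), ("call", 4), ("end",),
    ("call", 5),  # inside "Frame" but outside any inner pass
    ("end",),
]

for path, calls in regions_from_events(events).items():
    print(path, calls)
```

Running it prints Frame [0, 1, 2, 3, 4, 5], then Frame/Shadows [0, 1], then Frame/G-Buffer [2, 3, 4]. Call 5 belongs only to the outer region because no inner pass was open when it executed.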

Group By: Individual Calls vs. Groups

Groups help us to better visualize performance for groups of draw calls, such as various rendering passes. The following groups are available for selection:

Image 7: Select from the Group By dropdown to visualize calls by group.

The Draw Calls grouping (Group by: Draw Calls) sets the bar chart to show the list of individual draw calls. For example, in gpasample.exe we can look at the Stairway scene – color only pass as a list of individual draw calls, as in Image 8, or view the same region as a group of draw calls by choosing Group by: Debug Regions, as in Image 9.

Image 8: With Chart Region set to Debug and Group by set to Draw Calls, we see the PIX markers and individual draw calls.


Image 9: With Chart Region set to Debug and Group by set to Debug Regions, we see the PIX markers and groups of draw calls.

It is important to note that changing the Group By selection is not only a change in visual representation: it also changes the way performance metrics are collected. As a result,

  1. When grouping by Draw Calls, metrics are measured for each draw call separately, so if a set of draw calls is selected, the metrics are displayed as the sum of the individual calls;
  2. When grouping by region (for example, Debug Regions), the metrics are measured for each whole group, not for each individual call.

This makes a difference when measuring very small draw calls. Measuring a whole group gives more precise values, especially when it contains many very small calls, because measurement precision depends on GPU execution time, and measurements vary more when the execution time is very small.
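The precision argument can be illustrated with a toy model. The Python sketch below assumes, purely for illustration, that every separate measurement carries an independent error of at most ±eps microseconds; the call durations, eps, and the measure() and compare_grouping() helpers are hypothetical and do not describe how Intel GPA actually measures. Summing N per-call measurements accumulates N error terms (worst case N × eps), while measuring the group once leaves a single error term (worst case eps).

```python
import random

# Toy model, assuming every separate GPU measurement carries an independent
# error of at most +/- eps_us microseconds. Numbers are hypothetical and do
# not describe Intel GPA's actual measurement method.

def measure(true_us, eps_us, rng):
    """One simulated measurement with a bounded random error."""
    return true_us + rng.uniform(-eps_us, eps_us)

def compare_grouping(call_durations_us, eps_us, seed=0):
    rng = random.Random(seed)
    true_total = sum(call_durations_us)
    # Group by Draw Calls: one measurement per call, then summed.
    per_call_sum = sum(measure(d, eps_us, rng) for d in call_durations_us)
    # Group by region: a single measurement of the whole group.
    group_total = measure(true_total, eps_us, rng)
    return true_total, per_call_sum, group_total

calls = [2.0] * 200  # 200 very small calls of 2 us each (hypothetical)
eps = 0.5            # hypothetical per-measurement error bound, in us

true_total, per_call_sum, group_total = compare_grouping(calls, eps)
print(f"true total:        {true_total:.1f} us")
print(f"sum of per-call:   {per_call_sum:.1f} us (error bound {len(calls) * eps:.1f} us)")
print(f"whole-group value: {group_total:.1f} us (error bound {eps:.1f} us)")
```

With random symmetric errors, the per-call sum's typical error grows roughly with the square root of the number of calls rather than the worst-case N × eps, but it still grows with the number of calls, while a group measurement keeps one error term regardless of how many calls the group contains.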

Using Various Metrics on the Y-Axis

Sometimes it is useful to quickly find calls with particular characteristics, such as calls with the most geometry or the highest number of rendered pixels. They are easy to find if you set the metric of interest on the Y-axis of the bar chart.

Setting the Y-axis to Primitive Count or Vertex Count, for example, provides a quick look at geometry complexity across the scene:

Image 10: Y-Axis set to Primitive Count.

Setting it to Rasterized Pixels allows us to find fill-rate heavy calls:

Image 11: Y-axis set to Rasterized Pixels.

Setting it to Sample Bottleneck allows us to find calls that are limited by the texture sampler:

Image 12: Y-axis set to Sample Bottleneck.
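Picking the tallest bars for a given Y-axis metric amounts to ranking calls by that metric. Here is a minimal Python sketch; the metric names match the ones discussed above, but the call IDs, metric values, and the tallest_bars() helper are hypothetical and do not come from a real capture.

```python
# Sketch: picking the tallest bars for a chosen Y-axis metric means ranking
# calls by that metric. Call IDs and metric values are hypothetical.

calls = {
    101: {"GPU Duration": 120.0, "Primitive Count": 5_000,   "Rasterized Pixels": 2_073_600},
    102: {"GPU Duration": 45.0,  "Primitive Count": 250_000, "Rasterized Pixels": 80_000},
    103: {"GPU Duration": 300.0, "Primitive Count": 12_000,  "Rasterized Pixels": 1_500_000},
    104: {"GPU Duration": 8.0,   "Primitive Count": 2,       "Rasterized Pixels": 4_000},
}

def tallest_bars(calls, metric, k=2):
    """Return the k call IDs with the largest value of `metric`, largest first."""
    ranked = sorted(calls, key=lambda call_id: calls[call_id][metric], reverse=True)
    return ranked[:k]

for metric in ("Primitive Count", "Rasterized Pixels", "GPU Duration"):
    print(metric, tallest_bars(calls, metric))
```

With this data, the call with the most geometry (102) is not the one that rasterizes the most pixels (101), and neither is the longest-running call (103), which is why switching the Y-axis metric reveals different candidates for optimization.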

Advanced Profiling Mode

In the top-left corner of the bar chart you can find the flame icon, which enables Advanced Profiling Mode. This mode calculates metrics in a way that allows automatic detection of GPU bottlenecks and groups calls by bottleneck type. In this example from gpasample.exe, Advanced Profiling Mode shows the hottest bottlenecks in the scene from left to right. Examining each group shows which calls are affected, which can suggest ideas for optimization gains.

Note that the biggest bottleneck may not include the most time-consuming call. Instead, it might consist of a set of less time-consuming calls that together add up to a large part of the frame time. Thus, addressing the worst bottleneck, rather than the single most time-consuming call, is likely to speed up several calls at once, increasing the value of the developer time spent on optimization.

Image 13: View bottlenecks, with the ones taking the most time on the left.
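The note about bottlenecks versus individual calls can be made concrete with a small hypothetical example in Python: group calls by bottleneck type, sum their GPU durations, and rank the groups. The call IDs, durations, bottleneck labels, and the rank_bottlenecks() helper are invented for illustration and are not Intel GPA's bottleneck categories or its detection logic.

```python
from collections import defaultdict

# Hypothetical calls: (call ID, GPU duration in microseconds, bottleneck label).
# The labels only illustrate the idea; they are not GPA's bottleneck taxonomy.
calls = [
    (7, 4000, "Pixel Shader"),
    (10, 1200, "Sampler"),
    (11, 1100, "Sampler"),
    (12, 1300, "Sampler"),
    (13, 1000, "Sampler"),
    (20, 500, "Vertex Shader"),
]

def rank_bottlenecks(calls):
    """Sum duration per bottleneck type and sort the groups, hottest first."""
    totals = defaultdict(int)
    for _call_id, duration_us, bottleneck in calls:
        totals[bottleneck] += duration_us
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

ranking = rank_bottlenecks(calls)
longest_call = max(calls, key=lambda call: call[1])

print("bottlenecks, hottest first:", ranking)
print("single longest call:", longest_call)
```

Here the Sampler group (4600 µs in total) outranks the Pixel Shader group (4000 µs), even though the single longest call, call 7, belongs to the Pixel Shader group. Fixing the sampler issue would speed up four calls at once.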


Summary

These are just a few examples of the Bar Chart's flexibility, which can reduce the time it takes to find and analyze performance bottlenecks in your applications. Tell us about your best Bar Chart practices, tips, and tricks.


About the Author
Software engineer for 20+ years. Excels in all things software, plus connecting people and teams for optimal synergy.