High Power Profiling of Rendering Applications: Intel® Arc™ Graphics (and other GPUs)

Pamela_H_Intel
Moderator

Optimization is critical in real-time rendering applications. Knowing where the bottlenecks are and how severe they are guides you, the game developer, in improving your game’s framerate with the least possible time and effort. Intel® Graphics Performance Analyzers (Intel® GPA) is a tool suite designed to help developers of real-time rendering applications find and root-cause bottlenecks quickly. In October 2022, Intel released the new Intel® Arc™ discrete GPU family. This article describes several features of Intel® GPA that can help you analyze graphics applications on these new GPUs.

Philipp Gerasimov, Product Owner of Intel® GPA

Pamela Harrison, Software Technical Consulting Engineer for Intel® GPA

 

Over the last year and a half we have been augmenting Intel® Graphics Performance Analyzers (Intel® GPA) to work with hardware capabilities available in discrete GPUs.

One of the key Intel® GPA components, Graphics Frame Analyzer, now has additional features that make it easier to extract the data you need to quickly show where your biggest bottlenecks are. Find and root-cause those bottlenecks quickly so that you can move on to optimization.

Graphics Frame Analyzer

Graphics Frame Analyzer is a powerful frame analysis tool. You can capture frames from your game, inspect all aspects of them, and find performance issues at the individual draw call level, with geometry and other resource visualizations. You can analyze the captures and share them with colleagues. You can even replay them on other platforms to compare framerate and ensure that you are satisfied with the framerate on all your target platforms.

Performance is key.  You know that.

Higher performance allows your game to run smoothly on your target platforms. And if you can push performance further, you gain headroom to add more features, improve visual quality, or expand to other platforms, giving you a bigger market share. That’s what profiling helps you achieve.

With Intel’s release of the Intel® Arc™ A Series – A750 and A770 discrete GPUs, Intel® GPA beefed up its GPU functionality. We have:

  • Introduced metrics pinning for faster analysis
  • Implemented DX12 Ultimate features in Graphics Frame Analyzer:
    • Mesh Shader metrics
    • DXR attributes

Profiling

First, before opening a frame, select the GPU that you want to use for analysis – you don’t have to disable the other GPUs, just choose the one you want right here in Graphics Frame Analyzer.

Click the triple-dot icon next to the GPU on the upper left to open the list of available GPUs on your platform.

Pamela_H_Intel_0-1680123014834.png

If you are running at 50-60 frames per second on a high end game box and are not achieving 30 fps on a midrange platform, you might think you need to target only high end platforms. However, it’s best to explore . . .

Metrics Pinning

You will likely want to pin various metrics combinations for different scenarios, saving them to appropriately named presets which can be used in any later analysis. For example, if you are running a DXR workload, you will want to select a metrics set with DXR applicable metrics. If you are looking at a mesh shader workload, you would choose a mesh shader metrics set. If you have neither of those in the portion of your game that you are analyzing, you may want to choose a basic metrics set.
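
The selection rule in this paragraph can be written down as a small lookup. The Python sketch below is only an illustration: presets are chosen in the Graphics Frame Analyzer UI, not through code. Apart from meshShaderSet, which is the preset name used later in this article, the preset names and the function are hypothetical.

```python
from typing import List


def presets_for(uses_dxr: bool, uses_mesh_shaders: bool) -> List[str]:
    """Return the pinned-metrics presets worth loading for a workload."""
    presets = []
    if uses_dxr:
        presets.append("dxrSet")         # hypothetical preset name
    if uses_mesh_shaders:
        presets.append("meshShaderSet")  # preset name used in this article
    if not presets:
        presets.append("basicSet")       # hypothetical preset name
    return presets


print(presets_for(uses_dxr=False, uses_mesh_shaders=True))
```

A portion of a frame that uses both features would get both presets back, so you can switch between them during analysis.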

Why Pin?

The amount of GPU telemetry data increased significantly with the release of Intel® Arc™ GPUs, from hundreds of metrics to thousands. The increase comes from the scaling up and growing number of compute blocks in the new discrete GPUs compared to previous generations of Intel® Graphics products. We can also now profile much more with Intel® GPA, including new DX12 Ultimate features such as ray tracing.

However, calculating all of those metrics would take far too long, possibly causing the application to hang. So now we let you choose the values you are interested in.

The Metrics Tab

In the Metrics Tab we have introduced Metrics Pinning, where users can pin their metrics of interest and save them to various presets to minimize metrics collection time and improve analysis speed.

Pamela_H_Intel_1-1680123318799.png

Note: Depending on your Graphics API and the platform where you are doing analysis, available metrics vary. For example, if you are using Microsoft* DirectX 11, you will not have any DirectX Raytracing (DXR) data because DXR is part of Microsoft* DirectX 12 Ultimate.

Choose a Frame

After opening a frame in Graphics Frame Analyzer, choose GPU Duration for both the X and the Y axes at the top left, then look at the bar chart across the top of the tool. With this selection the bar chart becomes "two dimensional", which makes the most important calls easier to spot.

Pamela_H_Intel_2-1680123720663.png

Choose the call that has the greatest area. That one will be the call that takes the most time. In this frame, it is call #95 that takes the most area, and therefore the most time, so let’s look at that one. It’s a Draw call.
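
To see why the largest bar is the most expensive call, here is a minimal Python sketch. It assumes that bar width and bar height are each proportional to the metric chosen for that axis, so with GPU Duration on both axes the area grows with the square of the duration. The call numbers and durations are made-up illustration values, not data from the captured frame.

```python
# Hypothetical per-call GPU durations in microseconds (not real capture data).
gpu_duration_us = {93: 41.0, 94: 12.5, 95: 310.0, 96: 88.0}


def bar_area(duration: float) -> float:
    # Width and height are both assumed proportional to GPU Duration.
    return duration * duration


# The call whose bar covers the greatest area is also the slowest call.
hottest_call = max(gpu_duration_us, key=lambda call: bar_area(gpu_duration_us[call]))
share_of_frame = gpu_duration_us[hottest_call] / sum(gpu_duration_us.values())

print(hottest_call)  # 95
```

Squaring exaggerates the differences between calls, which is what makes the expensive ones stand out visually.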

Pamela_H_Intel_0-1680124680183.png

Adding metrics

The default preset when you first open Graphics Frame Analyzer includes only GPU Duration. Let’s add more by following these steps:

  1. If you don’t want to overwrite the current preset, make a copy of it by pressing the copy icon
  2. Give it a new name
  3. Search for each metric you want and pin it; unpin any metrics you don’t want
  4. After adding all the metrics you want for this list, click the checkmark to save your new list.
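
The four steps above can be pictured as operations on a named list of metrics. The Python sketch below is a hypothetical model for illustration only. It is not an Intel® GPA API, and the class, function, and preset names are assumptions. The metric names come from the basic set listed in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Preset:
    """Hypothetical stand-in for a named metrics preset."""
    name: str
    pinned: List[str] = field(default_factory=list)

    def pin(self, metric: str) -> None:
        if metric not in self.pinned:
            self.pinned.append(metric)

    def unpin(self, metric: str) -> None:
        if metric in self.pinned:
            self.pinned.remove(metric)


def copy_preset(source: Preset, new_name: str) -> Preset:
    # Copy the list so that edits to the copy leave the source untouched.
    return Preset(name=new_name, pinned=list(source.pinned))


saved_presets: Dict[str, Preset] = {}

default = Preset("Default", ["GPU Duration"])  # hypothetical preset name
saved_presets[default.name] = default

working = copy_preset(default, "basicSet")     # steps 1 and 2: copy, rename
for metric in ["PS Invocations", "VS Invocations", "Pixels Rendered",
               "Vertex Count", "EU/XVE Active", "EU/XVE Stall"]:
    working.pin(metric)                        # step 3: pin what you need
saved_presets[working.name] = working          # step 4: save (the checkmark)

print(saved_presets["basicSet"].pinned)
```

Copying first is what keeps the original preset intact, which mirrors the reason for step 1.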

Pamela_H_Intel_1-1680124738476.png

For a basic metrics set, the following will provide valuable information:

  • GPU Duration
  • PS Invocations
  • VS Invocations
  • Pixels Rendered
  • Vertex Count
  • EU/XVE Active
  • EU/XVE Stall
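
One way to read the last two metrics together is sketched below in Python. It rests on an assumption that is not stated in this article: that EU/XVE Active and EU/XVE Stall are reported as percentages of the measured time, and that whatever remains is time the execution units spent idle. The function name and the sample values are hypothetical.

```python
def xve_idle_percent(active_pct: float, stall_pct: float) -> float:
    """Estimate idle time as what is left after active and stalled time."""
    for value in (active_pct, stall_pct):
        if not 0.0 <= value <= 100.0:
            raise ValueError("percentages must be between 0 and 100")
    # Clamp at zero in case rounding pushes the sum slightly above 100.
    return max(0.0, 100.0 - active_pct - stall_pct)


# Hypothetical values for one draw call.
print(xve_idle_percent(active_pct=55.0, stall_pct=30.0))  # 15.0
```

Under that assumption, a call with a high stall share is waiting on something such as memory, while a high idle share suggests the execution units are not being given enough work.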

Having pinned these metrics, you can now see basic performance metrics. If you saved the set using the checkmark icon, it will be available the next time you use Graphics Frame Analyzer: click the dropdown and choose the metrics set by name.

Note: If you are looking at compute calls or at calls that use Mesh Shaders, the selection should be different (see the next section).

DirectX 12 Ultimate features in Graphics Frame Analyzer

Mesh Shaders – DX12 Ultimate

Mesh Shaders are a new DX12 Ultimate feature. They replace the traditional GPU geometry processing pipeline, allowing the creation of more detailed and dynamic worlds. Two new types of GPU shaders were introduced: Mesh Shaders and Amplification Shaders. Intel® GPUs fully support these new shader types and provide dedicated metrics for them.

To analyze workloads with Mesh Shaders, select the DispatchMesh call. Select the shader to view the shader code.

Pamela_H_Intel_0-1680129361338.png

To view metrics, switch to the metrics tab. We can select the Quick Analysis metrics set from before, but Mesh Shaders use a different pipeline so we want some different metrics. For Mesh Shader workloads I might include the following:

  • AS Invocations
  • MS Invocations
  • PS Invocations
  • Pixels Rendered
  • XVE Active
  • XVE INST EXECUTED ALU0 MS UTILIZATION
  • XVE Stall
  • XVE Stall MS

We already have saved the meshShaderSet with precisely these metrics, so I click on that preset in the dropdown list, and . . .

Pamela_H_Intel_1-1680129361349.png

Here you can see these data values – the pipeline statistics metrics, including the information about Mesh and Amplification shaders for that call as well as the efficiency metrics.

Pamela_H_Intel_2-1680129361361.png

DXR – DX12 Ultimate

After capturing a frame from a DXR workload, open the frame in Graphics Frame Analyzer. Open the DispatchRays call so that you can see all the details of the call.

Pamela_H_Intel_3-1680129361394.png

As we did for Mesh Shaders, we created a pinned metrics set for DXR with several of the ray tracing metrics.

  • GPU Duration

Various counts for various cores:

  • RT QUAD LEAF RAY COUNT XECORE1
  • RT TRANSFORM RAY COUNT XECORE1
  • RT TRAVERSAL INPUT RAY COUNT …
  • RT TRAVERSAL OUTPUT RAY COUNT …
  • RT TRAVERSAL STEP RAY COUNT …
  • XVE active

Number of instructions executed for various shaders: closest-hit, any-hit, miss:

  • XVE INST EXECUTED ALU0 RT MS UTILIZATION
  • XVE INST EXECUTED ALU1 RT AHS UTILIZATION
  • XVE INST EXECUTED ALU1 RT CHS UTILIZATION
  • XVE INST EXECUTED SEND RT AHS UTILIZATION
  • XVE STALL
  • XVE STALL RT

Then, to understand the performance of the call, look at the performance metrics together with the ray tracing shaders, the shader code, and the ray tracing states.
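
When reading the instruction metrics, it helps to know which ray tracing shader stage each one refers to. The Python sketch below assumes that the token following RT in a metric name abbreviates the stage: MS for miss, AHS for any-hit, and CHS for closest-hit. That mapping is my reading of the metric names and the stage list above, not an official definition, and the function name is hypothetical. Note that in the Mesh Shader metrics, MS without a preceding RT refers to the Mesh Shader instead.

```python
from typing import Optional

# Assumed meaning of the abbreviation that follows "RT" in a metric name.
RT_STAGE_ABBREVIATIONS = {"MS": "miss", "AHS": "any-hit", "CHS": "closest-hit"}


def rt_shader_stage(metric_name: str) -> Optional[str]:
    """Return the ray tracing stage a metric refers to, or None."""
    tokens = metric_name.upper().split()
    if "RT" not in tokens:
        return None
    position = tokens.index("RT")
    if position + 1 >= len(tokens):
        return None  # e.g. "XVE STALL RT" names no specific stage
    return RT_STAGE_ABBREVIATIONS.get(tokens[position + 1])


for name in ["XVE INST EXECUTED ALU0 RT MS UTILIZATION",
             "XVE INST EXECUTED ALU1 RT AHS UTILIZATION",
             "XVE INST EXECUTED ALU1 RT CHS UTILIZATION",
             "XVE STALL RT"]:
    print(name, "->", rt_shader_stage(name))
```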

Closing Comments

This was a brief overview of the new Intel® GPA features that help analyze performance on the new Intel discrete GPUs. Optimizing modern games for the latest GPU platforms brings new challenges in delivering better performance and a better experience for all gamers. Intel® GPA continually improves and evolves to help game developers accomplish their goals and move the gaming industry forward.

Notices & Disclaimers 

Intel technologies may require enabled hardware, software or service activation. 

No product or component can be absolutely secure.  

Your costs and results may vary.  

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.   

About the Author
Software engineer for 20+ years. Excels in all things software, plus connecting people and teams for optimal synergy.