I am developing a DirectX-based medical imaging application that uses volume rendering. We used to require discrete video cards, but now we are tweaking/re-engineering it to work on recent Intel processors (HD Graphics as well as Ivy/Sandy Bridge).
We don't appear to be CPU-bound. From GPA, I know that we spend about 1% of the time in the vertex shader. During continuous rendering, the pixel shader is about 50% utilized, and it appears to be stalled the other half of the time. We are sampling volume textures A LOT. As the title says, the texture sampler is busy 95% of the time. I suspect this is due to memory latency, but I don't know how to confirm this. I did not find a counter that indicates how much time the sampler spends waiting for memory. There is a counter that indicates whether the sampler is "stalled", but it stays near zero all the time.
So what would be the logical next step in the performance analysis? I would like to know if we are limited by the sampler, memory bandwidth, or both.
Thanks in Advance
On Intel® HD Graphics 4000/2500, to access the metrics marked with an asterisk (*), you must explicitly enable the Intel® Graphics Performance Analyzers option in your BIOS settings:

1. Select Advanced
2. Select System Agent (SA) Configuration
3. Select Graphics Configuration
4. Reboot your machine

If the BIOS on your system does not include the Intel® Graphics Performance Analyzers option, update your BIOS to the latest version from Intel. After completing your performance monitoring activity, we recommend that you disable the Intel® Graphics Performance Analyzers BIOS option and reboot your machine.

After enabling these extra metrics, you should be able to see 8 different memory metrics (including texture reads); hopefully this gives you the info you need. For example, check "GPU Memory Reads" to see whether you're having to fetch lots of data from the CPU. The GPU Memory Reads metric represents the number of bytes read from memory by the GPU, and only includes reads due to cache misses and explicitly uncached resources. For texture data, only reads that miss both the texture cache and the L3 cache are included in this total. Therefore, the GPU Texture Reads metric could be significantly higher than the GPU Memory Reads metric if the L3 cache is effectively utilized.

You also didn't indicate whether you are using Intel GPA System Analyzer or Intel GPA Frame Analyzer. If you're using Intel GPA Frame Analyzer, you'll be able to see all the metrics together, and something may stand out as unusual and therefore suspect. I would also recommend that you look at the documentation for each of the metrics. It contains various "hints" on how to fix certain issues, and you may need to follow up on a number of them before the actual issue (or issues) shows up. For example, see http://software.intel.com/sites/products/documentation/gpa/12.5/index.htm#Sampler_Busy.htm. In that section, we suggest: "Examine the GPU EUs Stalls metric to see amount of EUs stalls. If the percentage is high and the Texture Sampler Busy is close to 100%, most likely you have a texture bottleneck."

You can also try various experiments, such as "2x2 textures", to isolate potential performance bottlenecks. You also indicated that you didn't believe you were CPU-bound. Did you use any "overrides" in Intel GPA System Analyzer to help verify this?

Hopefully this gets you started. If this brings up more questions, the next step is probably to get one of your frame capture files so that we can see all the data at once and get a better picture of what's happening in the GPU.

Regards,
Neal
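For what it's worth, here is a rough sketch of how the numbers discussed above could be combined once you have copied them out of Intel GPA by hand. Everything in it is an assumption for illustration: the struct and function names are hypothetical (this is not a GPA API), the thresholds are arbitrary guesses because the documentation only says "close to 100%" and "high", and it assumes GPU Texture Reads and GPU Memory Reads are both per-frame byte counts.

```cpp
#include <cstdint>

// Hypothetical container for values read off the Intel GPA display.
// Field names are illustrative only; they are not part of any GPA API.
struct GpuMetrics {
    double samplerBusyPct;           // "Texture Sampler Busy", 0..100
    double euStallPct;               // "GPU EUs Stalls", 0..100
    std::uint64_t textureReadBytes;  // "GPU Texture Reads" per frame (assumed bytes)
    std::uint64_t memoryReadBytes;   // "GPU Memory Reads" per frame (bytes)
};

// Assumed thresholds; tune them for your own workload.
constexpr double kSamplerBusyThresholdPct = 90.0;
constexpr double kEuStallThresholdPct = 40.0;

// Encodes the documentation hint: high EU stalls together with a
// nearly saturated sampler suggests a texture bottleneck.
bool looksTextureBound(const GpuMetrics& m) {
    return m.samplerBusyPct >= kSamplerBusyThresholdPct &&
           m.euStallPct >= kEuStallThresholdPct;
}

// Ratio of memory reads to texture reads. Because GPU Memory Reads also
// counts non-texture traffic, this is only an upper bound on the fraction
// of texture bytes that missed both the texture cache and the L3 cache.
// Returns a negative value when no texture reads were recorded.
double memoryToTextureReadRatio(const GpuMetrics& m) {
    if (m.textureReadBytes == 0) {
        return -1.0;
    }
    return static_cast<double>(m.memoryReadBytes) /
           static_cast<double>(m.textureReadBytes);
}

// Converts per-frame memory reads into GB/s (1 GB = 1e9 bytes) so the
// result can be compared against the platform's memory bandwidth.
double memoryReadBandwidthGBps(const GpuMetrics& m, double framesPerSecond) {
    return static_cast<double>(m.memoryReadBytes) * framesPerSecond / 1e9;
}
```

A ratio near zero would suggest the caches are absorbing most of the texture traffic, so the sampler itself is the more likely limit. A ratio close to 1 combined with a read bandwidth near what your memory can deliver would point at memory instead. Treat both as hints to investigate, not as a verdict.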
Please realize that for Ivy Bridge graphics, you'll want to set a BIOS option that provides access to a larger set of GPU metrics. This is documented here: http://software.intel.com/sites/products/documentation/gpa/12.5/hh_goto.htm#Metrics_List_for_Intel_Graphics_Performance_Analyzers.htm