Hi,
I am trying to understand how Occupancy is calculated. My initial assumption was that it is the ratio of the total number of XVE threads scheduled in a given time to the maximum number of XVE threads that could be scheduled in that time.
But later I found that global memory latency, synchronization, and many other factors affect occupancy. So my understanding of occupancy must be wrong, because regardless of those factors, the number of XVE threads that need to be scheduled to complete a given workload stays the same. So occupancy is somehow related to overall execution time, and I would like to understand that part.
I could not get a clear understanding from the documentation, so I wanted to ask about it here.
Hi @SampathRachumallu, first of all you might want to check https://oneapi-src.github.io/oneAPI-samples/Tools/GPU-Occupancy-Calculator/ to see what the "static" limiting factors for occupancy can be. Those are global size, local size, kernel SIMD width, and the amount of SLM used; on GPUs prior to PVC, the use of barriers could also limit occupancy.
Apart from this, VTune measures occupancy using time-based sampling, so what it really gets for a sample is: (sum of all the clocks when a thread was scheduled) / (sum of all the clocks * number of thread slots). So for a sample interval you can't really tell whether, e.g., 50% of the thread slots were busy all the time or 100% of the thread slots were busy 50% of the time.
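To make that formula concrete, here is a minimal Python sketch. It is only an illustration of the ratio described above, not VTune's actual implementation; the function name `sampled_occupancy` and the numbers (8 thread slots, a 1000-clock interval) are made up for the example.

```python
def sampled_occupancy(busy_clocks_per_slot, interval_clocks):
    """Occupancy for one sample interval.

    busy_clocks_per_slot: for each thread slot, the number of clocks during
        which a thread was scheduled on it (hypothetical input).
    interval_clocks: length of the sample interval in clocks.
    """
    num_slots = len(busy_clocks_per_slot)
    return sum(busy_clocks_per_slot) / (interval_clocks * num_slots)


INTERVAL = 1000  # clocks in the sample interval (assumed value)
SLOTS = 8        # number of thread slots (assumed value)

# Case A: half of the slots are busy for the whole interval.
half_slots_all_time = [INTERVAL] * (SLOTS // 2) + [0] * (SLOTS // 2)

# Case B: all of the slots are busy for half of the interval.
all_slots_half_time = [INTERVAL // 2] * SLOTS

occ_a = sampled_occupancy(half_slots_all_time, INTERVAL)
occ_b = sampled_occupancy(all_slots_half_time, INTERVAL)
print(occ_a, occ_b)  # both cases give the same value
```

Both cases produce the same number, which is why a single sample cannot distinguish between them. It also shows why stalls matter: anything that keeps slots empty for part of the interval lowers the numerator even though the total number of threads launched for the workload is unchanged.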
There can also be some "dynamic" aspects affecting occupancy, e.g. very short threads that finish so fast that the scheduling overhead becomes visible.
And one last thing: VTune is currently not aware of whether the kernel runs in large GRF mode (the calculator has this option, BTW), so VTune will still normalize occupancy by the full number of thread slots. For such kernels, the highest occupancy VTune can report is just 50%.
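A small Python sketch of that normalization effect follows. The slot counts are assumptions chosen to match the 50% figure above (large GRF mode leaving half of the thread slots usable); the names are hypothetical and this is not how VTune is implemented internally.

```python
def occupancy(busy_slots, normalizing_slots):
    """Fraction of slots busy, relative to the slot count used for normalization."""
    return busy_slots / normalizing_slots


FULL_SLOTS = 8                     # thread slots in regular GRF mode (assumed value)
LARGE_GRF_SLOTS = FULL_SLOTS // 2  # assumption: large GRF mode halves the usable slots

# A large GRF kernel that keeps every slot available to it busy.
busy = LARGE_GRF_SLOTS

reported = occupancy(busy, FULL_SLOTS)        # normalized by the full slot count
effective = occupancy(busy, LARGE_GRF_SLOTS)  # normalized by the slots actually usable
print(reported, effective)
```

Under these assumptions, a reported occupancy of 50% for a large GRF kernel corresponds to the kernel using every slot it can.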
To better understand the metric of XVE Thread Occupancy, you can refer to the GPU occupancy section in the GPU tuning guide below:
