I have a few questions:
- If IOC reports that private memory is being used then does that always imply that it's being spilled or can it reside in the remaining registers?
- How do I detect or analyze my kernel to see if spilling is occurring?
- Should an automatic (stack) struct variable holding a multidimensional array (some dimensions are 1) that is always indexed with constants be fully unrolled automatically, so that it stays in registers and does not appear as private memory?
I'm seeing a lot of mov and send operations in the .asm dump -- more than I would expect -- and would like to understand what's happening in the kernel and how to get the auto variable struct to be "stationary" in the register file since all the accesses are constant.
This is on GEN9 / Win10/x64 and the latest driver.
One high point: half floats seem to work OK!
Thanks,
Allan
I was able to reduce private memory to zero in my kernel by significantly simplifying the struct automatic variable and using __attribute__((opencl_unroll_hint(n))).
There is an Intel extension that may help answer your first two questions.
The extension documentation is here:
https://www.khronos.org/registry/cl/extensions/intel/cl_intel_driver_diagnostics.txt
Here is a sample code:
You are looking for "bad" diagnostic messages. They are generated on clCreateKernel if:
- the compiler ran out of register space while compiling the kernel, so an additional surface is required for spill fills. If this message is not present, there are no spill fills.
- the private memory the kernel uses doesn't fit into registers, so a global memory allocation needs to be created. If you don't see this diagnostic message, private memory is in registers.
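As a rough illustration of the two checks above, here is a minimal sketch of a notification callback that flags the spill-fill hint. The token values come from the cl_intel_driver_diagnostics extension text; `diag_state` and `diag_notify` are hypothetical names, and the parsing assumes the hint contains "spill fills" and "of size <bytes>" as in the one message quoted in this thread, which may differ between driver versions.

```c
#include <stdlib.h>
#include <string.h>

/* Token values from the cl_intel_driver_diagnostics extension text.
   Guarded so the sketch also compiles when CL/cl_ext.h is included. */
#ifndef CL_CONTEXT_SHOW_DIAGNOSTICS_INTEL
#define CL_CONTEXT_SHOW_DIAGNOSTICS_INTEL      0x4106
#define CL_CONTEXT_DIAGNOSTICS_LEVEL_BAD_INTEL 0x2
#endif

/* Hypothetical record that the callback fills in through user_data. */
typedef struct {
    int    spill_reported;       /* set to 1 once a spill-fill hint is seen */
    size_t spill_surface_bytes;  /* size parsed from that hint, else 0 */
} diag_state;

/* Same parameter list as the pfn_notify argument of clCreateContext.
   CL_CALLBACK is left out so this compiles without the OpenCL headers. */
static void diag_notify(const char *errinfo, const void *private_info,
                        size_t cb, void *user_data)
{
    diag_state *st = (diag_state *)user_data;
    const char *size_tag = "of size ";
    const char *p;

    (void)private_info;
    (void)cb;
    if (errinfo == NULL || st == NULL)
        return;

    /* Assumption: spill hints contain the phrase "spill fills",
       as in the message quoted in this thread. */
    if (strstr(errinfo, "spill fills") == NULL)
        return;
    st->spill_reported = 1;

    /* Assumption: the surface size follows "of size " as a decimal number. */
    p = strstr(errinfo, size_tag);
    if (p != NULL)
        st->spill_surface_bytes =
            (size_t)strtoul(p + strlen(size_tag), NULL, 10);
}

/* Host-side wiring (needs CL/cl.h and a valid platform and device):
 *
 *   diag_state st = {0, 0};
 *   cl_context_properties props[] = {
 *       CL_CONTEXT_PLATFORM, (cl_context_properties)platform,
 *       CL_CONTEXT_SHOW_DIAGNOSTICS_INTEL,
 *       (cl_context_properties)CL_CONTEXT_DIAGNOSTICS_LEVEL_BAD_INTEL,
 *       0 };
 *   cl_context ctx =
 *       clCreateContext(props, 1, &device, diag_notify, &st, &err);
 */
```

With a context created this way, checking `st.spill_reported` after clCreateKernel should tell you whether the spill hint was issued for that kernel. Printing `errinfo` unfiltered is the simpler option if you just want to read every "bad" message.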
That was nice and easy to implement. The Intel driver is full of neat features.
I was already printing spill size info via CL_KERNEL_SPILL_MEM_SIZE_INTEL and know that my kernel is suffering from spills, so I'm wondering how to interpret "additional surface needs to be allocated" followed by a size of 1036288 bytes, i.e. 1012 KB (why 1012 KB?):
Performance hint: Kernel xyz_kernel register pressure is too high, spill fills will be generated, additional surface needs to be allocated of size 1036288, consider simplifying your kernel.
Thanks!
Thanks for the feedback.
CL_KERNEL_SPILL_MEM_SIZE_INTEL tells you how much each Hardware Thread is spilling.