__private memory, spills and loop unrolling on HD Graphics

allanmac1
Beginner

I have a few questions:

  • If IOC reports that private memory is being used, does that always imply that it's being spilled, or can it reside in the remaining registers?
  • How do I detect or analyze whether spilling is occurring in my kernel?
  • Should an automatic-variable struct containing a multidimensional array (some dimensions are 1) that is always indexed with constants be fully unrolled automatically, so that it stays in registers and does not appear as private memory?

I'm seeing a lot of mov and send operations in the .asm dump -- more than I would expect -- and would like to understand what's happening in the kernel and how to get the auto variable struct to be "stationary" in the register file since all the accesses are constant.

This is on GEN9 / Win10/x64 and the latest driver.

One high point:  half floats seem to work OK!

Thanks,

Allan

allanmac1
Beginner

I was able to reduce private memory to zero in my kernel by significantly simplifying the struct automatic variable and using __attribute__((opencl_unroll_hint(n))).

Michal_M_Intel
Employee

There is an Intel extension that may help you answer your first two questions.

The extension doc is here:

https://www.khronos.org/registry/cl/extensions/intel/cl_intel_driver_diagnostics.txt

Here is some sample code:

https://software.intel.com/en-us/articles/application-performance-using-intel-opencl-driver-diagnost...

You are looking for "bad" diagnostic messages. They are generated at clCreateKernel if:

- the compiler ran out of register space while compiling the kernel, so an additional surface for spill fills is required. If this message is not present, there are no spill fills.

- the amount of private memory that the kernel uses doesn't fit into registers, so a global memory allocation needs to be created. If you don't see this diagnostic message, private memory is in registers.

allanmac1
Beginner

That was nice and easy to implement.  The Intel driver is full of neat features.

I was already printing spill size info via CL_KERNEL_SPILL_MEM_SIZE_INTEL and know that my kernel is suffering from spills, so I'm wondering how to interpret "additional surface needs to be allocated" followed by a size of 1036288 bytes, i.e. 1012 KB (why 1012 KB?).

Performance hint: Kernel xyz_kernel register pressure is too high, spill fills will be generated, additional surface needs to be allocated of size 1036288, consider simplifying your kernel.

Thanks! 
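For anyone who wants to log the surface size rather than read it off the console, here is a minimal sketch in plain C that pulls the number out of a hint like the one quoted above. It assumes the message keeps the "of size <bytes>" wording shown in this thread, which the driver does not guarantee. The names `spill_surface_bytes` and `on_driver_hint` are made up for illustration; `on_driver_hint` only mirrors the parameter list of the `pfn_notify` callback passed to clCreateContext, and a real callback would also need the `CL_CALLBACK` qualifier from the OpenCL headers.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the surface size in bytes found after
 * "of size " in a driver-diagnostics hint, or 0 if the text is absent.
 * The message format is an assumption based on the hint quoted above. */
unsigned long long spill_surface_bytes(const char *msg)
{
    static const char marker[] = "of size ";
    const char *p = (msg != NULL) ? strstr(msg, marker) : NULL;

    if (p == NULL)
        return 0;
    p += sizeof(marker) - 1;            /* skip past the marker text */
    if (!isdigit((unsigned char)*p))    /* require a plain decimal number */
        return 0;
    return strtoull(p, NULL, 10);
}

/* Hypothetical callback with the same parameters as the clCreateContext
 * notification function. user_data is assumed to point at an
 * unsigned long long that receives the largest size seen so far. */
void on_driver_hint(const char *errinfo, const void *private_info,
                    size_t cb, void *user_data)
{
    unsigned long long *largest = (unsigned long long *)user_data;
    unsigned long long bytes = spill_surface_bytes(errinfo);

    (void)private_info;                 /* unused in this sketch */
    (void)cb;
    if (largest != NULL && bytes > *largest)
        *largest = bytes;
}
```

Fed the hint above, `spill_surface_bytes` returns 1036288, which is 1012 KB when divided by 1024.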

Michal_M_Intel
Employee

Thanks for the feedback.

CL_KERNEL_SPILL_MEM_SIZE_INTEL tells you how much each Hardware Thread is spilling.

The value reported by driver diagnostics is in fact CL_KERNEL_SPILL_MEM_SIZE_INTEL multiplied by the number of Hardware Threads your device has, plus some padding.
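As a rough illustration of that relationship, the arithmetic might look like the sketch below. The thread does not say how the driver pads, so rounding up to a fixed granule is purely an assumption, and the function names and the numbers in the usage note are hypothetical rather than values from a real device.

```c
#include <stddef.h>

/* Round value up to the next multiple of granule (granule must be > 0). */
size_t round_up(size_t value, size_t granule)
{
    return (value + granule - 1) / granule * granule;
}

/* Hypothetical estimate of the spill surface size:
 *   per_thread_spill : what CL_KERNEL_SPILL_MEM_SIZE_INTEL reports, in bytes
 *   hw_threads       : number of hardware threads on the device
 *   pad_granule      : ASSUMED padding granularity; the real rule used by
 *                      the driver is not documented in this thread */
size_t estimate_spill_surface(size_t per_thread_spill, size_t hw_threads,
                              size_t pad_granule)
{
    return round_up(per_thread_spill * hw_threads, pad_granule);
}
```

For example, with made-up inputs of 1000 bytes per thread, 100 hardware threads and a 4096-byte granule, `estimate_spill_surface(1000, 100, 4096)` gives 102400 bytes. Plugging in your own CL_KERNEL_SPILL_MEM_SIZE_INTEL value and thread count should land in the same ballpark as the size in the hint, though not necessarily on the exact figure.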