Example of performing kernel queries in Intel® oneAPI DPC++/C++ Compiler

Example of performing kernel queries

Viet-Duc — Tue, 08 Jun 2021 08:03:15 GMT

Hi,

I am working through Intel's official textbook, "Data Parallel C++":

https://www.apress.com/gp/book/9781484255735

I would like to tune the performance of an nd_range kernel by optimizing the work-group size. The two parameters I need are (cf. Chapter 12):

preferred_work_group_size
preferred_work_group_size_multiple

However, I could not figure out how to obtain them. To quote the book itself (page 366):

The kernel preferred_work_group_size_multiple query can be used to choose an efficient work-group size. Please refer to Chapter 12 for more information on how to query properties of a device

Unfortunately, Chapter 12 only brushes over the aforementioned parameters without explaining how to use them. Querying device properties is not the issue here.

As far as I can tell, we need to pass a kernel object to 'parallel_for'. Do we query the kernel before or after submitting the command group? Please pardon my ignorance, but coming from a CUDA background I find this very confusing.

I would be grateful if the support team could provide a simple example, based on the vector addition implementation, that finds 'preferred_work_group_size_multiple'.

Regards.

Re: Example of performing kernel queries

VidyalathaB_Intel — Mon, 14 Jun 2021 07:17:57 GMT

Hi,

Thanks for reaching out to us.

Regarding the usage of preferred_work_group_size_multiple:

On Linux, the preferred_work_group_size_multiple value can be read directly with the clinfo command.

If you want a more precise starting value from which to choose multiples, query the kernel's preferred_work_group_size_multiple with the following code snippet:

const size_t preferred_multiple = kernel.get_work_group_info<sycl::info::kernel_work_group::preferred_work_group_size_multiple>(device);
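Once the multiple has been queried, it still has to be turned into a concrete work-group size for the nd_range. The sketch below shows one possible heuristic in plain C++, not something prescribed by SYCL or by the book. It assumes the preferred multiple, the maximum work-group size and the global size are already known, and it picks the largest multiple of the preferred value that fits within the maximum and divides the global size evenly. The function name pick_work_group_size is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper (not a SYCL API): return the largest work-group size that
//  - is a multiple of preferred_multiple,
//  - does not exceed max_wg_size, and
//  - divides global_size evenly.
// Returns 0 when no such size exists; the caller must then pad the global range.
std::size_t pick_work_group_size(std::size_t global_size,
                                 std::size_t preferred_multiple,
                                 std::size_t max_wg_size) {
  if (preferred_multiple == 0) return 0;
  // Start at the largest multiple that fits and walk down one multiple at a time.
  for (std::size_t wg = (max_wg_size / preferred_multiple) * preferred_multiple;
       wg >= preferred_multiple; wg -= preferred_multiple) {
    if (global_size % wg == 0) return wg;
  }
  return 0;
}
```

For example, with a preferred multiple of 32 and a maximum work-group size of 256, a global size of 1024 gives 256, while a global size of 1000 gives 0 because no multiple of 32 divides 1000.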

>>can provide a simple example using the vector addition implementation to find 'preferred_work_group_size_multiple'

We are working on it and will get back to you soon.

Regards,

Vidya.

Re: Example of performing kernel queries

Viet-Duc — Thu, 10 Jun 2021 00:46:42 GMT

Dear Vidya,

Thanks for suggesting clinfo. It is indeed a very useful tool.

For reference, the following results were obtained on the gen9 queue:

[E-2176G]

Max work item dimensions 3
Max work item sizes 8192x8192x8192
Max work group size 8192
Preferred work group size multiple (kernel) 128
Max sub-groups per work group 2048

[UHD P630]

Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple (device) 32
Preferred work group size multiple (kernel) 32
Max sub-groups per work group 32

Regarding the code snippet, that was also the only thing I could find while searching the internet. However, it does not show the context in which 'kernel' is defined. My gripe with C++ is that nothing is ever straightforward.

I look forward to the full example.

Thanks for your time.

Re: Example of performing kernel queries

VidyalathaB_Intel — Mon, 14 Jun 2021 12:00:14 GMT

Hi,

>> I look forward to the full example.

Please find an example of querying preferred_work_group_size_multiple in the following code.

#include <CL/sycl.hpp>
#include <iostream>
#include <array>
using namespace cl::sycl;

int main() {
  const size_t szKernelData = 1024;
  std::array<float, szKernelData> kernelData;  // not used by the query
  range<1> r(szKernelData);                    // not used by the query
  queue q{gpu_selector()};

  // Build an OpenCL C kernel from source (SYCL 1.2.1 program API).
  program p(q.get_context());
  p.build_with_source(R"CLC(
    kernel void sinf_test(global float* data) {
      data[get_global_id(0)] += 1;
    }
  )CLC", "-cl-std=CL1.2");

  // The kernel object exists once the program is built,
  // so it can be queried before any parallel_for is submitted.
  auto k = p.get_kernel("sinf_test");
  auto sz = k.get_work_group_info<
      info::kernel_work_group::preferred_work_group_size_multiple>(q.get_device());
  std::cout << sz << "\n";
  return 0;
}

Command Used:

dpcpp *.cpp && SYCL_BE=PI_OPENCL ./a.out
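For the vector addition case raised in the original question, the vector length is often not a multiple of the chosen work-group size, while SYCL expects an nd_range's global size to be divisible by its local size. A common workaround, sketched here in plain C++ under that assumption, is to round the global size up and guard the kernel body with an index check. The helper names round_up and vector_add_padded are hypothetical, and the loop only emulates on the CPU what each work-item would do.

```cpp
#include <cstddef>
#include <vector>

// Round n up to the next multiple of wg (wg must be > 0).
std::size_t round_up(std::size_t n, std::size_t wg) {
  return ((n + wg - 1) / wg) * wg;
}

// CPU stand-in for a padded vector-addition launch (hypothetical helper).
// Assumes a and b hold at least c.size() elements.
// In SYCL the launch would look roughly like
//   nd_range<1>{range<1>{round_up(n, wg)}, range<1>{wg}}
// and every work-item would run the guarded body below.
void vector_add_padded(const std::vector<float>& a, const std::vector<float>& b,
                       std::vector<float>& c, std::size_t wg) {
  const std::size_t n = c.size();
  const std::size_t global = round_up(n, wg);
  for (std::size_t i = 0; i < global; ++i) {  // i plays the role of the global id
    if (i < n) {                              // padding work-items do nothing
      c[i] = a[i] + b[i];
    }
  }
}
```

With n = 1000 and a work-group size of 32, round_up gives a global size of 1024; the 24 extra work-items fail the bound check and skip the addition.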

Regards,

Vidya.

Re: Example of performing kernel queries

Viet-Duc — Wed, 16 Jun 2021 02:47:25 GMT

Hi, Vidya

The code compiled and produced the same results as clinfo.

Now I understand that the kernel must be compiled before it is passed to parallel_for.

In this sense, preferred_work_group_size_multiple is a constant with a meaning similar to the CUDA warp size.

Could you confirm whether my understanding is correct?

I was confused because the book implied that preferred_work_group_size_multiple was a kernel-dependent variable.

After running the kernel once, the aforementioned value will change from 64 to a different value.

Thanks.

Re: Example of performing kernel queries

VidyalathaB_Intel — Fri, 18 Jun 2021 11:04:51 GMT

Hi,

>> After running the kernel once, the aforementioned value will change from 64 to a different value.

Could you please let us know what values you are getting in different runs?

>> In this sense, preferred_work_group_size_multiple is a constant with a meaning similar to the CUDA warp size.

Regarding this, we will get back to you soon.

Thanks & Regards,

Vidya

Re: Example of performing kernel queries

Viet-Duc — Mon, 21 Jun 2021 02:13:21 GMT

Sorry, that sentence came out wrong. I meant to describe what the book implies.

The result of running the code is always the same constant. There are two separate values for preferred_work_group_size_multiple (UHD P630):

Preferred work group size multiple (device) 32
Preferred work group size multiple (kernel) 32

I just want to make sure that the 'kernel' value is a hardware-based constant, regardless of the nature of the kernel.

That way, we won't need a run-time check for each kernel of a different size.

Sorry for causing confusion.

Re: Example of performing kernel queries

Varsha_M_Intel — Fri, 09 Jul 2021 20:43:24 GMT

The preferred_work_group_size_multiple is a kernel query.

It returns a value of which the work-group size is preferred to be a multiple when executing a kernel on a particular device. This is a performance hint. The value must be less than or equal to the value returned by info::kernel_device_specific::work_group_size.

The compiler uses heuristics that depend on the kernel to determine this value, so the number reported by clinfo as "Preferred work group size multiple" might differ from the result of this kernel query.
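Because the value is kernel-dependent, one reasonable tuning approach is to query it for each kernel and then time a handful of candidate sizes instead of hard-coding a single number. Below is a minimal sketch in plain C++, assuming the preferred multiple and the kernel's maximum work-group size (info::kernel_device_specific::work_group_size) have already been queried. The function name is hypothetical and the timing loop itself is left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tuning helper (not a SYCL API): list every work-group size that is
// a multiple of the kernel's preferred multiple and does not exceed the kernel's
// maximum work-group size. Each candidate could then be benchmarked.
std::vector<std::size_t> candidate_work_group_sizes(std::size_t preferred_multiple,
                                                    std::size_t kernel_max_wg) {
  std::vector<std::size_t> out;
  if (preferred_multiple == 0) return out;
  for (std::size_t wg = preferred_multiple; wg <= kernel_max_wg;
       wg += preferred_multiple) {
    out.push_back(wg);
  }
  return out;
}
```

With a preferred multiple of 32 and a maximum of 256, this yields the eight sizes 32, 64, ..., 256.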

Hope this answers your question.

Re: Example of performing kernel queries

Viet-Duc — Tue, 13 Jul 2021 05:36:40 GMT

Thanks for your clarification.

Re: Example of performing kernel queries

JyotsnaK_Intel — Fri, 06 Aug 2021 05:49:59 GMT

Thanks for accepting our solution. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.