Intel® oneAPI DPC++/C++ Compiler

Macro to target specific backend in kernel

VinInn
Beginner
2,049 Views

Is there a oneAPI macro (or another compile-time construct, since macros seem to be resolved at the dpcpp step) that targets a specific backend or platform (GPU, CPU) inside a JIT-compiled kernel?

Something like:

#ifdef ON_CPU
  // some code that does not compile on GPU (or is optimized for CPU)
#elif defined(ON_GPU)
  // an alternative for the GPU
#else
  // throw?
#endif

0 Kudos
12 Replies
NoorjahanSk_Intel
Moderator
2,015 Views

Hi,

 

Thanks for reaching out to us.

 

>>Is there any specific oneAPI macro..

Below are the sample code and the steps we followed to target specific devices.

 

vec_add.cpp:

 

#include <CL/sycl.hpp>
#include <iostream>
#include <vector>
using namespace sycl;
#define size 1024

int main() {
  // A and B are filled with `size`, so every element of C should end up as 2*size.
  std::vector<int> A(size, size), B(size, size), C(size, 0);
  {
    // Both queues are created whichever macro is set, so this needs a CPU and a GPU device.
    queue cpuQ(cpu_selector{});
    queue gpuQ(gpu_selector{});
    range<1> R(size);
    buffer<int,1> buffA(A.data(), R);
    buffer<int,1> buffB(B.data(), R);
    buffer<int,1> buffC(C.data(), R);
#ifdef CPU
    // Built with -DCPU: submit the kernel to the CPU queue.
    std::cout << "Running on: " << cpuQ.get_device().get_info<sycl::info::device::name>() << "\n";
    cpuQ.submit([&](handler &cgh) {
      auto acc_buffA = buffA.get_access<access::mode::read>(cgh);
      auto acc_buffB = buffB.get_access<access::mode::read>(cgh);
      auto acc_buffC = buffC.get_access<access::mode::write>(cgh);
      cgh.parallel_for(R, [=](id<1> i) {
        acc_buffC[i] = acc_buffA[i] + acc_buffB[i];
      });
    });
#elif defined(GPU)
    // Built with -DGPU: submit the same kernel to the GPU queue.
    std::cout << "Running on: " << gpuQ.get_device().get_info<sycl::info::device::name>() << "\n";
    gpuQ.submit([&](handler &cgh) {
      auto acc_buffA = buffA.get_access<access::mode::read>(cgh);
      auto acc_buffB = buffB.get_access<access::mode::read>(cgh);
      auto acc_buffC = buffC.get_access<access::mode::write>(cgh);
      cgh.parallel_for(R, [=](id<1> i) {
        acc_buffC[i] = acc_buffA[i] + acc_buffB[i];
      });
    });
#endif
  } // The buffers are destroyed here, which writes the results back to C.
  std::vector<int> vecValidate(size, 2*size);
  std::cout << ((C == vecValidate) ? "Success\n" : "Failure\n");
  return 0;
}

 

 

Commands to build and run:

dpcpp vec_add.cpp -DCPU -o cpuadd && ./cpuadd
dpcpp vec_add.cpp -DGPU -o gpuadd && ./gpuadd

 

Please find the attached screenshot for more details:

NoorjahanSk_Intel_0-1646309964491.png

 

The -D option defines a macro name when the source code is compiled, so the choice between the two branches is made at build time.

Please refer to the link below for more details:

https://www.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compiler-reference/compiler-options/compiler-option-details/preprocessor-options/d.html
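
To make the effect of -D concrete, here is a minimal sketch in plain C++ (no SYCL required) of how the preprocessor keeps exactly one branch before any device is known. The macro names CPU and GPU are the ones from the example above; the function name selected_target is hypothetical and used only for illustration.

```cpp
#include <string>

// Hypothetical helper: reports which branch the preprocessor kept.
// The choice is fixed when this file is compiled, not when the program runs.
inline std::string selected_target() {
#if defined(CPU)
    return "cpu";   // kept only when built with -DCPU
#elif defined(GPU)
    return "gpu";   // kept only when built with -DGPU
#else
    return "none";  // neither macro was defined on the command line
#endif
}
```

Building the same file once with -DCPU and once with -DGPU yields two different binaries, which is why this approach selects a target per build rather than per kernel launch.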

 

We can also target a specific backend at run time using the SYCL_DEVICE_FILTER environment variable.

Syntax:

 

export SYCL_DEVICE_FILTER=backend:device_type:device_num

 

Possible values of backend are:

host
level_zero
opencl

Possible values of device_type are:

host
cpu
gpu
acc

device_num is an integer index into the device enumeration printed by the sycl-ls utility; it selects every device with that index across the different backends.

 

Please see the link below for more details:

https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-programming-model/device-selection.html
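
As an illustration of the backend:device_type:device_num layout described above, the following plain C++ sketch splits one such value into its three fields. It is not how the SYCL runtime parses the variable; DeviceFilter and parse_device_filter are hypothetical names, and the sketch assumes a single colon-separated triple whose trailing fields may be left out.

```cpp
#include <sstream>
#include <string>

// Hypothetical illustration type: the three fields of one filter value.
struct DeviceFilter {
    std::string backend;      // e.g. "host", "level_zero", "opencl"
    std::string device_type;  // e.g. "host", "cpu", "gpu", "acc"; empty if omitted
    int device_num = -1;      // index as listed by sycl-ls; -1 if omitted
};

// Splits "backend:device_type:device_num" on ':'.
// Assumption: device_num, when present, is a plain decimal integer
// (std::stoi throws std::invalid_argument otherwise).
inline DeviceFilter parse_device_filter(const std::string& text) {
    DeviceFilter filter;
    std::istringstream in(text);
    std::string field;
    if (std::getline(in, field, ':')) filter.backend = field;
    if (std::getline(in, field, ':')) filter.device_type = field;
    if (std::getline(in, field, ':') && !field.empty()) filter.device_num = std::stoi(field);
    return filter;
}
```

For example, the value level_zero:gpu:0 would split into backend level_zero, device type gpu, and device number 0.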

 

Thanks & Regards,

Noorjahan

 

0 Kudos
NoorjahanSk_Intel
Moderator
1,982 Views

Hi,


Has the information provided above helped? If so, could you please confirm whether we can close this thread from our end?


Thanks & Regards,

Noorjahan.


0 Kudos
VinInn
Beginner
1,977 Views

I understand that what I had in mind cannot be done; I will do it another way.

Thanks for the help.

Please close the thread.

0 Kudos
NoorjahanSk_Intel
Moderator
1,957 Views

Hi,


>>I understood that what I had in mind cannot be done


Could you please elaborate on your issue so that we can try to help you in another way?



Thanks & Regards,

Noorjahan.


0 Kudos
VinInn
Beginner
1,949 Views

Say I have a kernel that I want to run on both GPU and CPU, but for one small part I want to use an optimized routine that will NOT compile for the other target. I do not want to duplicate all the code (as in your example), and I do not want to split my kernel.

Something like this will not work:

auto kernel = [=](auto i,....) {
  //
  // a lot of code
  //
#ifdef ONGPU
  optimizedForGPU  // may not parse on HOST
#else
  optimizedForCPU
#endif
  //
  // more code
  //
};

 

 

 

0 Kudos
VinInn
Beginner
1,862 Views

In any case, my questions were:

1) Can macros be used in SYCL kernels to drive JIT compilation?

My understanding is: NO, they are parsed by the host compiler ahead of time (as in your example).

2) Is there any other mechanism in SYCL to compile part of a kernel for a specific target?

My understanding is: NO, there is no "special" mechanism in SYCL to perform conditional compilation at JIT time.

 

Thanks again for the support and please close the thread.

0 Kudos
NoorjahanSk_Intel
Moderator
1,771 Views

Hi,


>>Thanks again for the support. please close the thread


We have discussed this case with the concerned team, and they would like some more details. Could you please let us know whether there is a particular use case behind compiling part of a kernel for a specific target, so that we can check the possibilities internally and address your questions specifically?


Before we close this thread as you requested, sharing your use case would also make it a good reference for anybody looking at this topic with similar questions.



Thanks & Regards,

Noorjahan.


0 Kudos
VinInn
Beginner
1,762 Views

For instance, in some kernels that I run either on GPU or on CPU under CUDA/nvcc, at some point I need to perform a sort, and I use __CUDA_ARCH__ to select either a bucket sort on GPU (single block) or a quick sort on CPU (single thread).
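
One possible way to keep a single kernel body while swapping only the sort step is to make the device kind a template parameter and pick the instantiation on the host at launch time. The sketch below is plain C++, not SYCL or CUDA, and nothing in this thread confirms it as a supported approach. Target, SortStep, kernel_body, and run are hypothetical names; std::sort and a counting pass merely stand in for the quick sort and bucket sort. Both variants must still compile in every pass, so this does not cover a routine that fails to compile for the other target.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class Target { Cpu, Gpu };  // hypothetical tag for the device kind

// Primary template: CPU variant, a comparison sort standing in for quick sort.
template <Target T>
struct SortStep {
    static void run(std::vector<int>& v) { std::sort(v.begin(), v.end()); }
};

// GPU variant: a bucket (counting) sort; assumes small non-negative keys.
template <>
struct SortStep<Target::Gpu> {
    static void run(std::vector<int>& v) {
        if (v.empty()) return;
        const int max_key = *std::max_element(v.begin(), v.end());
        std::vector<std::size_t> buckets(static_cast<std::size_t>(max_key) + 1, 0);
        for (int x : v) ++buckets[static_cast<std::size_t>(x)];
        std::size_t out = 0;
        for (std::size_t key = 0; key < buckets.size(); ++key)
            for (std::size_t n = 0; n < buckets[key]; ++n)
                v[out++] = static_cast<int>(key);
    }
};

// Shared kernel body: everything except the sort step is written once.
template <Target T>
void kernel_body(std::vector<int>& data) {
    for (int& x : data) x = (x % 100 + 100) % 100;  // "a lot of code": map keys into [0, 99]
    SortStep<T>::run(data);                         // the only target-specific part
    for (int& x : data) x += 1;                     // "more code" (placeholder work)
}

// Host-side dispatch: the variant is chosen at run time, once per launch.
inline void run(std::vector<int>& data, bool on_gpu) {
    if (on_gpu) kernel_body<Target::Gpu>(data);
    else        kernel_body<Target::Cpu>(data);
}
```

In a SYCL program the on_gpu flag could come from a host-side query such as the queue's device reporting is_gpu(); whether each variant then builds for each device is a separate question that this sketch does not answer.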

 

0 Kudos
NoorjahanSk_Intel
Moderator
1,667 Views


Hi,


Could you please provide us with a sample reproducer that uses __CUDA_ARCH__, so that we can understand it better?


Thanks & regards,

Noorjahan.


0 Kudos
NoorjahanSk_Intel
Moderator
1,576 Views

Hi,

We are working on your issue and will get back to you soon.

Could you please confirm whether you are using the NVRTC CUDA runtime library in your application?

Thanks & Regards,

Noorjahan.

 

0 Kudos
NoorjahanSk_Intel
Moderator
1,495 Views

Hi,


We have reported this issue to the development team, and they are looking into it.


Thanks & Regards,

Noorjahan.


0 Kudos