Hi All,
For GPU targets, can I call a function in kernel scope? My functions are defined in different files, and I want to call them inside parallel_for.
Could you please show me an example?
Thanks in advance
Hi,
Yes, you can call a function from a different file inside parallel_for().
There are a couple of ways to achieve this. Examples of both are given below.
- You can define the function in one file (kernel.cpp) and include that file in your main file (main.cpp) so that the function is visible there, as shown below:
(main.cpp)
#include <CL/sycl.hpp>
#include <cstdlib>
#include <iostream>
#include "kernel.cpp"  // brings the definition of vectorAdd into this file

#define numElements 10
using namespace std;

int main(void) {
  size_t size = numElements * sizeof(float);

  // Allocate the host vectors
  float *A = (float *)malloc(size);
  float *B = (float *)malloc(size);
  float *C = (float *)malloc(size);
  for (int i = 0; i < numElements; ++i) {
    A[i] = i;  // rand()/(float)RAND_MAX;
    B[i] = i;  // rand()/(float)RAND_MAX;
  }

  cl::sycl::queue queue(cl::sycl::gpu_selector{});
  std::cout << "Running on "
            << queue.get_device().get_info<cl::sycl::info::device::name>()
            << "\n";
  cl::sycl::device dev = queue.get_device();
  cl::sycl::context ctx = queue.get_context();

  // Allocate the device vectors
  float *d_A = (float *)cl::sycl::malloc_device(size, dev, ctx);
  float *d_B = (float *)cl::sycl::malloc_device(size, dev, ctx);
  float *d_C = (float *)cl::sycl::malloc_device(size, dev, ctx);

  // Device allocations are not accessible from the host, so copy through the queue
  queue.memcpy(d_A, A, size);
  queue.memcpy(d_B, B, size);
  queue.wait();

  queue.submit([&](cl::sycl::handler &cgh) {
    cgh.parallel_for<class vectorAdd_e83213>(
        cl::sycl::range<1>{numElements}, [=](cl::sycl::item<1> item_ct1) {
          vectorAdd(d_A, d_B, d_C, numElements, item_ct1);  // defined in other file
        });
  });
  queue.wait();

  // Copy the result back to the host
  queue.memcpy(C, d_C, size).wait();

  for (int i = 0; i < numElements; i++)
    cout << A[i] << " " << B[i] << " " << C[i] << " " << endl;
  cout << endl;

  cl::sycl::free(d_A, ctx);
  cl::sycl::free(d_B, ctx);
  cl::sycl::free(d_C, ctx);
  free(A);
  free(B);
  free(C);
  return 0;
}
(kernel.cpp)
#include <CL/sycl.hpp>

// Device kernel
void vectorAdd(const float *A, const float *B, float *C, int numElement,
               cl::sycl::item<1> item_ct1) {
  int i = item_ct1.get_linear_id();
  if (i < numElement) {
    C[i] = A[i] + B[i];
  }
}
- You can define an OpenCL __kernel in another file (kernel.cl) and read the whole file into a buffer. Build a program from that source with the context of your device queue, then call get_kernel("function_name") on the program to obtain the kernel. This is like calling an OpenCL kernel from inside the DPC++ program.
(main.cpp)
#include <CL/sycl.hpp>
#include <cstring>
#include <fstream>
#include <iostream>
#include <string>

#define N 10
using namespace cl::sycl;

// Read the whole OpenCL source file into a newly allocated buffer
char* readCLFile(std::string f, unsigned int* size) {
  std::ifstream ifs(f);
  std::filebuf* fbuf = ifs.rdbuf();
  *size = fbuf->pubseekoff(0, ifs.end, ifs.in);
  fbuf->pubseekpos(0, ifs.in);
  char* buf = new char[*size];
  memset(buf, 0, sizeof(char) * (*size));
  fbuf->sgetn(buf, (*size));
  ifs.close();
  return buf;
}

int main() {
  int a[N], b[N], c[N];
  unsigned int size;
  for (int i = 0; i < N; i++) {
    a[i] = i;
    b[i] = i;
    c[i] = 0;
  }

  char* buf = readCLFile("kernel.cl", &size);

  queue q(gpu_selector{});
  std::cout << "Running on "
            << q.get_device().get_info<cl::sycl::info::device::name>() << "\n";
  auto ctx = q.get_context();

  // Build the OpenCL source and look up the kernel by name
  program p(ctx);
  p.build_with_source(std::string(buf, size));
  delete[] buf;
  kernel k = p.get_kernel("vec_add");

  buffer<int, 1> bufa(a, range<1>(N));
  buffer<int, 1> bufb(b, range<1>(N));
  buffer<int, 1> bufc(c, range<1>(N));

  q.submit([&](handler& h) {
    auto acc_a = bufa.get_access<access::mode::read>(h);
    auto acc_b = bufb.get_access<access::mode::read>(h);
    auto acc_c = bufc.get_access<access::mode::read_write>(h);
    h.set_args(acc_a, acc_b, acc_c);
    h.parallel_for(range<1>(N), k);
  });

  // The host accessor waits for the kernel to finish
  auto host_acc = bufc.get_access<access::mode::read>();
  for (int i = 0; i < N; i++) std::cout << host_acc[i] << "\n";
  return 0;
}
(kernel.cl)
__kernel void vec_add(__global int* ptr1, __global int* ptr2, __global int* ptr3) {
  int index = get_global_id(0);
  ptr3[index] = ptr1[index] + ptr2[index];
}
I hope this solves your problem.
Warm Regards,
Abhishek
Hi Abhishek,
Thanks. This is NOT what I am asking. Let me clarify my question. I want to call a CPU function that is in another file. I don't want to inline it. Also, I don't want to write a low-level OpenCL kernel for it.
I mean, if I remove #include "kernel.cpp" from the first example and compile the two files separately, would it work?
Hi,
I understand what you want to do. You want to compile the functions file and the main file separately, and link the function defined in the functions file into the kernel in the main file. This is doable. Please follow the steps below:
- Create a header file that declares those functions, and add the SYCL_EXTERNAL attribute to each declaration, as in the sample below.
(kernel.h)
#pragma once
#include <CL/sycl.hpp>

extern SYCL_EXTERNAL void vectorAdd(const float *A, const float *B, float *C,
                                    int numElement, cl::sycl::item<1> item_ct1);
- Include this header (kernel.h) in your main file (main.cpp in our case), in place of #include "kernel.cpp", and also in the functions file (kernel.cpp). Compile main.cpp and kernel.cpp separately and link them to generate the executable.
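The layout described in these steps can be sketched in a single file that builds with any C++ compiler. This is a host-only stand-in, not the SYCL program itself: the fallback definition of SYCL_EXTERNAL, the plain index parameter (instead of cl::sycl::item<1>), and the helper runVectorAdd are assumptions made for illustration, and the host loop stands in for the parallel_for body.

```cpp
// Host-only sketch of the kernel.h / kernel.cpp / main.cpp split.
// Assumption: with a SYCL compiler the SYCL headers provide SYCL_EXTERNAL;
// the empty fallback below exists only so this sketch builds without SYCL.
#include <cstddef>
#include <vector>

#ifndef SYCL_EXTERNAL
#define SYCL_EXTERNAL
#endif

// ---- kernel.h: declaration shared by both translation units ----
SYCL_EXTERNAL void vectorAdd(const float *A, const float *B, float *C,
                             int numElement, std::size_t i);

// ---- kernel.cpp: definition, compiled on its own ----
SYCL_EXTERNAL void vectorAdd(const float *A, const float *B, float *C,
                             int numElement, std::size_t i) {
  if (i < static_cast<std::size_t>(numElement)) {
    C[i] = A[i] + B[i];
  }
}

// ---- main.cpp: the caller only needs the declaration from kernel.h ----
// runVectorAdd is a hypothetical helper. The second loop plays the role of
// parallel_for, calling vectorAdd once per index.
std::vector<float> runVectorAdd(int n) {
  std::vector<float> A(n), B(n), C(n, 0.0f);
  for (int i = 0; i < n; ++i) {
    A[i] = static_cast<float>(i);
    B[i] = static_cast<float>(i);
  }
  for (std::size_t i = 0; i < static_cast<std::size_t>(n); ++i) {
    vectorAdd(A.data(), B.data(), C.data(), n, i);
  }
  return C;
}
```

Calling runVectorAdd(4) fills C with the element-wise sums of A and B. In the real program the three sections live in their own files, and vectorAdd keeps the cl::sycl::item<1> parameter shown in kernel.h.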
Let us know if you run into any problems while following these steps.
Warm Regards,
Abhishek
That's exactly what I was asking. Awesome, thanks!
Follow-up questions:
- Is it possible to use nested parallelism, i.e. call parallel_for inside the vectorAdd function?
- Do I need #pragma once?
Hi,
Thanks for the confirmation. Good to know that the solution helped.
Regarding your follow-up questions:
- #pragma once is a widely used convention for header files. In a large application the same header can end up being included more than once in a single source file, which makes the compiler see the same declarations and definitions repeatedly. #pragma once prevents that, so it is good practice to use it, but whether you include it is up to you.
- For nested parallelism, I suggest you post a new thread, because that topic is quite different from the current one. We will help you there.
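To illustrate the include-guard point, here is a minimal single-file sketch of what happens when the same header text is seen twice. It uses a traditional #ifndef guard, which has the same effect as #pragma once. The names PARAMS_H, Params, and totalElements are hypothetical and used only for this example.

```cpp
// Sketch: the same "header" appears twice, as it would after two #includes.
// PARAMS_H, Params, and totalElements are hypothetical names.

// ---- first inclusion of "params.h" ----
#ifndef PARAMS_H
#define PARAMS_H
struct Params {
  int numElement;
  int repeats;
};
#endif

// ---- second inclusion of "params.h" ----
// PARAMS_H is already defined, so the preprocessor skips this copy.
// Without the guard, the compiler would reject the second definition of Params.
#ifndef PARAMS_H
#define PARAMS_H
struct Params {
  int numElement;
  int repeats;
};
#endif

// Uses the single surviving definition of Params.
int totalElements(const Params &p) { return p.numElement * p.repeats; }
```

With #pragma once at the top of the header, the second inclusion is skipped in the same way without needing a guard macro.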
Warm Regards,
Abhishek
Hi,
Glad to hear that the solution helped.
Could you please open a new thread for the nested parallelism question, with details about your use case, so that we can file a feature request with the relevant team?
We are closing this thread.
Regards
Goutham