Hello,
I have a problem with execution of the following code:
#include <CL/sycl.hpp>
#include <iostream>
#include <array>
#include <cstdio>
#include <cstdlib>
using namespace std;
using namespace cl::sycl;

#define SIZE 20

struct DeviceData {
    queue q;
    int* A;    // host buffer
    int* d_A;  // device (USM) buffer

    void init() {
        q = queue(gpu_selector{});
        A = (int*)malloc(sizeof(int) * SIZE);
        for (int i = 0; i < SIZE; ++i)
            A[i] = i + 1;
        d_A = (int*)malloc_device(sizeof(int) * SIZE, q.get_device(), q.get_context());
    }

    void test() {
        cout << "Before" << "\n";
        for (int i = 0; i < SIZE; ++i) {
            cout << A[i] << " ";
        }
        cout << "\n";

        // Copy host data to the device
        q.submit([&](handler &h) {
            h.memcpy(d_A, A, sizeof(int) * SIZE);
        });
        q.wait();

        // Add 10 to every element on the device
        q.submit([&](handler &h) {
            h.parallel_for(range<1>{SIZE}, [=](id<1> i) {
                d_A[i] += 10;
            });
        });
        q.wait();

        // Copy the result back to the host
        q.submit([&](handler &h) {
            h.memcpy(A, d_A, sizeof(int) * SIZE);
        });
        q.wait();

        cout << "After" << "\n";
        for (int i = 0; i < SIZE; ++i)
            cout << A[i] << " ";
        cout << "\n";
    }

    void free() {
        std::free(A);
        cl::sycl::free(d_A, q.get_context());
    }
};

int main() {
    DeviceData dev;
    dev.init();
    dev.test();
    dev.free();
    return 0;
}
The code above works on the CPU (with cpu_selector{}) but fails on the GPU with the error: OpenCL API failed. OpenCL API returns: -50 (CL_INVALID_ARG_VALUE) -50 (CL_INVALID_ARG_VALUE). The failure occurs while the kernel in the test() function executes. This code is a reduced example of the problem; I am working on a more complex application that is implemented in the same manner (using structures). The code is executed on Intel DevCloud.
I will be grateful for any advice.
Thanks :)
Hi Kamil,
Thanks for reaching out to us!
Could you please attach the error logs or screenshots of the output you get when running the above code on DevCloud, and mention the name of the node you are running on, so that we can investigate your issue further?
Please provide the logs for both cases, cpu_selector and gpu_selector.
Thanks
Goutham.
Output for cpu_selector:
./main.exe
Before
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
After
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Output for gpu_selector:
./main.exe
Before
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
terminate called after throwing an instance of 'cl::sycl::runtime_error'
what(): OpenCL API failed. OpenCL API returns: -50 (CL_INVALID_ARG_VALUE) -50 (CL_INVALID_ARG_VALUE)
Aborted
Node name: s001-n160
Thanks for help!
Hi Kamil,
We were able to reproduce the error you are facing.
Please find a workaround for your code below.
#include <CL/sycl.hpp>
#include <iostream>
#include <array>
#include <cstdio>
#include <cstdlib>
using namespace std;
using namespace cl::sycl;

#define SIZE 20

struct DeviceData {
    queue q;
    int* A;
    int* d_A;

    // The pointers and the queue are passed as function parameters,
    // so the kernel lambda captures the local copies instead of members.
    void init(int* A, int* d_A, queue q) {
        q = queue(gpu_selector{});
        // A = (int* )malloc(sizeof(int)*SIZE);
        for (int i = 0; i < SIZE; ++i)
            A[i] = i + 1;
        // d_A = (int *)malloc_device(sizeof(int)*SIZE, q.get_device(), q.get_context());
    }

    void test(int* A, int* d_A, queue q) {
        cout << "Before" << "\n";
        for (int i = 0; i < SIZE; ++i) {
            cout << A[i] << " ";
        }
        cout << "\n";

        q.submit([&](handler &h) {
            h.memcpy(d_A, A, sizeof(int) * SIZE);
        });
        q.wait();

        q.submit([&](handler &h) {
            h.parallel_for(range<1>{SIZE}, [=](id<1> i) {
                d_A[i] += 10;
            });
        });
        q.wait();

        q.submit([&](handler &h) {
            h.memcpy(A, d_A, sizeof(int) * SIZE);
        });
        q.wait();

        cout << "After" << "\n";
        for (int i = 0; i < SIZE; ++i)
            cout << A[i] << " ";
        cout << "\n";
    }

    void free(int* A, int* d_A, queue q) {
        std::free(A);
        cl::sycl::free(d_A, q.get_context());
    }
};

int main() {
    DeviceData dev;
    dev.d_A = (int*)malloc_device(sizeof(int) * SIZE, dev.q.get_device(), dev.q.get_context());
    dev.A = (int*)malloc(sizeof(int) * SIZE);
    dev.init(dev.A, dev.d_A, dev.q);
    dev.test(dev.A, dev.d_A, dev.q);
    dev.free(dev.A, dev.d_A, dev.q);
    return 0;
}
Please let us know if this resolves your issue.
Regards
Goutham
Hi Kamil,
Could you please let us know if your issue is resolved or if you are facing any issues with the code?
Thanks
Goutham
Hi,
I apologize for the delay (spring holiday). Yes, the code you provided works fine.
But I have a question: why does my code not work properly? I don't know if I am right, but it seems that the queue should be created within the same block as the kernel and the buffers.
Best regards,
Kamil
Hi Kamil,
We are investigating more about your issue.
We are working with the concerned team and we will get back to you with more information.
Regards
Goutham
Hi,
I am waiting for more information :) Thank you for helping me solve the problem :)
Best regards
Hi Kamil,
In your program, d_A is a device pointer and also a member of DeviceData. Inside a member function, every access to a data member goes through the implicit "this" pointer, so any reference to d_A is resolved as this->d_A. For the same reason, a [=] capture inside a member function captures "this" rather than a copy of d_A. The caveat is that "this" points to the memory where the object is located, and this object was created in CPU (host) memory. When the DPC++ kernel runs on the GPU, the expression "this->d_A" is therefore invalid, because "this" is not a valid pointer on the GPU side. The same code works with cpu_selector because "this" is a valid pointer when the kernel executes on the CPU. The workaround below copies the device pointer "this->d_A" into a local variable in the kernel's capture list. I have also switched to the newer USM shortcut syntax (q.memcpy, q.parallel_for), which makes the code less verbose:
#include <CL/sycl.hpp>
#include <iostream>
#include <array>
#include <cstdio>
#include <cstdlib>
using namespace std;
using namespace cl::sycl;

#define SIZE 20

struct DeviceData {
    queue q;
    int* A;    // host buffer
    int* d_A;  // device (USM) buffer

    void init() {
        q = queue(gpu_selector{});
        A = (int*)malloc(sizeof(int) * SIZE);
        for (int i = 0; i < SIZE; ++i)
            A[i] = i + 1;
        d_A = (int*)malloc_device(sizeof(int) * SIZE, q.get_device(), q.get_context());
    }

    void test() {
        cout << "Before" << "\n";
        for (int i = 0; i < SIZE; ++i) {
            cout << A[i] << " ";
        }
        cout << "\n";

        // USM operations on a default (out-of-order) queue are not ordered
        // implicitly, so wait for each step before starting the next one.
        q.memcpy(d_A, A, sizeof(int) * SIZE).wait();

        // d_A_local holds a copy of the device pointer, so the kernel
        // never dereferences the host-side "this" pointer.
        q.parallel_for(range<1>{SIZE}, [=, d_A_local = this->d_A](id<1> i) {
            d_A_local[i] += 10;
        }).wait();

        q.memcpy(A, d_A, sizeof(int) * SIZE).wait();

        cout << "After" << "\n";
        for (int i = 0; i < SIZE; ++i)
            cout << A[i] << " ";
        cout << "\n";
    }

    void free() {
        std::free(A);
        sycl::free(d_A, q.get_context());
    }
};

int main() {
    DeviceData dev;
    dev.init();
    dev.test();
    dev.free();
    return 0;
}
Hi Anoop,
Thanks for providing a detailed explanation.
Hi Kamil,
Please let us know if the information provided is helpful.
Confirm if your issue is resolved and let us know whether we can close this thread.
Thanks
Goutham
Hi Kamil,
Please confirm if the explanation provided helped.
Let us know if we can close this thread.
Regards
Goutham
Hi,
Yes, this advice resolved my problem completely.
I have another question, but I don't want to create a new topic. I have a problem with math functions such as cos() and sin() on the iGPU side. When I try to use them within a kernel, I get the error: undefined reference to `cos()'.
Best regards,
Kamil
Hi Kamil,
We are glad to know that the solution provided resolved your issue.
Sure, we will help you with your other issue. Please provide your code if you can, so that we can investigate.
Regards
Goutham
I modified the code presented in this topic to show the problem:
#include <CL/sycl.hpp>
#include <iostream>
#include <array>
#include <cstdio>
#include <cstdlib>
#include <cmath>
using std::cout;
using std::endl;
using namespace cl::sycl;

#define SIZE 20

struct DeviceData {
    queue q;
    double* A;    // host buffer
    double* d_A;  // device (USM) buffer

    void init() {
        q = queue(gpu_selector{});
        A = (double*)malloc(sizeof(double) * SIZE);
        for (int i = 0; i < SIZE; ++i)
            A[i] = i + 1;
        d_A = (double*)malloc_device(sizeof(double) * SIZE, q.get_device(), q.get_context());
    }

    void test() {
        cout << "Before" << "\n";
        for (int i = 0; i < SIZE; ++i) {
            cout << A[i] << " ";
        }
        cout << "\n";

        q.memcpy(d_A, A, sizeof(double) * SIZE).wait();

        // This call to cos() is what triggers the build error
        q.parallel_for(range<1>{SIZE}, [=, d_A_local = this->d_A](id<1> i) {
            d_A_local[i] = cos(i + 1);
        }).wait();

        q.memcpy(A, d_A, sizeof(double) * SIZE).wait();

        cout << "After" << "\n";
        for (int i = 0; i < SIZE; ++i)
            cout << A[i] << " ";
        cout << "\n";
    }

    void free() {
        std::free(A);
        sycl::free(d_A, q.get_context());
    }
};

int main() {
    DeviceData dev;
    dev.init();
    dev.test();
    dev.free();
    return 0;
}
Output:
./main.exe
Before
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
terminate called after throwing an instance of 'cl::sycl::compile_program_error'
what(): The program was built for 1 devices
Build program log for 'Intel(R) Gen9 HD Graphics NEO':
error: undefined reference to `cos()'
error: backend compiler failed build.
0 (CL_SUCCESS)
Makefile:43: recipe for target 'run' failed
make: *** [run] Aborted
Hi Kamil,
Please find below the code that resolves the issue. According to the SYCL standard, the built-in math functions (sin(), cos(), etc.) take float arguments, and optionally double. Calling cl::sycl::cos with an argument explicitly cast to float resolves the error:
#include <CL/sycl.hpp>
#include <iostream>
#include <array>
#include <cstdio>
#include <cstdlib>
#include <cmath>
using std::cout;
using std::endl;
using namespace cl::sycl;

#define SIZE 20

struct DeviceData {
    queue q;
    double* A;    // host buffer
    double* d_A;  // device (USM) buffer

    void init() {
        q = queue(gpu_selector{});
        A = (double*)malloc(sizeof(double) * SIZE);
        for (int i = 0; i < SIZE; ++i)
            A[i] = i + 1;
        d_A = (double*)malloc_device(sizeof(double) * SIZE, q.get_device(), q.get_context());
    }

    void test() {
        cout << "Before" << "\n";
        for (int i = 0; i < SIZE; ++i) {
            cout << A[i] << " ";
        }
        cout << "\n";

        q.memcpy(d_A, A, sizeof(double) * SIZE).wait();

        q.parallel_for(range<1>{SIZE}, [=, d_A_local = this->d_A](id<1> i) {
            // cos() takes input as float datatype
            d_A_local[i] = cl::sycl::cos((float)i + 1);
        }).wait();

        q.memcpy(A, d_A, sizeof(double) * SIZE).wait();

        cout << "After" << "\n";
        for (int i = 0; i < SIZE; ++i)
            cout << A[i] << " ";
        cout << "\n";
    }

    void free() {
        std::free(A);
        sycl::free(d_A, q.get_context());
    }
};

int main() {
    DeviceData dev;
    dev.init();
    dev.test();
    dev.free();
    return 0;
}
Please confirm if this resolves the issue.
Regards
Goutham
Hi,
Yes, it works fine. Thanks for the help :)
Hi Kamil,
Glad to hear that your issue got resolved.
We are closing this thread.
Feel free to open a new thread if you need any further support.
Regards
Goutham