I ran into a problem when running my program on a GPU on DevCloud:
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -1 (PI_ERROR_DEVICE_NOT_FOUND) -1 (PI_ERROR_DEVICE_NOT_FOUND)
It's hard to say how to reproduce it; in many cases the error does not occur.
The message contains so little useful information that it is difficult for me to locate the problem.
I'm not sure whether it happens because the calculation on the GPU never returns a result.
Does anyone know more about this error and can give me more information?
Hi,
Thank you for posting in Intel Communities.
The reason for the error above is that you are trying to run a GPU binary on a non-GPU node on DevCloud, i.e., the node you are using doesn't have a GPU device.
To list the devices available on a particular node, use the following command:
sycl-ls
Please use the following command to request a GPU node:
qsub -I -l nodes=1:gpu:ppn=2 -d .
To learn more about job submission commands, please see the following link:
https://devcloud.intel.com/oneapi/documentation/job-submission/
If this resolves your issue, make sure to accept this as a solution. This would help others with similar issues. Thank you!
Regards,
Jaideep
Hi,
I used qsub -I -l nodes=1:gpu:ppn=2 -d . to get a compute node assigned, then ran:
sycl-ls
The output is as follows:
[opencl:cpu:0] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i9-11900KB @ 3.30GHz 3.0 [2023.16.7.0.21_160000]
[opencl:gpu:1] Intel(R) OpenCL HD Graphics, Intel(R) UHD Graphics [0x9a60] 3.0 [22.43.24595.35]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) UHD Graphics [0x9a60] 1.3 [1.3.24595]
The error occurs about 20 seconds after the program starts:
Running on: Intel(R) UHD Graphics [0x9a60]
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -1 (PI_ERROR_DEVICE_NOT_FOUND) -1 (PI_ERROR_DEVICE_NOT_FOUND)
Aborted
real 0m20.771s
user 0m8.174s
sys 0m12.573s
Then I checked the device information again:
sycl-ls
[opencl:cpu:0] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i9-11900KB @ 3.30GHz 3.0 [2023.16.7.0.21_160000]
[opencl:gpu:1] Intel(R) OpenCL HD Graphics, Intel(R) UHD Graphics [0x9a60] 3.0 [22.43.24595.35]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) UHD Graphics [0x9a60] 1.3 [1.3.24595]
My SYCL device selection code is as follows:
cl::sycl::queue deviceQueue(cl::sycl::default_selector_v);
std::cout << "Running on: "
<< deviceQueue.get_device().get_info<cl::sycl::info::device::name>()
<< std::endl;
The queue task submission code is roughly as follows:
u->queue.submit([&](cl::sycl::handler& cgh) {
cgh.single_task<class my_kernel>([=]() {
});
});
u->queue.wait_and_throw();
The error occurs in the following situation. Previously, I allocated shared memory through USM and ran the calculation on the GPU. After each calculation finished, the kernel returned, the host printed the result, and the kernel was launched again for the next calculation, and so on. That version ran without error. However, it is very inefficient, so I removed the output code and let the GPU run without interruption until all calculations are complete. That is when this error appears.
Hi,
Could you please share a sample reproducer with us so that we can investigate your issue more thoroughly?
Thanks & Regards,
Vankudothu Vaishnavi.
Hi,
Here's a minimal reproduction of the problem:
The problem is triggered by an infinite loop in the SYCL kernel code. The same error occurs even if the loop would eventually stop on some condition, and I am not sure how long a for loop has to run before the error appears. This doesn't seem to be caused by DevCloud, but by dpcpp? I'm not sure why this is a problem.
Create a test file:
infinite_loop.cpp
#include <CL/sycl.hpp>
#include <vector>

int main() {
    cl::sycl::queue queue;
    std::vector<int> data(1, 42);
    cl::sycl::buffer<int, 1> buffer(data.data(), data.size());
    queue.submit([&](cl::sycl::handler& cgh) {
        auto acc = buffer.get_access<cl::sycl::access::mode::read_write>(cgh);
        cgh.parallel_for<class infinite_loop>(
            cl::sycl::range<1>(data.size()),
            [=](cl::sycl::id<1> idx) {
                // Deliberately never terminates, to reproduce the error.
                for (;;) {
                }
            });
    });
    // Blocks until the kernel finishes or the runtime reports an error.
    queue.wait_and_throw();
    return 0;
}
Compile and run:
icpx -fsycl infinite_loop.cpp -o infinite_loop
./infinite_loop
This is not caused by a problem with the DevCloud environment.
Best regards
The cause of the problem seems to be: A workload that takes more than four seconds for GPU hardware to execute is a long-running workload. By default, individual threads that qualify as long-running workloads are considered hung and are terminated.
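One way to check whether a given launch approaches that limit is to time it from the host. The sketch below is plain standard C++ with no SYCL dependency. The names timed_seconds and exceeds_budget are hypothetical helpers, and the callable passed in stands for whatever submits the kernel and waits for it (for example, a lambda wrapping queue.submit(...) followed by queue.wait_and_throw()). The 4-second default is the figure quoted above. Host-side wall time includes submission overhead, so treat it as a rough upper bound on GPU execution time rather than an exact measurement.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall time in seconds.
// In a SYCL program, `work` would submit the kernel and wait for it to complete.
template <typename Work>
double timed_seconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: true if a measured time is strictly over the budget.
// The 4.0 s default mirrors the long-running-workload threshold quoted in the thread.
inline bool exceeds_budget(double seconds, double budget_seconds = 4.0) {
    assert(budget_seconds > 0.0 && "budget must be positive");
    return seconds > budget_seconds;
}
```

If the time measured for a single launch comes close to or exceeds the budget, that launch is a candidate for being treated as hung.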
Hi,
Thanks for sharing the reproducer with us.
>>The cause of the problem seems to be: A workload that takes more than four seconds for GPU hardware to execute is a long-running workload. By default, individual threads that qualify as long-running workloads are considered hung and are terminated.
Did disabling the GPU Hang check resolve the problem for you?
Thanks and Regards,
Vankudothu Vaishnavi.
Hi,
I don't have permission to execute this command on DevCloud:
sudo sh -c "echo N> /sys/module/i915/parameters/enable_hangcheck"
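Where the hang check cannot be disabled, one possible workaround is to keep each kernel launch short by splitting the total work into bounded batches, without printing results between launches. The following is a minimal host-side sketch in plain standard C++, not SYCL code. The names run_in_batches, total_steps, steps_per_launch, and the launch callable are all hypothetical. In a real program, launch(first, count) would submit one kernel that performs count steps starting at step first and then wait for it, with the state kept in USM or a buffer between launches. How many steps fit safely under the limit depends on the kernel and the device and has to be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: covers steps [0, total_steps) with launches of at most
// steps_per_launch steps each. Returns the number of launches performed.
template <typename Launch>
std::size_t run_in_batches(std::size_t total_steps,
                           std::size_t steps_per_launch,
                           Launch&& launch) {
    assert(steps_per_launch > 0 && "each launch must make progress");
    std::size_t launches = 0;
    std::size_t first = 0;
    while (first < total_steps) {
        // The last batch may be shorter than steps_per_launch.
        const std::size_t count = std::min(steps_per_launch, total_steps - first);
        launch(first, count);  // stand-in for: submit a bounded kernel, then wait
        first += count;
        ++launches;
    }
    return launches;
}
```

For example, run_in_batches(10, 4, launch) calls launch with (0, 4), (4, 4), and (8, 2), and returns 3.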
Hi,
>>I don't have permission to execute this command on devcloud
Intel DevCloud is a shared environment that comes with pre-installed, validated Intel oneAPI software and complementary packages. As a policy, we do not install custom (open source or third-party licensed) software in the environment.
So we can't help much here. If you still have any issues, do let us know.
Thanks & Regards,
Vankudothu Vaishnavi.
Hi,
We have not heard back from you.
Do you have any other issues? If not, could you please confirm whether we can close this thread from our end?
Thanks & Regards,
Vankudothu Vaishnavi.
Hi,
We haven't heard back from you. If you have any issues, please post a new question as this thread will no longer be monitored by Intel.
Thanks & Regards,
Vankudothu Vaishnavi.