Hello,
This issue is a direct follow-up to this thread: https://community.intel.com/t5/Intel-oneAPI-Base-Toolkit/Error-running-OneAPI-FPGA-emulator/m-p/1258533/thread-id/1115
We've converted CUDA code to DPC++ with the DPCT tool, but are having trouble running it on the FPGA emulator on the DevCloud. The goal is to eventually run this app on an actual FPGA. Attached are the original source code (CUDA and DPCT-migrated) and the Intel-provided fixed code. Please note that the 'USE_GPU' flag must be set to 1 in order to target an accelerator rather than the CPU.
When attempting to compile and run the provided fixed source code for FPGA-Emulator on Stratix 10 PAC, I get this error:
u75801@s001-n142:~/temp-cmt$ dpcpp -fintelfpga CMT-bone-pca-fix.dp.cpp -DFPGA_EMULATOR=1 -o cmt.out
u75801@s001-n142:~/temp-cmt$ ./cmt.out
Running on device: Intel(R) FPGA Emulation Device
HOST MESSAGE : Memory Allocation took, 0.00121431 seconds
Max work group size: 4100
Native API failed. Native API returns: -59 (CL_INVALID_OPERATION) -59 (CL_INVALID_OPERATION)Exception caught at file:CMT-bone-pca-fix.dp.cpp, line:677
u75801@s001-n142:~/temp-cmt$
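For anyone reading the log above: the -59 reported by the runtime is the OpenCL status code CL_INVALID_OPERATION. As a minimal sketch, a lookup like the one below can make such codes easier to read when printing them from an exception handler. The helper name `cl_status_name` is hypothetical; it is not part of the attached code or of any library. The list of codes is deliberately partial, with values taken from the standard OpenCL header (CL/cl.h).

```cpp
#include <string>

// Hypothetical helper: map a few OpenCL status codes to their symbolic names.
// The numeric values match the definitions in the Khronos CL/cl.h header.
// The list is intentionally partial; unknown codes fall through to a
// generic label that still carries the number.
std::string cl_status_name(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -30: return "CL_INVALID_VALUE";
        case -54: return "CL_INVALID_WORK_GROUP_SIZE";
        case -55: return "CL_INVALID_WORK_ITEM_SIZE";
        case -59: return "CL_INVALID_OPERATION";
        default:  return "UNKNOWN_CL_STATUS(" + std::to_string(code) + ")";
    }
}
```

Calling `cl_status_name(-59)` returns the string "CL_INVALID_OPERATION". The name alone does not say which call was rejected, so the file and line reported in the exception message are still the place to start looking.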
This error is very similar to one posted in the previous thread (community.intel.com/t5/Intel-oneAPI-Base-Toolkit/Error-running-OneAPI-FPGA-emulator/m-p/1268231#M1252). However, it seems that you were able to run the fixed code successfully on the FPGA emulator, FPGA, and GPU.
Any suggestions on what the issue could be?
Thanks
Hi,
Thanks for reaching out to us.
Could you please confirm which version of DPC++ you are using by running the command below?
dpcpp --version
We have successfully run the code on GPU, compiling it with the command below:
dpcpp filename.cpp -o executable
Refer to the screenshot below.
Regarding the FPGA emulator error, we will get back to you soon.
Thanks & Regards,
Santosh
Hi Santosh,
The DPC++ version is:
Intel(R) oneAPI DPC++/C++ Compiler 2021.3.0 (2021.3.0.20210619)
Thanks
Hi,
We are working on your issue and we will get back to you soon.
Thanks & Regards,
Santosh Yeduru
Hi,
I migrated the CUDA code and made some modifications; it now runs successfully on the FPGA emulator, CPU, and GPU.
To run on the FPGA emulator:
$ dpcpp CMT-bone-pca_workarounds.dp.cpp
$ export SYCL_DEVICE_TYPE=ACC
$ SYCL_PI_TRACE=1 ./a.out
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_opencl.so
SYCL_PI_TRACE[all]: Selected device ->
SYCL_PI_TRACE[all]: platform: Intel(R) FPGA Emulation Platform for OpenCL(TM)
SYCL_PI_TRACE[all]: device: Intel(R) FPGA Emulation Device
HOST MESSAGE : Memory Allocation took, 0.00134858 seconds
CUDA kernel avg duration: 0.00101494 seconds
CUDA kernel total duration: 0.97434193 seconds
Total kernel iterations: 960
Total time for grid dim 4 and element dim 5 : 0.980248
Cleanup: 0.00037760 seconds
Thanks for accepting our solution. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.