Hello,
I converted CUDA code to DPC++ with the DPCT tool, and I am trying to run it on an FPGA on the DevCloud. I am first testing functionality with the FPGA emulator, but I am getting an invalid binary error, shown here:
u40772@s001-n088:~/cmt-fpga/pca$ dpcpp -fintelfpga CMT-bone-pca.dp.cpp -DFPGA_EMULATOR=1 -o cmt.out
u40772@s001-n088:~/cmt-fpga/pca$ ./cmt.out
TBB Warning: The number of workers is currently limited to 23. The request for 31 workers is ignored. Further requests for more workers will be silently ignored until the limit changes.
HOST MESSAGE : Memory Allocation took, 0.00456611 seconds
Max work group size: 4100
Native API failed. Native API returns: -42 (CL_INVALID_BINARY) -42 (CL_INVALID_BINARY)Exception caught at file:CMT-bone-pca.dp.cpp, line:650
u40772@s001-n088:~/cmt-fpga/pca$
As shown, compiling for the FPGA emulator completes without errors or warnings, but running the resulting executable raises an invalid binary error, which is caught in the code block that should launch work on the accelerator device.
The only reference I could find for this specific error is this thread: https://community.intel.com/t5/Intel-High-Level-Design/CL-INVALID-BINARY-when-running-fast-recompile-example-from/td-p/1224496, which suggests that the error occurs when the hardware target used for compilation does not match the devices available at run time. However, since I am targeting the FPGA emulator, I would expect only the CPU device to be needed. In any case, I am running this on an Arria 10 node, so the FPGA should be available too.
Any suggestions? Thanks
Hi Ryan,
Thanks for reaching out to us!
Could you please share the source code (both the CUDA code and the DPCT-migrated code), if possible?
Regards
Goutham
Hi Goutham,
I have attached the original CUDA code as well as the converted DPC++ code (sorry for the messiness).
I am wondering whether dpct::get_current_device() is finding the Arria 10 FPGA rather than the FPGA emulator during compilation. However, I haven't been able to find good documentation on how this function prioritizes devices. The compilation actually failed when I tried it on a node without an FPGA, so maybe this makes sense, but I would expect compilation to take significantly longer if it were targeting a physical FPGA.
Please let me know if you have any suggestions, thanks!
Hi Ryan,
We tried opening the attachment you provided, but the folder is empty.
Kindly attach the code again.
Thanks & Regards
Goutham
Oops... guess I forgot the -r. I have reattached the files.
In general, though, this appears to be an error caused by a mismatch between the requested and available devices. Is there any documentation on the oneAPI get_current_device() function? I have not been able to find much that explains how it prioritizes devices.
Thanks
Hi Ryan,
Thanks for the reproducer, we are working on your issue.
>>Is there any documentation on the oneAPI get_current_device() function?
Please refer to the link below for the DPCT documentation.
Have a Good day!
Thanks & Regards
Goutham
Yes, I have already seen that guide; unfortunately, it is not very informative.
I see that, by default, the converted code uses dpct::get_current_device() to select its target, but there is no explanation of how that function chooses when several targets are available (e.g., CPU, FPGA, FPGA emulator). I also see that dev_mgr can change the current device with select_device(), but there is no explanation of how to actually use that function.
Further documentation on how to properly select the FPGA and FPGA emulator target devices with DPC++ would be much appreciated. Alternatively, if this invalid binary error has nothing to do with device selection, that would be good to know too. Thanks!
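To make the question concrete, here is a minimal, hypothetical model of the behavior being asked about, assuming the device manager keeps an ordered list of devices plus a "current" index that `get_current_device()` returns and `select_device(id)` changes. `DeviceManagerModel` and `DeviceModel` are illustrative stand-ins in plain C++, not the real `dpct::dev_mgr` interface; the actual enumeration order is exactly what the documentation would need to spell out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a SYCL device; only the name matters here.
struct DeviceModel {
    std::string name;
};

// Illustrative model (NOT the real dpct::dev_mgr): an ordered device list
// and an index saying which entry is "current".
class DeviceManagerModel {
public:
    explicit DeviceManagerModel(std::vector<DeviceModel> devices)
        : devices_(std::move(devices)) {
        if (devices_.empty()) throw std::runtime_error("no devices");
    }

    // Whatever sits at the current index is returned, so the enumeration
    // order alone decides the default target.
    const DeviceModel& get_current_device() const { return devices_[current_]; }

    // Switching targets means passing an index into that same list.
    void select_device(std::size_t id) {
        if (id >= devices_.size()) throw std::out_of_range("bad device id");
        current_ = id;
    }

    std::size_t device_count() const { return devices_.size(); }

private:
    std::vector<DeviceModel> devices_;
    std::size_t current_ = 0;  // defaults to the first enumerated device
};
```

In this model, calling `select_device()` is only useful once you know which index corresponds to the FPGA emulation device, which is why the ordering rules matter.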
Hi Ryan,
We are working on your issue and will get back to you soon.
Regards
Goutham
Thanks, Goutham. Could you provide any insight into the potential cause of this error? Am I on the right track in thinking it has to do with the oneAPI device_selector? Any information would be helpful.
By the way, I am attaching a slightly updated version of the previous code. If you get past the invalid binary error, you would likely hit a segfault, which this version fixes.
Hi Ryan,
I found that the previous solution was wrong. In the code, the macro 'USE_GPU' was set to 0, so the code ran serially and the kernel function was never used. I have forwarded your issue to an FPGA expert and am waiting for feedback. I will let you know as soon as I hear back.
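The trap described here, where a compile-time switch quietly routes everything through the serial path while results still look correct, can be sketched as follows. `USE_GPU` is the macro named above; `scale`, `scale_serial`, `scale_offload`, and the toy computation are hypothetical stand-ins, not code from the migrated program.

```cpp
#include <vector>

// Default the switch to 0, mirroring the setting described above.
#ifndef USE_GPU
#define USE_GPU 0
#endif

// Hypothetical serial fallback: a plain host loop.
inline void scale_serial(std::vector<float>& v, float a) {
    for (float& x : v) x *= a;
}

// Hypothetical stand-in for the offloaded path; in the real program this is
// where the kernel submission to the SYCL queue would happen.
inline void scale_offload(std::vector<float>& v, float a) {
    for (float& x : v) x *= a;
}

// Returns which path was compiled in, so a run can prove the kernel was used.
inline const char* scale(std::vector<float>& v, float a) {
#if USE_GPU
    scale_offload(v, a);
    return "offload";
#else
    scale_serial(v, a);  // kernel path never runs, yet the output is correct
    return "serial";
#endif
}
```

Because both paths produce the same answer, a passing run says nothing about whether the device was exercised; reporting the path taken (or checking the selected device) is one cheap way to catch this.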
Thanks for the response. I too was initially fooled by the previous solution, because the FPGA emulator ran while the USE_GPU flag (the accelerator flag in this case) was 0. That solution (which I no longer see here) did seem to solve the device issue, which is a step in the right direction for porting other CUDA codes to FPGAs with oneAPI.
However, now I am seeing this error from the FPGA emulator (with the accelerator flag set to 1):
"OpenCL API failed. OpenCL API returns: -59 (CL_INVALID_OPERATION) -59 (CL_INVALID_OPERATION)Exception caught at file:test_fpga.cpp, line:677"
Searching for this error, I found this forum post (https://community.intel.com/t5/GPU-Compute-Software/Can-I-mix-openCL-and-level-0-Native-API-returns-59-CL-INVALID/m-p/1255339#U1255339), which suggests it might be a memory issue between the kernel and the host.
Does that seem to be on the right track in this case? Any suggestions? Thanks
Hi Ryan,
We found that this code runs successfully on the GPU but fails on the CPU and the FPGA emulator, so we need to investigate further. I will let you know when we find the root cause.
Regards,
Chen
We had an FPGA expert look at this issue. The -DFPGA_EMULATOR flag chooses which device selector gets declared, but d_selector isn't actually used anywhere in the code. Instead, the main function gets its device and queue from the dpct helper classes (dpct::get_current_device()).
Changing the device selector and queue initialization to something more standard made it work. The completed source file is attached.
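Relevant snippets from the original code:
```cpp
// d_selector is declared here, and -DFPGA_EMULATOR=1 picks the emulator selector...
#if FPGA_EMULATOR
INTEL::fpga_emulator_selector d_selector;
#else
default_selector d_selector;
#endif

#include <dpct/dpct.hpp>

int main(int argc, char *argv[]) try {
  // ...but it is never used: the device and queue come from dpct instead,
  // so the emulator selector has no effect on where the kernel runs.
  dpct::device_ext &dev_ct1 = dpct::get_current_device();
  sycl::queue &q_ct1 = dev_ct1.default_queue();
  ...
```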
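The root cause can be modeled in plain C++. In SYCL 1.2.1, a device selector assigns each candidate device an integer score; the highest score wins and a negative score rejects the device. The sketch below mimics only that rule, using hypothetical names (`DeviceKind`, `pick_device`, `default_like_score`, `emulator_only_score`) and made-up score values that are not DPC++'s real ones, to show that declaring an emulator selector changes nothing unless the device choice is actually made with it.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical device categories on the node in question.
enum class DeviceKind { Cpu, FpgaBoard, FpgaEmulator };

// In this model a selector is just "device -> score", as in SYCL 1.2.1.
using Scorer = std::function<int(DeviceKind)>;

// Highest score wins; starting from -1 means negative scores never win.
inline std::size_t pick_device(const std::vector<DeviceKind>& devs,
                               const Scorer& score) {
    int best = -1;
    std::size_t best_idx = devs.size();  // sentinel: nothing accepted yet
    for (std::size_t i = 0; i < devs.size(); ++i) {
        const int s = score(devs[i]);
        if (s > best) {
            best = s;
            best_idx = i;
        }
    }
    if (best_idx == devs.size()) {
        throw std::runtime_error("no device accepted by selector");
    }
    return best_idx;
}

// Hypothetical "default" preference (illustrative values only).
inline int default_like_score(DeviceKind k) {
    switch (k) {
        case DeviceKind::FpgaBoard:    return 3;
        case DeviceKind::Cpu:          return 2;
        case DeviceKind::FpgaEmulator: return 1;
    }
    return -1;
}

// Hypothetical emulator selector: accepts only the emulation device.
inline int emulator_only_score(DeviceKind k) {
    return k == DeviceKind::FpgaEmulator ? 1 : -1;
}
```

In real SYCL code, the equivalent of scoring with the emulator selector is constructing the queue from it (for example `sycl::queue q(d_selector);`) rather than taking the default queue of `dpct::get_current_device()`; whether that is exactly the change in the attached file is not shown here.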
Hi,
I migrated the CUDA code and made some modifications; it now runs successfully on the FPGA emulator, CPU, and GPU.
To run on the FPGA emulator:
$ dpcpp CMT-bone-pca_workarounds.dp.cpp
$ export SYCL_DEVICE_TYPE=ACC
$ SYCL_PI_TRACE=1 ./a.out
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_opencl.so
SYCL_PI_TRACE[all]: Selected device ->
SYCL_PI_TRACE[all]: platform: Intel(R) FPGA Emulation Platform for OpenCL(TM)
SYCL_PI_TRACE[all]: device: Intel(R) FPGA Emulation Device
HOST MESSAGE : Memory Allocation took, 0.00134858 seconds
CUDA kernel avg duration: 0.00101494 seconds
CUDA kernel total duration: 0.97434193 seconds
Total kernel iterations: 960
Total time for grid dim 4 and element dim 5 : 0.980248
Cleanup: 0.00037760 seconds
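Setting `SYCL_DEVICE_TYPE=ACC` helps here because, in the DPC++ releases of this period, that variable restricts the runtime's device selection to devices of the given type, so the unmodified `dpct::get_current_device()` path ends up on an accelerator (the FPGA emulation device, per the trace above). A simplified model of that filtering follows; `Kind`, `Dev`, `parse_type_filter`, and `pick_default` are hypothetical, only CPU, GPU, and ACC are handled, and "first matching device wins" is an assumption made for illustration.

```cpp
#include <optional>
#include <string>
#include <vector>

enum class Kind { Cpu, Gpu, Acc };

struct Dev {
    std::string name;
    Kind kind;
};

// Maps the environment value to a device type; anything else means "no filter".
inline std::optional<Kind> parse_type_filter(const std::string& env) {
    if (env == "CPU") return Kind::Cpu;
    if (env == "GPU") return Kind::Gpu;
    if (env == "ACC") return Kind::Acc;
    return std::nullopt;
}

// Hypothetical default choice: the first device that survives the filter.
inline std::optional<Dev> pick_default(const std::vector<Dev>& devs,
                                       const std::string& env) {
    const std::optional<Kind> filter = parse_type_filter(env);
    for (const Dev& d : devs) {
        if (!filter || d.kind == *filter) return d;
    }
    return std::nullopt;  // filter excluded every device
}
```

The design point is that the filter narrows the candidate set before any preference is applied, so code that never names a selector can still be steered to the emulator from the shell.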
Hi Ryan,
Did the solution provided help you fix the issue? Please let us know if this is still an issue.
Thanks.
We haven't heard back from you for a long time, so we are assuming that the provided details helped you solve your problem. We will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread.