I converted a CUDA code to DPC++ with the DPCT tool, and I am trying to run it on an FPGA on the DevCloud. I am first testing functionality with the FPGA emulator, but I get the invalid binary error shown here:
u40772@s001-n088:~/cmt-fpga/pca$ dpcpp -fintelfpga CMT-bone-pca.dp.cpp -DFPGA_EMULATOR=1 -o cmt.out
TBB Warning: The number of workers is currently limited to 23. The request for 31 workers is ignored. Further requests for more workers will be silently ignored until the limit changes.
HOST MESSAGE : Memory Allocation took, 0.00456611 seconds
Max work group size: 4100
Native API failed. Native API returns: -42 (CL_INVALID_BINARY) -42 (CL_INVALID_BINARY)Exception caught at file:CMT-bone-pca.dp.cpp, line:650
As shown, the FPGA emulator compile completes without errors or warnings, but running the output gives an invalid binary error, which is caught in the code block that should call the accelerator device.
The only thing I could find with this specific error is here: https://community.intel.com/t5/Intel-High-Level-Design/CL-INVALID-BINARY-when-running-fast-recompile... which suggests that it is related to the hardware target for compilation not matching up with available resources. However, since I am targeting the FPGA emulator, I would think just the CPU device would be necessary, although I am trying this on an Arria 10 node, so that should be available too.
Any suggestions? Thanks
I have attached the original CUDA as well as the converted DPC++ code. (sorry for the messiness)
I am wondering whether dpct::get_current_device() is finding the Arria 10 FPGA rather than the FPGA emulator during compilation. However, I haven't been able to find good documentation on how this function prioritizes devices. The compilation actually failed when I tried it off the FPGA node, so maybe this makes sense, but I would expect the compilation to take significantly longer if it were targeting a physical FPGA.
Please let me know if you have any suggestions, thanks!
Oops... guess I forgot the -r. I have reattached the files.
In general though, this appears to be some kind of error with the requested vs available devices. Is there any documentation on the oneAPI get_current_device() function? I have not been able to find much that goes into its selection priority.
Thanks for the reproducer, we are working on your issue.
>>Is there any documentation on the oneAPI get_current_device() function?
Please refer to the link below for the DPCT documentation.
Have a Good day!
Thanks & Regards
Yes, I have seen that guide already, unfortunately it is not very informative.
I see that by default, the converted code uses the dpct::get_current_device() function in order to select its target, but there is no explanation as to how that function prioritizes its choice when there are potentially multiple different targets (e.g., CPU, FPGA, FPGA emulator). I also see that the dev_mgr can change the current device using select_device(), but there is no explanation on how to actually use this function.
Further documentation on how to properly select FPGA and FPGA emulator target devices with DPC++ would be much appreciated. Or, if this invalid binary error has nothing to do with device selection, that would be good to know. Thanks!