- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi all and thanks in advance for any help that you could provide.
It seems that the problem with which I opened this thread is still unsolved, and we have just found a modification that seems to narrow down the problem.
We've been playing with the following code that is one of the examples available in the oneAPI training material:
using namespace sycl;
int main()
{
{
range<1> r{SIZE};
#ifdef FPGA_EMULATOR
INTEL::fpga_emulator_selector device_selector;
#else
INTEL::fpga_selector device_selector;
#endif
queue q{device_selector};
queue q_cpu{cpu_selector{}};
buffer<int, 1> a_buf{r};
buffer<int, 1> b_buf{r};
buffer<int, 1> c_buf{r};
// a ---- c --- d
// b __/
q.submit([&](handler& h) {
accessor a(a_buf, h, write_only);
h.parallel_for(r, [=](auto idx) {
a[idx] = idx; });
});
q.submit([&](handler& h) {
accessor b(b_buf, h, write_only);
h.parallel_for(r, [=](auto idx) {
b[idx] = -idx; });
});
q_cpu.submit([&](handler& h) { //fails with q_cpu, but not with q
accessor a(a_buf, h, read_only);
accessor b(b_buf, h, read_only);
accessor c(c_buf, h, write_only);
h.parallel_for(r, [=](auto idx) {
c[idx] = a[idx] + b[idx]; });
});
q.submit([&](handler& h) {
accessor c(c_buf, h, read_write);
h.parallel_for(r, [=](auto idx) {
c[idx] += 1; });
}).wait();
}
std::cout << "DONE.\n";
return 0;
}
As you can see in the comment of the 3rd kernel submission, submitting in the same code to the FPGA and the CPU at the same time and expecting the runtime to solve the data flow dependencies fails with the following message:
u32284@s001-n081:~/oneTBB/examples/SC20/lab$ dpcpp -fintelfpga vector-add-fpga.cpp -DFPGA_EMULATOR -o vadd.emu
u32284@s001-n081:~/oneTBB/examples/SC20/lab$ ./vadd.emu
terminate called after throwing an instance of 'cl::sycl::runtime_error'
what(): Native API failed. Native API returns: -42 (CL_INVALID_BINARY) -42 (CL_INVALID_BINARY)
Aborted
If we change the second kernel so that we avoid submitting to the CPU device, the code does not return:
// change q_cpu.submit()... by this:
host_accessor a(a_buf, read_only);
host_accessor b(b_buf, read_only);
host_accessor c(c_buf, write_only);
for(int idx=0; idx<SIZE;idx++){
c[idx] = a[idx] + b[idx];
}
And the only way to get it works, as far as we know, is by destroying the host_accessors:
{
host_accessor a(a_buf, read_only);
host_accessor b(b_buf, read_only);
host_accessor c(c_buf, write_only);
for(int idx=0; idx<SIZE;idx++){
c[idx] = a[idx] + b[idx];
}
}
u32284@s001-n081:~/oneTBB/examples/SC20/lab$ dpcpp -fintelfpga vector-add-fpga3.cpp -DFPGA_EMULATOR -o vadd.emu
u32284@s001-n081:~/oneTBB/examples/SC20/lab$ ./vadd.emu
DONE.
u32284@s001-n081:~/oneTBB/examples/SC20/lab$
Has this been reported before? Are we doing something wrong or is the compiler/runtime that still needs some improvements?
Thanks once again.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
This issue of CL_INVALID_BINARY when compiling with the -fintelfpga flag while trying to run on CPU + FPGA can be solved by using the following flag in place of -fintelfpga: -fsycl-targets=spir64_fpga-unknown-unknown-sycldevice,spir64-unknown-unknown-sycldevice. The full command would then be: dpcpp -fsycl-targets=spir64_fpga-unknown-unknown-sycldevice,spir64-unknown-unknown-sycldevice vadd.cpp -o vadd.emu
This stems from the fact that the -fintelfpga flag instructs the compiler to perform an offline/AOT compilation which specifically targets an fpga device only (similar to CPU offline compilation). You can see more on targeting multiple platforms here, which gives examples for FPGA emulation/hardware compiles with separate source files for the CPU and FPGA kernels.
Best regards.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
This issue of CL_INVALID_BINARY when compiling with the -fintelfpga flag while trying to run on CPU + FPGA can be solved by using the following flag in place of -fintelfpga: -fsycl-targets=spir64_fpga-unknown-unknown-sycldevice,spir64-unknown-unknown-sycldevice. The full command would then be: dpcpp -fsycl-targets=spir64_fpga-unknown-unknown-sycldevice,spir64-unknown-unknown-sycldevice vadd.cpp -o vadd.emu
This stems from the fact that the -fintelfpga flag instructs the compiler to perform an offline/AOT compilation which specifically targets an fpga device only (similar to CPU offline compilation). You can see more on targeting multiple platforms here, which gives examples for FPGA emulation/hardware compiles with separate source files for the CPU and FPGA kernels.
Best regards.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi George,
It works perfect now!
Thank you very much.
Best regards.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you George!
It actually works in emulation mode. I also compiled the final bitstream as explained below, but at runtime I got this error message:
terminate called after throwing an instance of 'cl::sycl::feature_not_supported'
what(): SPIR-V online compilation is not supported in this context -59 (CL_INVALID_OPERATION)
/var/spool/torque/mom_priv/jobs/784102.v-qsvr-1.aidevcloud.SC: line 11: 15844 Aborted bin/fpga-hw
I used this script, compile_fpga_hw.sh, to compile:
source /opt/intel/inteloneapi/setvars.sh
rm -rf bin/fpga-hw.o
#/bin/echo "##" $(whoami) is compiling DPC++ example for FPGA hardware
dpcpp -fsycl-targets=spir64_fpga-unknown-unknown-sycldevice,spir64-unknown-unknown-sycldevice -c lab/vector-add-fpga.cpp -o bin/fpga-hw.o
dpcpp -fsycl-targets=spir64_fpga-unknown-unknown-sycldevice,spir64-unknown-unknown-sycldevice bin/fpga-hw.o -Xshardware -Xsboard=intel_a10gx_pac:pac_a10 -Xsparallel=8 -o bin/fpga-hw
And launched with this other script, just in case anyone want to reproduce it:
qsub -l nodes=1:ppn=2:fpga_compile -d . scripts/compile_fpga_hw.sh
According to the Programming Guide, two different files, one for the FPGA and a different one for the CPU, should be compiled separately and linked afterwards. I'd rather avoid this extra burden if at all possible, so I wonder what would be the compilation flags to get it running using a single source file.
Thanks once again!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Good to see the discussions. Please open a new case for any similar issues.
Thanks and Regards
Anil

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page