Greetings,
We are running some experiments and have found that oneAPI for FPGA performs poorly, well below our previous pure OpenCL implementation (same system, same FPGA, etc.). We tested different codes, and even for a simple matrix multiplication the OpenCL code/runtime performs much better than the pure oneAPI version. So we thought about using OpenCL interoperability with oneAPI (as described in this guide https://www.intel.com/content/www/us/en/developer/articles/technical/interoperability-dpcpp-sycl-opencl.html) to see whether DPC++ can avoid the overhead or poor optimization steps of oneAPI by relying on something closer to the OpenCL implementation and runtime optimizations. We don't know what oneAPI adds during the compilation/execution steps, but right now using the FPGA is not worthwhile.
The FPGA emulator works with the interoperability code, but the real FPGA does not. Here are our steps:
Error
user@node:~/interoperability$ ./ingest_fpga
Device: s10gx : Stratix 10 Reference Platform (acls10_ref0)
terminate called after throwing an instance of 'cl::sycl::feature_not_supported'
what(): Online compilation is not supported by this device -3 (CL_COMPILER_NOT_AVAILABLE)
Code
#include <CL/sycl.hpp>
#include <iostream>
#include <array>
#include <CL/sycl/INTEL/fpga_extensions.hpp>
#include "dpc_common.hpp"
using namespace cl::sycl;

int main()
{
    const size_t szKernelData = 32;
    std::array<float, szKernelData> kernelData;
    kernelData.fill(-99.f);
    range<1> r(szKernelData);

    sycl::INTEL::fpga_selector _device;
    queue q{_device};
    //queue q{fpga_selector()};
    std::cout << "Device: " << q.get_device().get_info<info::device::name>() << "\n";

    program p(q.get_context());
    p.build_with_source(
        R"CLC(
            kernel void sinf_test(global float* data) {
                data[get_global_id(0)] = sin(get_global_id(0)*2*M_PI_F/get_global_size(0)) ;
            }
        )CLC", "-cl-std=CL1.2");
    {
        buffer<float, 1> b(kernelData.data(), r);
        q.submit([&](handler& cgh) {
            auto b_accessor = b.get_access<access::mode::read_write>(cgh);
            cgh.set_args(b_accessor);
            cgh.parallel_for(r, p.get_kernel("sinf_test"));
        });
    }
    for(auto& elem : kernelData)
        std::cout << std::defaultfloat << elem << " " << std::hexfloat << elem << std::endl;
    return 0;
}
Compilation
dpcpp -fintelfpga -Xshardware -fsycl-unnamed-lambda ingest.cpp -std=c++17 -o ingest_fpga
# Output
WARNING: No kernels specified.
Any idea?
Hi @RN1,
Thank you for posting in the Intel community forum about your interest in oneAPI; I hope all is well.
While we look into the interoperability error, may I ask: would it be possible to share the test codes you mentioned, for both pure OpenCL and oneAPI? That would allow us to investigate further.
I have sent you a note in your mailbox in case you prefer a more secure way to share the codes.
Hope to hear from you soon.
Best Wishes
BB
Hi @RN1,
Good day, just following up on the previous clarification.
By any chance, did you manage to look into it?
Best Wishes
BB
Good day.
I answered in the other thread.
We ran more tests with no improvement: performance is still poor, and the results are the same as those attached in the other post.
We compared "System viewers->Graph viewer (beta)" in OpenCL with "Views->System viewer" in oneAPI. OpenCL coalesces the local memory accesses, while oneAPI does not; instead it uses many small units. We suspect something is also wrong with the DSP usage, but this is hard to confirm because the generated graph is huge. Maybe it is not reducing/trimming the tree, but we did nothing different from the OpenCL version, so we need support on how to optimize these local accesses to reach at least performance similar to OpenCL (the optimization guides do not explain this).
Following up on this thread: do you know why OpenCL-oneAPI interoperability does not work on the FPGA?
Hi @RN1,
Noted on the steps taken for the performance clarification. This thread also appears to relate to the error when executing on DevCloud, which has been resolved (it was due to node issues). Hence we will close this thread and follow up on the performance clarification in the other thread below:
Pleasure having you here.
Best Wishes
BB