Intel® High Level Design
Support for Intel® High Level Synthesis Compiler, DSP Builder, OneAPI for Intel® FPGAs, Intel® FPGA SDK for OpenCL™

Error executing basic test code for FPGA (oneAPI/OpenCL)

RN1
New Contributor I

Greetings,

We are running some experiments and have found that oneAPI for FPGA performs well below our previous pure OpenCL implementation (same system, same FPGA, etc.). We tested several codes, and even for a simple matrix multiplication the OpenCL code/runtime performs much better than the pure oneAPI one. So we considered using OpenCL interoperability with oneAPI (as in this guide: https://www.intel.com/content/www/us/en/developer/articles/technical/interoperability-dpcpp-sycl-opencl.html) to see whether DPC++ can avoid the overhead or wrong optimization steps of oneAPI by using something closer to the OpenCL implementation and its runtime optimizations. We don't know what oneAPI adds during the compilation and execution steps, but as it stands, using the FPGA is not worthwhile.

The FPGA emulator works with the interoperability code, but the real FPGA does not. Here are our steps:

Error

user@node:~/interoperability$ ./ingest_fpga
Device: s10gx : Stratix 10 Reference Platform (acls10_ref0)
terminate called after throwing an instance of 'cl::sycl::feature_not_supported'
  what():  Online compilation is not supported by this device -3 (CL_COMPILER_NOT_AVAILABLE)

Code

#include <CL/sycl.hpp>
#include <iostream>
#include <array>

#include <CL/sycl/INTEL/fpga_extensions.hpp>
#include "dpc_common.hpp"

using namespace cl::sycl;

int main()
{
        const size_t szKernelData = 32;
        std::array<float, szKernelData> kernelData;
        kernelData.fill(-99.f);
        range<1> r(szKernelData);
        sycl::INTEL::fpga_selector _device;
        queue q{_device};
        //queue q{fpga_selector()};
        std::cout << "Device: " << q.get_device().get_info<info::device::name>() << "\n";
        program p(q.get_context());

        p.build_with_source(
          R"CLC(
            kernel void sinf_test(global float* data) {
              data[get_global_id(0)] = sin(get_global_id(0)*2*M_PI_F/get_global_size(0)) ;
            }
          )CLC", "-cl-std=CL1.2");
        {
                buffer<float, 1> b(kernelData.data(), r);

                q.submit([&](handler& cgh) {
                        auto b_accessor = b.get_access<access::mode::read_write>(cgh);
                        cgh.set_args(b_accessor);
                        cgh.parallel_for(r, p.get_kernel("sinf_test"));
                });
        }
        for(auto& elem : kernelData)
                std::cout << std::defaultfloat << elem << " " << std::hexfloat << elem << std::endl;
        return 0;
}
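
To check the values printed at the end of the program (on the emulator, or on hardware once the kernel runs), a host-side reference for what `sinf_test` should write can be sketched in plain C++. This is only a minimal sketch and not part of the original reproduction: `sinf_reference` and `kPiF` are hypothetical names, and `kPiF` stands in for OpenCL's `M_PI_F`, which standard C++ does not define.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for OpenCL's M_PI_F (assumption: standard C++ has no such macro).
constexpr float kPiF = 3.14159265358979323846f;

// Hypothetical helper: expected buffer contents after sinf_test
// has run over n work-items.
std::vector<float> sinf_reference(std::size_t n) {
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Same evaluation order as the kernel:
        // get_global_id(0) * 2 * M_PI_F / get_global_size(0)
        out[i] = std::sin(static_cast<float>(i * 2) * kPiF / static_cast<float>(n));
    }
    return out;
}
```

Comparing `kernelData` against `sinf_reference(szKernelData)` should use a small tolerance rather than exact equality, since the device's `sin` may differ from the host's in the last bits.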

 

 

 

Compilation

dpcpp -fintelfpga -Xshardware -fsycl-unnamed-lambda ingest.cpp -std=c++17 -o ingest_fpga
# Output
WARNING: No kernels specified.

Any ideas?

BoonBengT_Intel
Moderator

Hi @RN1,


Thank you for posting in the Intel community forum and for your interest in oneAPI. I hope all is well.

While we look into the interoperability error, would it be possible to share the test codes you are running for both pure OpenCL and oneAPI? That would allow us to check further.

I have dropped a note in your mailbox, in case you would prefer a more secure way to share the codes.

Hope to hear from you soon.


Best Wishes

BB


BoonBengT_Intel
Moderator

Hi @RN1,


Good day, just following up on the previous request.

Did you by any chance manage to look into it?


Best Wishes

BB


RN1
New Contributor I

Good day.

I answered in the other thread.

We ran more tests without any improvement: the performance is still poor, and the results are the same as those attached in the other post.

We compared "System viewers->Graph viewer (beta)" in OpenCL with "Views->System viewer" in oneAPI. OpenCL coalesces the local accesses, while oneAPI does not; it uses many small units instead. We assume something is also wrong with the DSPs, but that is hard to see because the generated graph is huge. Maybe it is not reducing/trimming the tree. We did nothing different from the OpenCL version, so we need support on how to optimize these local accesses to achieve performance at least similar to OpenCL (nothing about this is explained in the optimization guides).

Returning to the topic of this thread: do you know why OpenCL-oneAPI interoperability does not work with the FPGA?

BoonBengT_Intel
Moderator

Hi @RN1,


Noted on the steps taken for the performance clarification. This thread also seems to be related to the error when executing in DevCloud, which has been resolved and was caused by node issues. I will therefore close this thread and follow up on the performance clarification in the other thread, linked below:

https://community.intel.com/t5/forums/forumtopicpage/board-id/high-level-design/message-id/2395#M2395


Pleasure having you here.

Best Wishes

BB

