Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, DLA, Software Stack, and Reference Designs

OpenCL to OneAPI Dramatic change in Frequency and Performance

AbenezerWudenhe
Beginner
577 Views

Hello,

 

I am attempting to port an FPGA GEMM program from OpenCL to oneAPI.

After full FPGA synthesis, however, the oneAPI version shows a much lower clock frequency and a large increase in memory bits. Can anyone suggest areas to look into that might be causing this? I believe my implementation follows the OpenCL implementation very closely. I have included the kernel sources below, with a .cl extension for OpenCL and a .h extension for oneAPI. Thank you for the help.

 

Sincerely,

Abenezer Wudenhe

HongboRong
Beginner
548 Views

I compared the two files:

     kdiff3 a.cl gemm.generated_oneapi_header.h

One suspicious place is that the OpenCL code uses vload16, but the oneAPI code does not:

OpenCL (a.cl, line 116):

float16 _29 = vload16(0, (__address_space__A_serializer float*)_A_serializer + _28);

oneAPI (gemm.generated_oneapi_header.h, line 2755):

_158 = {
_A_serializer[_157+0],
_A_serializer[_157+1],
_A_serializer[_157+2],
_A_serializer[_157+3],
_A_serializer[_157+4],
_A_serializer[_157+5],
_A_serializer[_157+6],
_A_serializer[_157+7],
_A_serializer[_157+8],
_A_serializer[_157+9],
_A_serializer[_157+10],
_A_serializer[_157+11],
_A_serializer[_157+12],
_A_serializer[_157+13],
_A_serializer[_157+14],
_A_serializer[_157+15]
};

Another suspicious place is the use of fpga_reg with float16:

OpenCL (a.cl, line 193):

float16 _61 = __fpga_reg(__fpga_reg(_60));

oneAPI (gemm.generated_oneapi_header.h, line 2852):

float16 _190 = (float16){
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[0])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[1])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[2])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[3])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[4])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[5])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[6])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[7])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[8])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[9])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[10])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[11])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[12])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[13])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[14])),
sycl::ext::intel::fpga_reg( sycl::ext::intel::fpga_reg(_189[15]))
};

 

Does anyone know how to (1) use vload16 in oneAPI, and (2) apply fpga_reg to a whole float16 value?
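
To make the two differences concrete, here is a minimal host-side sketch in plain C++ (no SYCL headers, so it compiles anywhere). Everything in it is a hypothetical stand-in: Float16 plays the role of float16 / sycl::vec<float, 16>, load16 mirrors the addressing of OpenCL's vload16(offset, p), gather16 mirrors the 16 scalar reads in the generated header, and reg_stub is only a placeholder for sycl::ext::intel::fpga_reg. To my understanding, sycl::vec has a load(offset, multi_ptr) member that is the closest SYCL counterpart of vload16, and fpga_reg is a template, but whether it accepts a whole sycl::vec depends on the compiler version, so please treat both as assumptions to verify. The sketch only shows that the two forms read the same data; it says nothing about how the FPGA compiler schedules them.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for OpenCL float16 / sycl::vec<float, 16>.
struct alignas(64) Float16 {
    float s[16];
};

// Same addressing as OpenCL vload16(offset, p): reads the 16 consecutive
// floats that start at p + offset * 16, expressed as one block copy.
inline Float16 load16(std::size_t offset, const float* p) {
    Float16 v;
    std::memcpy(v.s, p + offset * 16, sizeof(v.s));
    return v;
}

// Element-by-element form, as in the generated oneAPI header:
// 16 separate scalar reads starting at index `base`.
inline Float16 gather16(const float* p, std::size_t base) {
    Float16 v;
    for (std::size_t i = 0; i < 16; ++i) {
        v.s[i] = p[base + i];
    }
    return v;
}

// Placeholder for sycl::ext::intel::fpga_reg (assumption: on the host it
// is just the identity; on the FPGA it requests a pipeline register).
template <typename T>
inline T reg_stub(const T& x) {
    return x;
}

// Whole-vector form, like the OpenCL line __fpga_reg(__fpga_reg(_60)).
inline Float16 reg16_whole(const Float16& v) {
    return reg_stub(reg_stub(v));
}

// Per-lane form, like the generated oneAPI header.
inline Float16 reg16_per_lane(const Float16& v) {
    Float16 r;
    for (std::size_t i = 0; i < 16; ++i) {
        r.s[i] = reg_stub(reg_stub(v.s[i]));
    }
    return r;
}
```

For example, with a buffer buf of 64 floats, load16(2, buf) and gather16(buf, 32) return the same 16 values. If the real kernel can use the vector load and the whole-vector fpga_reg, comparing the compiler reports for the two variants would show whether either difference matters for frequency or memory usage.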

 

Thanks!

BoonBengT_Intel
Moderator
490 Views

Hi @AbenezerWudenhe,

 

Thank you for posting in the Intel community forum. I hope all is well, and apologies for the delayed response.
We do have oneAPI libraries available for handling matrices, and I would suggest referring to the resource here to get you started.
Here is also an example that serves as a good starting point.
Hope that clarifies.

Best Wishes
BB

BoonBengT_Intel
Moderator
475 Views

Hi @AbenezerWudenhe,

Good day, just checking in to see whether there are any further questions regarding this matter.
Hope we have clarified your doubts.

Best Wishes
BB

BoonBengT_Intel
Moderator
454 Views

Hi @AbenezerWudenhe,

Greetings. As we have not received any further clarification on what was provided, we will assume the challenge is resolved, so this thread will no longer be monitored. For new queries, please feel free to open a new thread and we will be right with you. It was a pleasure having you here.

Best Wishes
BB

AbenezerWudenhe
Beginner
446 Views

Hello @BoonBengT_Intel ,

 

I apologize for the late response.

Unfortunately, this is a domain-specific program we are trying to create, so to my understanding using the oneAPI libraries for handling matrices is not an option. What we are trying to figure out is why the performance dropped so significantly when we converted our source code from OpenCL to oneAPI. If you are still available to help, it would be much appreciated, as we are still struggling to find the source of the issue. If not, and I must create a new post, please let me know. Thank you.

 

Sincerely,

Abenezer Wudenhe 
