Intel® High Level Design
Support for Intel® High Level Synthesis Compiler, DSP Builder, OneAPI for Intel® FPGAs, Intel® FPGA SDK for OpenCL™

Performance difference between OpenCL 18.1 Std and Pro for FPGA?

DongWang-BJTU
New Contributor I
1,285 Views

I compiled the same kernel code with both v18.1 Std and v18.1 Pro. With the Standard version I get an fmax of around 220 MHz, but with the Pro version the fmax is only 190 MHz.

 

I then compared the two report.html files and found that the Pro version reports a loop dependence, while the Standard version reports no problems.

 

This did not happen with v18.0 and older versions. What has changed in v18.1?

 

7 Replies
MuhammadAr_U_Intel
444 Views

Hi,

 

Is there a specific example you are using for the comparison?

If the example is from OpenCL examples provided by Intel, I can try it out on my end.

 

Thanks,

Arslan

DongWang-BJTU
New Contributor I
444 Views

Sorry, I cannot post the whole kernel code here; it has too many lines. Here are some results that can be seen directly:

 

With 18.1 Std, the following code is successfully pipelined with II=1, with no warnings:

1.JPG

 

 

But with 18.1 Pro, an fmax warning is shown in report.html, as can be seen here:

2.JPG

 

A dependency is reported on the variable find_idle_ch_id here:

3.JPG
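
Since the kernel itself is not posted, the following is only a hypothetical sketch of the kind of loop that can produce a loop-carried dependency on a search-index variable. It is written in plain C rather than OpenCL C so that it can be compiled anywhere. The names `find_idle`, `busy`, `idle_id`, and `NUM_CH` are assumptions made for illustration and are not taken from the original code.

```c
#include <stddef.h>

#define NUM_CH 8  /* hypothetical number of channels */

/* Hypothetical search loop: returns the index of the first idle channel,
 * or -1 if every channel is busy.
 *
 * idle_id is read (idle_id < 0) and conditionally written in every
 * iteration, so iteration i+1 needs the value produced by iteration i.
 * This read-after-write chain is a loop-carried dependency. */
static int find_idle(const int busy[NUM_CH]) {
    int idle_id = -1;
    for (size_t i = 0; i < NUM_CH; i++) {
        if (idle_id < 0 && !busy[i]) {
            idle_id = (int)i;
        }
    }
    return idle_id;
}
```

In a pipelined implementation, the compare-and-update on `idle_id` forms a feedback path. A compiler that keeps II=1 has to fit that path into a single clock cycle, which is one plausible way such a dependency could show up as an fmax warning.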

 

 

As a result, the fmax is 190 MHz for the A10 device, while for the S5 device it is 220 MHz.

 

To my understanding, A10 is a more advanced device than S5 and should run at a higher frequency than Stratix V.

 

Dr_FPGA
Novice
444 Views

Keep in mind that S5 was the "top of the line" FPGA not so long ago. I think the Pro version exists because the Gen10 devices have different metal routing and I/O columns in the middle of the die. Routing across I/O columns and around congested areas is typically the main reason for extra tPD and lower frequency in A10 vs. S5.

DongWang-BJTU
New Contributor I
444 Views

Yes, S5 used to be the high-end product, but A10 uses more advanced silicon technology.

 

This report is generated in the first compilation stage, before any P&R has been carried out, so routing should not be the cause.

DongWang-BJTU
New Contributor I
444 Views

Another odd thing is that 18.1 Pro sometimes generates unreasonably wide registers for private variables, as follows:

 

The variable table_p2s_prefechtor is actually 16 bits wide (unsigned short), but the compiler makes it 512 bits wide, which makes the feedback logic inefficient.

 

4.JPG

For 18.1 std version, there is no problem:

5.JPG

MuhammadAr_U_Intel
444 Views
Could you give it a try with the latest OpenCL compiler release, 19.1? If the problem persists, you may share the kernel code and the steps to replicate the issue in a private message, and I can pass this on to the Engineering team. Thanks, Arslan