Intel® High Level Design
Support for Intel® High Level Synthesis Compiler, DSP Builder, OneAPI for Intel® FPGAs, Intel® FPGA SDK for OpenCL™

Performance difference between OpenCL 18.1 Std and Pro for FPGA?

DongWang-BJTU
New Contributor I

I compiled the same kernel code with both v18.1 Standard and Pro. With the Standard version I achieved an fmax of around 220 MHz, but with the Pro version the fmax is only 190 MHz.

I then compared the report.html files and found that a loop dependency is reported in the Pro version, while in the Standard version everything is OK.

This did not happen when I was using v18.0 and older versions. What has changed in v18.1?

MuhammadAr_U_Intel

Hi,

Is there any specific example you are using for comparison?

If the example is from OpenCL examples provided by Intel, I can try it out on my end.

Thanks,

Arslan

DongWang-BJTU
New Contributor I

Sorry, I cannot post the whole kernel code here; it has too many lines. Here are some results that can be seen directly:

With 18.1 Standard, the following code is successfully pipelined with II=1, with no warnings:

1.JPG

But with 18.1 Pro, an fmax warning is shown in report.html, as can be seen here:

2.JPG

A dependency is found on the variable find_idle_ch_id here:

3.JPG
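
The kernel itself was not posted, so the snippet below is only a hypothetical illustration of what a loop-carried dependency looks like, not the poster's code. The names `find_idle` and `schedule`, the 8-channel count, and the round-robin policy are all assumptions. The variable `last` is written in one iteration and read by the next; in a loop pipelined at II=1, the logic that produces such a value typically has to complete within a single clock cycle, which is why a compiler may flag it as an fmax bottleneck.

```python
NUM_CH = 8  # hypothetical number of channels (assumption, not from the thread)


def find_idle(busy_mask: int, start: int) -> int:
    """Return the first channel at or after `start` (wrapping around) whose
    bit in `busy_mask` is 0, or -1 if every channel is busy."""
    for k in range(NUM_CH):
        ch = (start + k) % NUM_CH
        if not (busy_mask >> ch) & 1:
            return ch
    return -1


def schedule(busy_masks: list[int]) -> list[int]:
    """Pick one idle channel per iteration, round-robin (hypothetical policy).

    `last` is a loop-carried value: it is written in iteration i and read in
    iteration i + 1, so iteration i + 1 cannot start its search until
    iteration i has chosen a channel.
    """
    last = NUM_CH - 1  # so the first search starts at channel 0
    chosen = []
    for mask in busy_masks:
        ch = find_idle(mask, (last + 1) % NUM_CH)
        chosen.append(ch)
        if ch >= 0:
            last = ch  # feedback into the next iteration
    return chosen


print(schedule([0x00, 0x02, 0x0F, 0xFF]))  # [0, 2, 4, -1]
```

This runs on a CPU and only shows the data flow; it says nothing about the timing a particular compiler version will achieve.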

Because of this, the report estimates an fmax of 190 MHz for the A10 device, while for the S5 device the fmax is 220 MHz.

As I understand it, A10 is a more advanced device than S5 and should run at a higher frequency than Stratix V.
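
To put the two figures in perspective: a loop pipelined at II=1 starts one new iteration every clock cycle, so its throughput scales directly with fmax. The quick calculation below is plain arithmetic on the numbers reported in this thread (220 MHz for 18.1 Standard on Stratix V, 190 MHz for 18.1 Pro on Arria 10); the helper names are illustrative only.

```python
def period_ns(fmax_mhz: float) -> float:
    """Clock period in nanoseconds for a clock of `fmax_mhz` MHz."""
    return 1000.0 / fmax_mhz


def ii1_iterations_per_sec(fmax_mhz: float) -> float:
    """Steady-state throughput of a loop pipelined at II=1:
    one new iteration enters the pipeline every clock cycle."""
    return fmax_mhz * 1e6


std_mhz = 220.0  # 18.1 Standard, Stratix V (from this thread)
pro_mhz = 190.0  # 18.1 Pro, Arria 10 (from this thread)

loss = 1.0 - ii1_iterations_per_sec(pro_mhz) / ii1_iterations_per_sec(std_mhz)

print(f"Std period: {period_ns(std_mhz):.3f} ns")  # 4.545 ns
print(f"Pro period: {period_ns(pro_mhz):.3f} ns")  # 5.263 ns
print(f"II=1 throughput loss: {loss:.1%}")         # 13.6%
```

Since the two builds differ in both device and compiler edition, this only quantifies the gap; it does not explain it.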

Dr_FPGA
Novice

Keep in mind that S5 was the "top of the line" FPGA not so long ago. I think the Pro version exists because of the Gen10 devices, which have different metal routing and I/O columns in the middle of the die. Routing across I/O columns and around congested areas is typically the main reason for extra tPD and lower frequency in A10 vs. S5.

DongWang-BJTU
New Contributor I

Yes, S5 used to be the high-end product, but A10 uses more advanced silicon technology.

This report is generated in the first compilation stage, before any place and route (P&R) has been carried out, so routing should not be a factor.

DongWang-BJTU
New Contributor I

Another odd thing is that 18.1 Pro sometimes generates unreasonably wide registers for private variables, as follows:

The variable table_p2s_prefechtor is actually 16 bits wide (unsigned short), but the compiler makes it 512 bits wide (32 times the declared width), which makes the feedback logic inefficient.

4.JPG

With the 18.1 Standard version, there is no such problem:

5.JPG

MuhammadAr_U_Intel
Could you give it a try with the latest OpenCL compiler release, 19.1? If the problem persists, you may share the kernel code and the steps to replicate the issue in a private message, and I can help feed this back to the Engineering team.

Thanks,

Arslan