Intel® Quartus® Prime Software

Kernel Vectorization query

Altera_Forum

Hi,

I am trying to apply the kernel vectorization optimization, but I get the following compiler warning:

Compiler Warning: Kernel is vectorized but there exist loads/stores that cannot be vectorized. This may reduce performance.

Here are the details:

Global work size: 240 x 540

Local work-group size: 240 x 1

Input dimensions: 1920 x 1080

I used the following attributes:

__attribute__((num_simd_work_items(4)))

__attribute__((reqd_work_group_size(240,1,1))) 

 

Input loading code snippet:

for (UInt32 i = 0; i < 8; i++)

    tempin[lidx + i * 240] = input[lidx + i * 240];

where

lidx is the local work-item ID in the x direction, with a maximum value of 239 (since the local work-group size is 240 x 1)

tempin is a local memory buffer used for the per-work-group computation
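The addressing pattern described above can be checked on the host with a small plain-C model. This is only a sketch under assumptions: the constants `LOCAL_SIZE`, `ITERS`, `SIMD` and the helper names are hypothetical, and the code models the indices only, not what the AOC compiler actually generates. It checks two things: that 240 work-items times 8 iterations touch each of the 1920 elements exactly once, and that 4 adjacent work-items access consecutive addresses in every iteration.

```c
#include <stddef.h>

#define LOCAL_SIZE 240                   /* work-items per group (reqd_work_group_size) */
#define ITERS      8                     /* trip count of the copy loop */
#define SIMD       4                     /* num_simd_work_items from the post */
#define ROW_LEN    (LOCAL_SIZE * ITERS)  /* 1920 elements */

/* Index touched by work-item lidx in iteration i, as in the snippet. */
size_t copy_index(size_t lidx, size_t i)
{
    return lidx + i * LOCAL_SIZE;
}

/* Returns 1 if every index in [0, ROW_LEN) is touched exactly once. */
int covers_row_exactly_once(void)
{
    unsigned char hits[ROW_LEN] = {0};

    for (size_t lidx = 0; lidx < LOCAL_SIZE; lidx++)
        for (size_t i = 0; i < ITERS; i++)
            hits[copy_index(lidx, i)]++;

    for (size_t k = 0; k < ROW_LEN; k++)
        if (hits[k] != 1)
            return 0;
    return 1;
}

/* Returns 1 if each group of SIMD adjacent work-items accesses
 * SIMD consecutive addresses in every loop iteration. */
int simd_lanes_are_contiguous(void)
{
    for (size_t base = 0; base < LOCAL_SIZE; base += SIMD)
        for (size_t i = 0; i < ITERS; i++)
            for (size_t lane = 1; lane < SIMD; lane++)
                if (copy_index(base + lane, i) != copy_index(base, i) + lane)
                    return 0;
    return 1;
}
```

Both checks pass for these sizes, so the index arithmetic itself is contiguous across adjacent work-items. That says nothing about why the compiler emits the warning; it only rules out a gap or overlap in the pattern as written.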

 

Can anyone suggest a way to avoid this warning?

Let me know if I need to provide any more details.

 

 

 

Thanks  

Neelakandan
Altera_Forum

reqd_work_group_size should probably be set to a power of 2; 240 is an unusual value.

Altera_Forum

Hi 

 

Even after specifying the required work-group size as a power of 2 (256 instead of 240), I get the same warning message.

Could there be another reason?

 

 

Thanks
Altera_Forum

The issue is the index expression "lidx + i * 240": AOC cannot analyze it effectively, which leads to suboptimal performance.

 

You could try adding "#pragma unroll" before the for loop.
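To illustrate what unrolling asks of the compiler, here is a plain-C model of the copy loop next to its hand-unrolled equivalent. It is a sketch under assumptions: the function names are hypothetical, `int` stands in for the element type (which the thread does not state), and whether unrolling actually removes the warning is not confirmed in this thread. In the kernel itself, the pragma would go on the line directly before the for statement.

```c
#include <stddef.h>

#define LOCAL_SIZE 240                   /* work-items per group */
#define ITERS      8                     /* trip count of the copy loop */
#define ROW_LEN    (LOCAL_SIZE * ITERS)  /* 1920 elements */

/* Rolled form: what one work-item does in the original snippet. */
void copy_rolled(int *tempin, const int *input, size_t lidx)
{
    for (size_t i = 0; i < ITERS; i++)
        tempin[lidx + i * LOCAL_SIZE] = input[lidx + i * LOCAL_SIZE];
}

/* Unrolled form: the loop variable is gone and every access is
 * lidx plus a compile-time constant offset. */
void copy_unrolled(int *tempin, const int *input, size_t lidx)
{
    tempin[lidx +    0] = input[lidx +    0];
    tempin[lidx +  240] = input[lidx +  240];
    tempin[lidx +  480] = input[lidx +  480];
    tempin[lidx +  720] = input[lidx +  720];
    tempin[lidx +  960] = input[lidx +  960];
    tempin[lidx + 1200] = input[lidx + 1200];
    tempin[lidx + 1440] = input[lidx + 1440];
    tempin[lidx + 1680] = input[lidx + 1680];
}

/* Returns 1 if both forms fill the buffer identically for all 240 work-items. */
int unrolled_matches_rolled(void)
{
    static int input[ROW_LEN], rolled[ROW_LEN], unrolled[ROW_LEN];

    for (size_t k = 0; k < ROW_LEN; k++) {
        input[k] = (int)k * 3 + 1;  /* nonzero, so a missed copy is detectable */
        rolled[k] = 0;
        unrolled[k] = 0;
    }
    for (size_t lidx = 0; lidx < LOCAL_SIZE; lidx++) {
        copy_rolled(rolled, input, lidx);
        copy_unrolled(unrolled, input, lidx);
    }
    for (size_t k = 0; k < ROW_LEN; k++)
        if (rolled[k] != unrolled[k] || rolled[k] != input[k])
            return 0;
    return 1;
}
```

The two forms are functionally identical; the only difference is that the unrolled one exposes eight independent accesses with fixed offsets, which is the shape the reply is hoping the compiler can analyze more easily.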