Kernel Vectorization query

Altera_Forum · 01-03-2014

Hi,

I am trying to apply the kernel vectorization optimization.

I get the following compiler warning:

Compiler Warning: Kernel is vectorized but there exist loads/stores that cannot be vectorized. This may reduce performance.

The following are the details:

Global thread dimension: 240 x 540

Local work-group dimension: 240 x 1

Input dimension: 1920 x 1080

I used the following attributes:

__attribute__((num_simd_work_items(4)))

__attribute__((reqd_work_group_size(240,1,1)))

Input loading code snippet:

for (UInt32 i = 0; i < 8; i++)
{
    tempin[lidx + i * 240] = input[lidx + i * 240];
}

where

lidx: local work-item ID in the x direction, with a maximum value of 239 (since the local work-group dimension is 240 x 1)

tempin: a local memory buffer used for per-work-group computation
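For readers following the indexing, here is a minimal host-side sketch in plain C (not kernel code) of what the loop above does across one work-group. It assumes int elements and a single 1920-element row; the names WG_SIZE, ITERS, ROW_LEN, emulate_group_load and covers_row_exactly_once are made up for illustration and do not come from the original kernel.

```c
#include <stddef.h>

/* Sizes taken from the post: 240 work-items, 8 iterations each.
   240 * 8 = 1920, the input width. Names are illustrative only. */
enum { WG_SIZE = 240, ITERS = 8, ROW_LEN = WG_SIZE * ITERS };

/* Emulates one work-group on the host: each work-item lidx runs the
   8-iteration copy loop from the post. */
static void emulate_group_load(const int *input, int *tempin)
{
    for (int lidx = 0; lidx < WG_SIZE; lidx++) {
        for (int i = 0; i < ITERS; i++) {
            tempin[lidx + i * WG_SIZE] = input[lidx + i * WG_SIZE];
        }
    }
}

/* Returns 1 if the index pattern lidx + i * 240 touches every element
   of the row exactly once, 0 otherwise. */
static int covers_row_exactly_once(void)
{
    int hits[ROW_LEN] = {0};

    for (int lidx = 0; lidx < WG_SIZE; lidx++) {
        for (int i = 0; i < ITERS; i++) {
            hits[lidx + i * WG_SIZE]++;
        }
    }
    for (int n = 0; n < ROW_LEN; n++) {
        if (hits[n] != 1) {
            return 0;
        }
    }
    return 1;
}
```

In each iteration i, neighbouring work-items (lidx, lidx + 1, ...) read neighbouring addresses, and the stride between iterations is the constant 240.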

Can anyone suggest a way to avoid this warning?

Let me know if I need to provide any more details.

Thanks

Neelakandan

Altera_Forum · 01-03-2014

Try setting reqd_work_group_size to a power of 2; 240 is an unusual value.

Altera_Forum · 01-05-2014

Hi

Even after specifying the required work-group size as a power of 2 (256 instead of 240), I get the same warning message.

Could there be another reason?

Thanks

Altera_Forum · 01-06-2014

The issue is the index expression "lidx + i * 240": AOC cannot analyze it effectively, which leads to suboptimal performance.

You could try adding "#pragma unroll" before the for loop.
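To illustrate what full unrolling of that loop amounts to, here is a small plain C sketch (host code, not the kernel) that writes out the eight copies by hand, so that each offset is a compile-time constant. In the kernel, placing the unroll pragma directly before the for loop asks the compiler to do this expansion itself. The function names and the int element type are assumptions made for the example, and the thread does not confirm whether unrolling removes the warning.

```c
/* Sizes from the post; the names are illustrative only. */
enum { WG_SIZE = 240, ITERS = 8 };

/* Loop form, as in the original post, for a single work-item lidx. */
static void load_loop(const int *input, int *tempin, int lidx)
{
    for (int i = 0; i < ITERS; i++) {
        tempin[lidx + i * WG_SIZE] = input[lidx + i * WG_SIZE];
    }
}

/* Hand-unrolled form: the offsets 0, 240, ..., 1680 are constants,
   so every access is lidx plus a fixed offset. */
static void load_unrolled(const int *input, int *tempin, int lidx)
{
    tempin[lidx + 0 * WG_SIZE] = input[lidx + 0 * WG_SIZE];
    tempin[lidx + 1 * WG_SIZE] = input[lidx + 1 * WG_SIZE];
    tempin[lidx + 2 * WG_SIZE] = input[lidx + 2 * WG_SIZE];
    tempin[lidx + 3 * WG_SIZE] = input[lidx + 3 * WG_SIZE];
    tempin[lidx + 4 * WG_SIZE] = input[lidx + 4 * WG_SIZE];
    tempin[lidx + 5 * WG_SIZE] = input[lidx + 5 * WG_SIZE];
    tempin[lidx + 6 * WG_SIZE] = input[lidx + 6 * WG_SIZE];
    tempin[lidx + 7 * WG_SIZE] = input[lidx + 7 * WG_SIZE];
}
```

Both functions copy the same elements; the unrolled form only makes the constant offsets explicit.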