Intel® Quartus® Prime Software

[Kernel vectorization] Loads/Stores cannot be vectorized

Altera_Forum

Hi guys! 

 

The compiler says this about the code below:

 

Compiler Warning: Vectorized kernel contains loads/stores that cannot be vectorized. This might reduce performance. 

 

 

Which loads/stores are causing this?

 

 

 

 

--- Quote Start ---  

 

__attribute__((num_simd_work_items(16)))
__attribute__((reqd_work_group_size(32,1,1)))
__kernel void int_loop(__global const short * restrict a,
                       __constant const bool * restrict b,
                       __global int * restrict group_counters,
                       __local int * restrict local_counter,
                       const int base, const int base2)
{
    __private uint local_id = get_local_id(0);
    __private uint group_size = get_local_size(0);
    __private uint global_id = get_global_id(0);
    __private int g1 = a[base + global_id];
    __private int comparison;
    __private int wi_counter;

    // Each work-item computes its own 0/1 flag.
    comparison = (g1 == a[base2 + global_id]);
    wi_counter = comparison & b[global_id] & (g1 != 0);
    local_counter[local_id] = wi_counter;

    // Tree reduction of the per-work-item flags in local memory.
    for(uint stride = group_size >> 1; stride > 0; stride = stride >> 1)
    {
        barrier(CLK_LOCAL_MEM_FENCE);
        if(local_id < stride)
            local_counter[local_id] += local_counter[local_id + stride];
    }

    // One work-item per work-group writes the group's total.
    if(local_id == 0)
        group_counters[get_group_id(0)] = local_counter[0];
}

--- Quote End ---  

Altera_Forum

The likely cause is the following part:

 

if(local_id == 0) group_counters[get_group_id(0)] = local_counter[0];

 

SIMD vectorizes operations at the work-item level and coalesces accesses that are consecutive across work-items. Because this write is performed by only one work-item per work-group, it cannot be vectorized or coalesced.
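
To make the argument concrete, here is a rough host-side model in plain C (not OpenCL, and not how the compiler actually decides). It assumes that an access can be coalesced only when all 16 lanes of a SIMD bundle are active and touch consecutive elements. The names `lane_access`, `load_a`, `store_group_counter` and `bundle_is_coalescable`, and the value of `BASE`, are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define SIMD_WIDTH 16  /* num_simd_work_items(16) in the kernel */
#define GROUP_SIZE 32  /* reqd_work_group_size(32,1,1) in the kernel */
#define BASE 100       /* arbitrary stand-in for the kernel argument "base" */

/* Hypothetical description of what one work-item does for one access:
 * whether it performs the access at all, and which element it touches. */
typedef struct {
    bool active;
    long index;
} lane_access;

/* Models a[base + global_id]: every work-item reads its own element. */
lane_access load_a(int gid)
{
    lane_access acc = { true, BASE + gid };
    return acc;
}

/* Models if(local_id == 0) group_counters[get_group_id(0)] = ...;
 * only the first work-item of each work-group performs the store. */
lane_access store_group_counter(int gid)
{
    lane_access acc = { (gid % GROUP_SIZE) == 0, gid / GROUP_SIZE };
    return acc;
}

/* Assumed rule: a bundle of SIMD_WIDTH work-items starting at first_gid
 * can be coalesced only if every lane is active and the lanes access
 * consecutive elements. */
bool bundle_is_coalescable(lane_access (*access)(int), int first_gid)
{
    assert(first_gid >= 0 && first_gid % SIMD_WIDTH == 0);

    lane_access first = access(first_gid);
    if (!first.active)
        return false;

    for (int lane = 1; lane < SIMD_WIDTH; ++lane) {
        lane_access cur = access(first_gid + lane);
        if (!cur.active || cur.index != first.index + lane)
            return false;
    }
    return true;
}
```

Under this model, `bundle_is_coalescable(load_a, 0)` is true, while `bundle_is_coalescable(store_group_counter, 0)` and `bundle_is_coalescable(store_group_counter, 16)` are both false, because at most one lane in each bundle is active for that store.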
Altera_Forum

 


 

 

Always right mate :) 

 

Thanks HRZ