
Sharing of local memory between work-items on SoC FPGA (Cyclone V)

According to the OpenCL specification:

1. Variables declared in the __local address space inside a __kernel function are allocated once for each work-group executing the kernel.

2. Such variables live in local memory and are shared by all work-items of that work-group.
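To make the two points above concrete, here is a minimal host-side sketch in plain C (not OpenCL C, and not taken from any SDK) that models a single work-group sharing one counter. The names `local_mem`, `run_work_group` and `GROUP_SIZE` are hypothetical, chosen only for illustration, and the loop runs the work-items strictly one after another, which is an assumption a real device does not guarantee.

```c
#define GROUP_SIZE 8   /* small stand-in for local_size_x; the kernel in this post uses 512 */

/* One instance per work-group: plays the role of a __local variable. */
typedef struct {
    int n;
} local_mem;

/* Models one work-group, executed serially, and returns the final shared count.
 * seen[i] receives the value work-item i observed right after its increment. */
int run_work_group(int seen[GROUP_SIZE])
{
    local_mem lm;   /* fresh storage for every work-group (point 1) */

    /* Phase 1: one work-item initializes the shared variable, because
     * __local variables start out uninitialized. */
    lm.n = 0;

    /* On a device, a barrier(CLK_LOCAL_MEM_FENCE) would separate the phases. */

    /* Phase 2: every work-item updates the same variable (point 2). On a
     * device this read-modify-write is only well defined if it is atomic;
     * in this model the loop simply serializes it. */
    for (int id = 0; id < GROUP_SIZE; ++id) {
        lm.n = lm.n + 1;
        seen[id] = lm.n;
    }
    return lm.n;
}
```

Under this serialized model, one call fills `seen` with 1, 2, ..., `GROUP_SIZE`, and a second call starts again from 1 because each work-group gets its own copy of the counter.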


Consider the following kernel code, written for an image of 512 x 512 pixels:

#define W 512
#define H 512
#define global_size_x 512
#define global_size_y 512
#define local_size_x 512
#define local_size_y 1


__attribute__((reqd_work_group_size(local_size_x, local_size_y, 1))) // dimensions
__kernel void sobel_kernel(__global unsigned char * restrict image_in,
                           __global unsigned char * restrict image_out)
{
    __local int n;
    int sum;

    // Index of the pixel
    __private short int row_id = get_global_id(1);
    __private short int col_id = get_local_id(0);

    sum = image_in[(row_id) * W + (col_id)]; // read from global memory into a private variable

    n = n + 1; // update local value

    if ((row_id < 10) && (col_id < 10))
        printf("\n%d", n); // observe local value

    // global mem write transaction
    image_out[(row_id) * W + (col_id)] = sum;
}


I ran the above kernel in two ways:

1. On the emulator in Linux (default s5_ref board), the values printed for n were 1 2 3 4 5 ...

2. When I compiled the kernel with the Intel FPGA SDK and deployed the .aocx file on the Cyclone V SoC FPGA, the values printed for n were always 1 1 1 1 1 ...


Can someone explain why the local variable does not appear to be shared by all the work-items in the work-group on the FPGA, and why the emulator and the FPGA show different results?

