Altera_Forum
Honored Contributor I
1,662 Views

Data dependency caused by conditional global memory read

Hi, 

 

When I compile my code, the loop analysis in the generated report shows one loop with an initiation interval (II) of 4, which means it is not pipelined well.

 

for (uint loop_cnt = 0, w = 0, cp = 0; loop_cnt < COLS_PER_PE * TILE_SIZE; loop_cnt++) {
    //load data to local buffer
    uint x = (n * BLOCK_SIZE + cp * TILE_SIZE + w) % conv_dim1 + i;
    uint y = (n * BLOCK_SIZE + cp * TILE_SIZE + w) / conv_dim1 + j;
    if ((cp * COLS_PER_PE + w > BLOCK_SIZE - 1 - col_pad_size && n == (conv_dim1 * conv_dim2 + col_pad_size) / BLOCK_SIZE - 1)
        || (x < pad_size || x > data_dim1 - pad_size - 1 || y < pad_size || y > data_dim2 - pad_size - 1)) {
        #pragma unroll
        for(uint v = 0; v < CVEC; v++) {
            data_double_buf = 0.0f;
        }
    }
    else {
        data_double_buf = input;
    }
    //load weight to local buffer
    if(cp * TILE_SIZE + w < BLOCK_SIZE - row_pad_size) {
        //For the first 2 convolutional layers
        if(conv_dim3 < BLOCK_SIZE) {
            weight_double_buf = weight;
        }
        //For the last convolutional layer
        else {
            weight_double_buf = weight;
        }
    }
    else {
        if(conv_dim3 < BLOCK_SIZE) {
            #pragma unroll
            for(uint v = 0; v < CVEC; v++) {
                weight_double_buf.vector = 0.0f;
            }
        }
    }
    //manual loop coalescing
    if(w == TILE_SIZE - 1) {
        cp += 1;
    }
    if(w == TILE_SIZE - 1) {
        w = 0;
    }
    else {
        w += 1;
    }
}

 

Here is the report for this loop:

 

Loop                   Pipelined   II   Bottleneck   Detail
Block7 (conv.cl:107)   Yes         4    II           Memory dependency

Block7: II bottleneck due to memory dependency between:
    Store Operation (conv.cl:122)
    Store Operation (conv.cl:122)
Largest critical path contributor(s):
    36%: Store Operation (conv.cl:122)
    36%: Store Operation (conv.cl:122)

 

 

I don't see any data dependency here. If the compiler is inferring one wrongly, does anyone know how to avoid it? (If I make "weight_double_buf" and "data_double_buf" normal float or remove the conditions, the II becomes 1.)

 

Any advice would be greatly appreciated!

Lancer
5 Replies
Altera_Forum
Honored Contributor I
33 Views

Which line is line 122 in your code? False dependencies on "global" buffers can be avoided by adding #pragma ivdep array(*buffer_name*) before the loop (Best Practices Guide, Section 5.2). Note that incorrect use of this pragma WILL result in incorrect output.
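
As a minimal sketch of where the pragma goes, assume a hypothetical loop that updates a buffer through an index table. The function name, the index table, and the uniqueness assumption are illustrative and do not come from the code in this thread. It is written as plain C so it can be checked on a host; in a kernel, buf would be a __global pointer.

```c
/* Hypothetical scatter update. Each idx[i] is ASSUMED to be unique.
   The compiler cannot prove that, so it may assume that one iteration
   reads an element of buf written by an earlier iteration. */
void scatter_increment(float *buf, const int *idx, int n)
{
    /* Tells the FPGA OpenCL compiler to assume no loop-carried dependency
       on buf. An ordinary C compiler does not recognize this pragma and
       ignores it. If idx contained repeated entries, the pragma would
       make the hardware produce wrong results. */
    #pragma ivdep array(buf)
    for (int i = 0; i < n; i++) {
        buf[idx[i]] += 1.0f;
    }
}
```

The pragma only removes the dependency from the compiler's analysis. It does not check that the assumption holds, which is why misuse gives wrong output rather than a compile error.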

Altera_Forum
Honored Contributor I
33 Views

Hi HRZ, 

 

Thanks for your reply. The report means there is a dependency between the line

 

"data_double_buf[wr_bank_sel][cp][w] = 0.0f;" and line 

"data_double_buf[wr_bank_sel][cp][w] = input[h * input_dim1 * input_dim2 + (y - pad_size) * input_dim1 + x - pad_size];" 

 

which belong to two different conditional branches.

Is it global memory dependency or local memory dependency?
Altera_Forum
Honored Contributor I
33 Views

Since it is a "store" dependency, it is probably the local memory one (data_double_buf). You can try writing the output to a temporary register, and then writing back the value of that register to the local buffer "outside" of the if/else block to see if it removes the dependency. 
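
A minimal sketch of the temporary-register idea, written as plain C so it can be checked on a host. The element type vec_t, the value of CVEC, the function name, and the is_padding flag are assumptions for illustration; the thread only says the buffer element is a struct with a "vector" member. Whether this actually brings the II down would have to be confirmed in the report.

```c
#define CVEC 4  /* assumed vector width; the real value is not given */

/* Hypothetical element type of the local buffer. */
typedef struct {
    float vector[CVEC];
} vec_t;

/* Choose the value in a temporary first, then store to the buffer once,
   so the if/else no longer contains two separate stores to the same
   buffer location. */
void load_element(vec_t *dst, const vec_t *src, int is_padding)
{
    vec_t tmp;  /* small enough to be held in registers */

    if (is_padding) {
        /* In the kernel this loop would carry "#pragma unroll". */
        for (int v = 0; v < CVEC; v++) {
            tmp.vector[v] = 0.0f;
        }
    } else {
        tmp = *src;
    }

    *dst = tmp;  /* the only store to the buffer */
}
```

In the kernel, dst would correspond to the data_double_buf element being written and src to the input element being read, with is_padding standing in for the boundary condition of the original if statement.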

 

By the way, why do you need the unrolled for loop here? The statement inside of the loop does not depend on the loop variable. I think you have a typo here. 

 

#pragma unroll
for(uint v = 0; v < CVEC; v++) {
    data_double_buf = 0.0f;
}
Altera_Forum
Honored Contributor I
33 Views

Hi HRZ,  

 

Thanks for your reply! 

Yes, I had a typo there (that buffer is a data structure). It should be "data_double_buf[wr_bank_sel][cp][w].vector[v] = 0.0f;". Thanks for pointing it out.

 

Is there any pragma that can remove a false local memory dependency, like #pragma ivdep? (Not for this problem.)
Altera_Forum
Honored Contributor I
33 Views

I have never seen the compiler falsely detect a dependency on local memory accesses, and I highly doubt that is even possible. You can always try using "#pragma ivdep" for local memory dependencies as well, but I don't think it will have any effect.