Hi,
When I compile my code, the loop analysis in the generated report shows one loop with an initiation interval (II) of 4, which means it is not pipelined well.
for (uint loop_cnt = 0, w = 0, cp = 0; loop_cnt < COLS_PER_PE * TILE_SIZE; loop_cnt++) {
    //load data to local buffer
    uint x = (n * BLOCK_SIZE + cp * TILE_SIZE + w) % conv_dim1 + i;
    uint y = (n * BLOCK_SIZE + cp * TILE_SIZE + w) / conv_dim1 + j;
    if ((cp * COLS_PER_PE + w > BLOCK_SIZE - 1 - col_pad_size && n == (conv_dim1 * conv_dim2 + col_pad_size) / BLOCK_SIZE - 1) || (x < pad_size || x > data_dim1 - pad_size - 1 || y < pad_size || y > data_dim2 - pad_size - 1)) {
        #pragma unroll
        for(uint v = 0; v < CVEC; v++) {
            data_double_buf[wr_bank_sel][cp][w] = 0.0f;
        }
    }
    else {
        data_double_buf[wr_bank_sel][cp][w] = input[h * input_dim1 * input_dim2 + (y - pad_size) * input_dim1 + x - pad_size];
    }
    //load weight to local buffer
    if(cp * TILE_SIZE + w < BLOCK_SIZE - row_pad_size) {
        //For the first 2 convolutional layers
        if(conv_dim3 < BLOCK_SIZE) {
            weight_double_buf = weight;
        }
        //For the last convolutional layer
        else {
            weight_double_buf = weight;
        }
    }
    else {
        if(conv_dim3 < BLOCK_SIZE) {
            #pragma unroll
            for(uint v = 0; v < CVEC; v++) {
                weight_double_buf.vector = 0.0f;
            }
        }
    }
    //manual loop coalescing
    if(w == TILE_SIZE - 1) {
        cp += 1;
    }
    if(w == TILE_SIZE - 1) {
        w = 0;
    }
    else {
        w += 1;
    }
}
And here is the report for this loop:

                        Pipelined   II   Bottleneck   Detail
Block7 (conv.cl:107)    Yes         4    II           Memory dependency

Block7:
    II bottleneck due to memory dependency between:
        Store Operation (conv.cl:122)
        Store Operation (conv.cl:122)
    Largest critical path contributor(s):
        36%: Store Operation (conv.cl:122)
        36%: Store Operation (conv.cl:122)
I don't see any data dependency here. If the compiler is inferring one wrongly, does anyone know how to avoid this? (If I make "weight_double_buf" and "data_double_buf" normal floats or remove the conditions, the II becomes 1.) Any advice would be greatly appreciated!
Lancer
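For readers unfamiliar with the "manual loop coalescing" at the end of the loop body: it replaces a two-level nest over cp and w with one flat counter. The sketch below is plain C rather than kernel code, uses small hypothetical bounds for COLS_PER_PE and TILE_SIZE, and introduces the helper names flat_order, nested_order and orders_match purely for illustration. It checks that the flat loop visits the same (cp, w) pairs, in the same order, as the nested form.

```c
#include <assert.h>

/* Hypothetical small bounds, chosen only for illustration. */
#define COLS_PER_PE 3u
#define TILE_SIZE   4u
#define TOTAL       (COLS_PER_PE * TILE_SIZE)

/* Flat (coalesced) loop: one counter drives both cp and w. */
unsigned flat_order(unsigned out[TOTAL]) {
    unsigned count = 0;
    for (unsigned loop_cnt = 0, w = 0, cp = 0; loop_cnt < TOTAL; loop_cnt++) {
        out[count++] = cp * TILE_SIZE + w;
        /* Counter update from the kernel, with the two identical tests merged. */
        if (w == TILE_SIZE - 1) {
            cp += 1;
            w = 0;
        } else {
            w += 1;
        }
    }
    return count;
}

/* Reference: the two-level nest that the flat loop replaces. */
unsigned nested_order(unsigned out[TOTAL]) {
    unsigned count = 0;
    for (unsigned cp = 0; cp < COLS_PER_PE; cp++) {
        for (unsigned w = 0; w < TILE_SIZE; w++) {
            out[count++] = cp * TILE_SIZE + w;
        }
    }
    return count;
}

/* Returns 1 when both loops visit the same (cp, w) pairs in the same order. */
int orders_match(void) {
    unsigned a[TOTAL], b[TOTAL];
    unsigned na = flat_order(a);
    unsigned nb = nested_order(b);
    assert(na == TOTAL && nb == TOTAL);
    for (unsigned i = 0; i < TOTAL; i++) {
        if (a[i] != b[i]) return 0;
    }
    return 1;
}
```

The w and cp updates depend only on the counters themselves, not on either buffer, so they are unlikely to be the source of the store-to-store dependency in the report.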
Which line is line 122 in your code? False dependencies on "global" buffers can be avoided by adding "#pragma ivdep array(buffer_name)" before the loop (Best Practices Guide, Section 5.2). Note that incorrect use of this pragma WILL result in incorrect output.
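To make the warning about incorrect use concrete, here is a rough sketch in plain C (not kernel code). The function names scatter_add and indices_are_disjoint, and the arrays idx and val, are hypothetical and do not come from the original kernel. The pragma appears only in a comment so the file compiles with an ordinary C compiler. The idea: the pragma is a promise that no two iterations touch the same element of the named array, and the helper checks that promise for one particular index pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every index is in range and no two iterations
 * would write the same element, i.e. idx[0..n) has no duplicates. */
bool indices_are_disjoint(const unsigned *idx, size_t n, size_t size) {
    for (size_t i = 0; i < n; i++) {
        if (idx[i] >= size) return false;
        for (size_t j = i + 1; j < n; j++) {
            if (idx[i] == idx[j]) return false;
        }
    }
    return true;
}

/* Scatter-accumulate loop. A compiler cannot prove idx[] is duplicate-free,
 * so it has to assume a later iteration may read what an earlier one wrote. */
void scatter_add(float *buf, const unsigned *idx, const float *val, size_t n) {
    assert(buf != NULL && idx != NULL && val != NULL);
    /* In the kernel, "#pragma ivdep array(buf)" would go on the line
     * directly above this loop. It is only safe if indices_are_disjoint()
     * would return true for the idx[] actually used. */
    for (size_t i = 0; i < n; i++) {
        buf[idx[i]] += val[i];
    }
}
```

If idx[] did contain a duplicate, a pipelined loop with the pragma applied could start the second read-modify-write before the first had finished, which is the kind of silent wrong output the warning refers to.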
Hi HRZ,
Thanks for your reply. The report means there is a dependency between the line "data_double_buf[wr_bank_sel][cp][w] = 0.0f;" and the line "data_double_buf[wr_bank_sel][cp][w] = input[h * input_dim1 * input_dim2 + (y - pad_size) * input_dim1 + x - pad_size];", which belong to two different conditional branches. Is it a global memory dependency or a local memory dependency?
Since it is a "store" dependency, it is probably the local memory one (data_double_buf). You can try writing the output to a temporary register, and then writing back the value of that register to the local buffer "outside" of the if/else block to see if it removes the dependency.
By the way, why do you need the unrolled for loop here? The statement inside of the loop does not depend on the loop variable. I think you have a typo here.
#pragma unroll
for(uint v = 0; v < CVEC; v++) {
    data_double_buf[wr_bank_sel][cp][w] = 0.0f;
}
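The register-then-single-store suggestion above could look roughly like the following. This is a plain C sketch, not the poster's kernel: the vec_t layout, the CVEC value and the load_element helper are assumptions made for illustration (the thread only says the buffer element is a struct with a "vector" member). In the kernel, dst would correspond to data_double_buf[wr_bank_sel][cp][w] and src to the matching input element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CVEC 4u  /* hypothetical vector width */

/* Assumed element layout: a struct wrapping CVEC floats. */
typedef struct {
    float vector[CVEC];
} vec_t;

/* Build the value in a private temporary, then store it exactly once.
 * Both branches now write only to tmp, so the buffer sees a single
 * store per iteration instead of one store in each branch. */
void load_element(vec_t *dst, const vec_t *src, bool is_padding) {
    assert(dst != NULL && src != NULL);
    vec_t tmp;
    if (is_padding) {
        /* "#pragma unroll" would go above this loop in the kernel. */
        for (unsigned v = 0; v < CVEC; v++) {
            tmp.vector[v] = 0.0f;
        }
    } else {
        tmp = *src;
    }
    *dst = tmp;  /* the only store to the buffer */
}
```

Whether this actually brings the II down to 1 depends on the compiler version and the rest of the kernel; the thread does not report the outcome, so treat it as something to try and then confirm in the loop analysis report.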
Hi HRZ,
Thanks for your reply! Yes, I had a typo there (that buffer is a data structure; it should be "data_double_buf[wr_bank_sel][cp][w].vector[v] = 0.0f;"). Thanks for pointing it out. Is there any pragma that can remove a false local memory dependency, like "#pragma ivdep"? (Not for this problem.)
I have never seen the compiler falsely detect a dependency on local memory accesses, and I highly doubt that is even possible. You can always try using "#pragma ivdep" for local memory dependencies as well, but I don't think it will have any effect.