Hi all,
According to the Altera AOCL optimization guide, a forwarding branch (if-then-else) will not impact performance negatively. I am trying to see how it affects actual performance using a vectorAdd kernel. The two versions of the code are:

Version 1 (no branch):

    __kernel void
    __attribute((reqd_work_group_size(256,1,1)))
    __attribute((num_simd_work_items(4)))
    vectorAdd(__global const uint *x,
              __global const uint *y,
              __global uint *restrict z)
    {
        // get index of the work item
        int index = get_global_id(0);
        // add the vector elements
        z[index] = x[index] + y[index];
    }

Version 2 (conditional branch):

    __kernel void
    __attribute((reqd_work_group_size(256,1,1)))
    vectorAdd(__global const uint *x,
              __global const uint *y,
              __global uint *restrict z)
    {
        // get index of the work item
        int index = get_local_id(0);
        int block_id = get_group_id(0);
        // add the vector elements
        if(index < 64)
        {
            z[4*index+256*block_id] = x[4*index+256*block_id] + y[4*index+256*block_id];
            z[4*index+1+256*block_id] = x[4*index+1+256*block_id] + y[4*index+1+256*block_id];
            z[4*index+2+256*block_id] = x[4*index+2+256*block_id] + y[4*index+2+256*block_id];
            z[4*index+3+256*block_id] = x[4*index+3+256*block_id] + y[4*index+3+256*block_id];
        }
    }

In version 2 I manually disable part of each work-group by adding a condition: only the first 64 work-items do any work, and each of them adds four elements. If a forwarding branch does not hurt performance, these two versions should give similar results, right? But my experiments show that version 1 is much better than version 2 (about 5x faster).

Am I misunderstanding what the optimization guide means by "forwarding branch will not impact the performance negatively"? Or is something else in the code causing the difference?

Thanks.
2 Replies
What happens if you also specify the num_simd_work_items for version 2?
--- Quote Start --- __attribute((num_simd_work_items(4))). --- Quote End ---
Also have you changed the NDRange in your host code?
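To make the NDRange question concrete, here is a rough host-side C emulation of the index arithmetic in the two kernels from the original post. It is only a sketch: the helper names `emulate_v1` and `emulate_v2` are made up for illustration, it assumes a global offset of 0 and a global size that is a multiple of 256, and it says nothing about how the FPGA pipeline actually schedules work-items. It only counts how many work-items take the branch when both versions are launched over the same NDRange.

```c
#include <stddef.h>
#include <stdint.h>

#define GROUP_SIZE 256u /* reqd_work_group_size(256,1,1) from the post */

/* Version 1: every work-item adds the single element at its global id. */
static void emulate_v1(const uint32_t *x, const uint32_t *y, uint32_t *z,
                       size_t global_size)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        z[gid] = x[gid] + y[gid];
}

/* Version 2: only local ids 0..63 take the branch, each adding 4 elements.
 * Returns the number of work-items that actually entered the branch. */
static size_t emulate_v2(const uint32_t *x, const uint32_t *y, uint32_t *z,
                         size_t global_size)
{
    size_t active = 0;
    for (size_t gid = 0; gid < global_size; ++gid) {
        size_t index = gid % GROUP_SIZE;    /* stands in for get_local_id(0) */
        size_t block_id = gid / GROUP_SIZE; /* stands in for get_group_id(0) */
        if (index < 64) {
            size_t base = 4 * index + GROUP_SIZE * block_id;
            for (size_t k = 0; k < 4; ++k)
                z[base + k] = x[base + k] + y[base + k];
            ++active;
        }
    }
    return active;
}
```

With the same global size, both functions fill the same output, but `emulate_v2` reports only 64 active work-items per group of 256, so the other 192 are still launched and do nothing. Whether that accounts for the measured gap depends on the NDRange actually used in the host code and on the missing num_simd_work_items attribute, which is why both are worth checking.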