After compiling my kernel, I got the following report:
Block2: II bottleneck due to data dependency on variable(s): value (kernels.cl:42)
Largest critical path contributor(s):
64%: Floating Point Multiply Operation (kernels.cl:83)
36%: Fadd Operation (kernels.cl:83)
The code for Block2 is here:
    #define NUM 128
    #define M 512
    for (int i = 0; i < NUM; i++) {
        double A[NUM];
        double B[NUM];
        double Value[M+1];
        for (int m = 0; m < M; m++) {
            for (int n = 0; n < M - m; n++) {
                Value[n] = A*value[n] + b*Value[n+1];
            }
        }
    }
There are nested loops and a data dependency in the code. How can I optimize it? Can anyone give me some advice? Thanks so much.
3 Replies
Your code does not make much sense to me: it just overwrites Value[n], and the computation does not depend on "m". Are you sure you are not missing a "+" before the "="? Furthermore, your inner loop is not pipelineable because of its variable exit condition.
"m" is only used in the loop exit condition; what I want is the final "Value[0]".
Well, either way, your loop on "m" is unpipelineable the way it is. You should either merge the loops on "m" and "n" into one loop, or convert your kernel to NDRange with both "m" and "n" absorbed into the thread dimensions, and let the runtime scheduler handle the job of minimizing the pipeline stalls/bubbles.
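As a rough illustration of the first suggestion, here is a minimal sketch of merging the "m" and "n" loops into a single loop with a fixed trip count. It is written in plain C so it can be checked on the host; the function names, the scalar coefficients a and b, and the init array are assumptions made for this example and are not taken from the original kernel. It only changes the loop structure and does not by itself remove the dependency on Value that the report points to.

```c
#include <stddef.h>

#define M 512

/* Reference: the nested loops from the question.
   Scalar coefficients a and b are an assumption for this sketch. */
double nested_value0(double a, double b, const double *init) {
    double v[M + 1];
    for (int k = 0; k <= M; k++) v[k] = init[k];
    for (int m = 0; m < M; m++) {
        for (int n = 0; n < M - m; n++) {
            v[n] = a * v[n] + b * v[n + 1];
        }
    }
    return v[0];
}

/* Merged form: one loop with a constant trip count of
   M + (M-1) + ... + 1 = M*(M+1)/2 iterations.
   m and n are updated by hand inside the loop body. */
double merged_value0(double a, double b, const double *init) {
    double v[M + 1];
    for (int k = 0; k <= M; k++) v[k] = init[k];
    const long total = (long)M * (M + 1) / 2;
    int m = 0;
    int n = 0;
    for (long it = 0; it < total; it++) {
        v[n] = a * v[n] + b * v[n + 1];
        if (n == M - m - 1) {  /* last n of the current m pass */
            n = 0;
            m++;
        } else {
            n++;
        }
    }
    return v[0];
}
```

Both functions perform the same updates in the same order, so they should return the same Value[0]. In the merged form the exit condition no longer depends on "m", which is the property the reply above is after.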