## How to solve a data dependency problem?

Altera_Forum, 12-06-2017 01:00 AM

After I compiled my kernel, I got the following report:

```text
Block2: II bottleneck due to data dependency on variable(s):
    value (kernels.cl:42)
Largest critical path contributor(s):
    64%: Floating Point Multiply Operation (kernels.cl:83)
    36%: Fadd Operation (kernels.cl:83)
```

The code for Block2 is here:

```c
#define NUM 128
#define M 512

for (int i = 0; i < NUM; i++) {
    double A[NUM];
    double B[NUM];
    double Value[M + 1];
    for (int m = 0; m < M; m++) {
        // The inner trip count shrinks by one on every pass over m.
        for (int n = 0; n < M - m; n++) {
            // Reads Value[n] and Value[n + 1], then overwrites Value[n] in place.
            Value[n] = A[i] * Value[n] + B[i] * Value[n + 1];
        }
    }
}
```

There are nested loops and a data dependency in this code. How can I optimize it? Can anyone give me some advice? Thanks so much!
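Writing the posted loop with an explicit pass index makes the dependency named in the report easier to see. The notation is introduced here only for illustration: $V^{(m)}_n$ is the content of `Value[n]` at the start of pass `m`, and, as in the posted code, $a$ = `A[i]` and $b$ = `B[i]` are assumed fixed for the whole loop nest.

$$
V^{(m+1)}_n = a\,V^{(m)}_n + b\,V^{(m)}_{n+1}, \qquad 0 \le n < M - m, \quad 0 \le m < M
$$

Within one pass the iterations over `n` do not feed each other: iteration `n` reads `Value[n+1]` before iteration `n+1` overwrites it. Across passes they do: iteration `n` of pass `m+1` needs the multiply-add results of iterations `n` and `n+1` of pass `m`. Iteration 0 of pass `m+1` reads the `Value[1]` written only M - m - 1 iterations earlier, so in the last passes that distance drops to one or two iterations and each new value has to wait on a full double-precision multiply and add. That is consistent with the multiply and add the report lists as the critical path.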

Altera_Forum, 12-06-2017 04:45 AM

Your code does not make much sense to me: it just overwrites Value[n], and your computation does not depend on "m". Are you sure you are not missing a "+" before the "="? Furthermore, your inner loop is not pipelineable because of its variable exit condition.
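The in-place update questioned above is well defined because the inner loop runs upward: iteration `n` reads `Value[n+1]` before iteration `n+1` overwrites it, so every read sees the value from the previous pass. A minimal host-side C sketch (hypothetical function names, `M` reduced to 8, not an OpenCL kernel) shows that the in-place sweep gives the same `Value[0]` as a version with two explicit buffers, where each pass reads only one array and writes only the other:

```c
#define M 8   /* hypothetical small size for illustration; the post uses 512 */

/* In-place ascending sweep, as in the posted code. Iteration n reads
   v[n + 1] before iteration n + 1 overwrites it, so every read sees
   the value left by the previous pass. */
static double value0_in_place(const double init[M + 1], double a, double b)
{
    double v[M + 1];
    for (int k = 0; k <= M; k++)
        v[k] = init[k];
    for (int m = 0; m < M; m++)
        for (int n = 0; n < M - m; n++)
            v[n] = a * v[n] + b * v[n + 1];
    return v[0];
}

/* Same recurrence with two buffers: each pass reads only src and
   writes only dst, then the two pointers are swapped. */
static double value0_ping_pong(const double init[M + 1], double a, double b)
{
    double buf0[M + 1], buf1[M + 1];
    double *src = buf0, *dst = buf1;
    for (int k = 0; k <= M; k++)
        buf0[k] = init[k];
    for (int m = 0; m < M; m++) {
        for (int n = 0; n < M - m; n++)
            dst[n] = a * src[n] + b * src[n + 1];
        double *tmp = src;   /* this pass's output becomes the next pass's input */
        src = dst;
        dst = tmp;
    }
    return src[0];
}
```

The two-buffer form spends a second array to make the pass structure explicit; this sketch only checks that both forms compute the same thing and says nothing about how the offline compiler schedules either one.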

Altera_Forum, 12-06-2017 09:05 AM

"m" is only used in the loop exit condition, and I want to get the final "Value[0]".

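Since `m` only controls the exit condition and only the final `Value[0]` is wanted, the pass structure can also be unrolled algebraically. Assuming, as in the posted code, that $a$ = `A[i]` and $b$ = `B[i]` do not change between passes, and writing $V_k$ for the initial contents of `Value[k]`, the result is $\sum_{k=0}^{M}\binom{M}{k}a^{M-k}b^{k}V_k$ (for M = 2 this is $a^2V_0 + 2abV_1 + b^2V_2$). This host-side C sketch of that identity uses a hypothetical function name and `M` reduced to 8; it is not an OpenCL kernel:

```c
#define M 8   /* hypothetical small size for illustration; the post uses 512 */

/* Value[0] after all M passes, computed directly as
   sum_{k=0..M} C(M,k) * a^(M-k) * b^k * init[k]. */
static double value0_closed_form(const double init[M + 1], double a, double b)
{
    double apow[M + 1];              /* apow[j] = a^j */
    apow[0] = 1.0;
    for (int j = 1; j <= M; j++)
        apow[j] = apow[j - 1] * a;

    double coeff = 1.0;              /* C(M, k), starting at C(M, 0) */
    double bpow = 1.0;               /* b^k */
    double sum = 0.0;
    for (int k = 0; k <= M; k++) {
        sum += coeff * apow[M - k] * bpow * init[k];
        coeff = coeff * (M - k) / (k + 1);   /* C(M, k + 1) */
        bpow *= b;
    }
    return sum;
}
```

This replaces the M(M+1)/2 multiply-adds of the nested loops with about M terms, but the accumulation into `sum` is still a loop-carried floating-point add, so it does not by itself guarantee II = 1. For M = 512 the binomial weights become very large and rounding differs from the original loop, so the results should be checked against it.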
Altera_Forum, 12-06-2017 11:46 AM

Well, either way, your loop on "m" is not pipelineable the way it is. You should either merge the loops on "m" and "n" into one loop, or convert your kernel to NDRange with both "m" and "n" absorbed into the thread dimensions, and let the runtime scheduler minimize the pipeline stalls/bubbles.
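The first suggestion, merging the loops on `m` and `n`, can be sketched in plain C. This is a host-side illustration with hypothetical function names and `M` reduced to 8: the merged loop has a fixed trip count of M(M+1)/2 and advances `m` and `n` by hand, so it performs exactly the same updates in the same order as the nested loops.

```c
#define M 8   /* hypothetical small size for illustration; the post uses 512 */

/* Original shape: the inner trip count M - m changes on every pass. */
static void sweep_nested(double v[M + 1], double a, double b)
{
    for (int m = 0; m < M; m++)
        for (int n = 0; n < M - m; n++)
            v[n] = a * v[n] + b * v[n + 1];
}

/* Merged shape: one loop with the fixed trip count M*(M+1)/2.
   The pass index m and element index n are advanced by hand. */
static void sweep_merged(double v[M + 1], double a, double b)
{
    const int total = M * (M + 1) / 2;
    int m = 0, n = 0;
    for (int it = 0; it < total; it++) {
        v[n] = a * v[n] + b * v[n + 1];
        if (++n == M - m) {   /* end of pass m: start the next, shorter pass */
            n = 0;
            m++;
        }
    }
}
```

This removes the variable exit condition pointed out in the earlier reply. It does not remove the dependency through `Value`, so the compiler may still report an II bottleneck, especially for the last, very short passes. The NDRange alternative is not sketched here.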
