optimization report not shown - data dependency

Altera_Forum
Honored Contributor II

Hi all! 

 

I'm using the aoc version: 

Intel(R) FPGA SDK for OpenCL(TM), 64-Bit Offline Compiler 

Version 17.0.2 Build 602 

Copyright (C) 2017 Intel Corporation 

 

and it doesn't generate the optimization report in the log file :confused: 

 

kernel void serially_execute (global int * restrict A,
                              global int * restrict B,
                              global int * restrict result,
                              unsigned N)
{
    int sum = 0;
    for (unsigned i = 0; i < N; i++) {
        int res;
        for (int j = 0; j < N; j++) {
            sum += A[i*N+j];  // subscript was lost in the post; assumed
        }
        sum += B[i];          // subscript was lost in the post; assumed
    }
    *result = sum;
}

 

Another question: why is there a data dependency on sum? Is it because the outer loop reads it while a new value is still being computed in the inner loop? 

 

Also this: 

 

for (int i = 0; i < r; i++) {
    for (int j = i + 1; j < r; j++) {
        fcont = 0;
        lcontsum = 0;
        for (int z = 0; z < (c >> 2); z++) {
            #pragma unroll
            for (int w = 0; w < 4; w++) {
                aux = buff;
                lcont = aux + j;
                lcontsum += lcont;
            }
            front += lcontsum;
        }
        a = fcont;
    }
}
}

 

Is there any dependency here? The report says nothing except that the inner loop is pipelined :oops:
3 Replies
Altera_Forum
Honored Contributor II

The optimization report has now been moved from the text log to the HTML report. Check the HTML report for dependency info. 

 

Regarding your second question, since floating-point operations are not single-cycle on the FPGA, any type of floating-point reduction will result in a data dependency on the reduction variable. In your case, there would be such a dependency on the "front" variable. If the loop on "w" had not been fully unrolled, there would also be another dependency on the "lcontsum" variable. On Arria 10, assuming that your values are single-precision and you are using addition/subtraction (but not multiplication), you can use single-cycle accumulation to avoid this. Check "Intel FPGA SDK for OpenCL Programming Guide, Section 5.12" for instructions. In any other case, you can infer a shift register to avoid the dependency. Check "Intel FPGA SDK for OpenCL Best Practices Guide, Section 5.1.5" for instructions and examples.
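
As a rough illustration of the shift-register idea mentioned above, here is a minimal sketch in plain C (not kernel code). The function name shift_register_sum and the depth of 8 are assumptions made up for this example; in a real kernel the depth would be chosen to cover the adder latency reported by the compiler, and the shifting loop would carry "#pragma unroll" so that the array can be inferred as a shift register.

```c
#include <stddef.h>

/* Hypothetical depth: assumed to be at least the latency (in cycles)
 * of the floating-point adder. The real value depends on the device
 * and compiler version. */
#define SHIFT_REG_DEPTH 8

/* Sums n floats using SHIFT_REG_DEPTH interleaved partial sums.
 * Each iteration reads a partial sum written SHIFT_REG_DEPTH
 * iterations ago rather than the one written in the previous
 * iteration, which is what relaxes the loop-carried dependency. */
float shift_register_sum(const float *values, size_t n)
{
    float shift_reg[SHIFT_REG_DEPTH + 1];

    for (size_t k = 0; k < SHIFT_REG_DEPTH + 1; k++) {
        shift_reg[k] = 0.0f;
    }

    for (size_t i = 0; i < n; i++) {
        /* Add the new value to the oldest partial sum. */
        shift_reg[SHIFT_REG_DEPTH] = shift_reg[0] + values[i];

        /* Shift every element down by one position
         * (fully unrolled in an actual kernel). */
        for (size_t k = 0; k < SHIFT_REG_DEPTH; k++) {
            shift_reg[k] = shift_reg[k + 1];
        }
    }

    /* Final reduction of the partial sums, done once after the loop. */
    float total = 0.0f;
    for (size_t k = 0; k < SHIFT_REG_DEPTH; k++) {
        total += shift_reg[k];
    }
    return total;
}
```

Note that the partial sums are added in a different order than in a plain running sum, so for general floating-point inputs the result can differ slightly in the last bits.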
Altera_Forum
Honored Contributor II

 


 

 

Thanks mate! :) 

 

The dependency on front is because it is reading the value of lcontsum that is being computed in the inner loop, right?
Altera_Forum
Honored Contributor II

 


 

 

No, it is because "front += lcontsum;" is equivalent to "front = front + lcontsum;", and the "front" on the right-hand side is the value from the previous iteration. The floating-point addition cannot be performed in one clock cycle, so the new value of front needs multiple clock cycles to be calculated. This creates a read-after-write dependency on "front" and increases the initiation interval of the loop on "z" to a value that is equal to the latency of floating-point addition on the FPGA.
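
To get a feel for what a larger initiation interval costs, here is a small sketch in plain C using a simplified textbook pipeline model: the first iteration takes the full pipeline latency, and each later iteration starts II cycles after the previous one. The function name pipelined_loop_cycles and the numbers in the usage note are assumptions for illustration only; the compiler report is the authoritative source for the actual II and latency.

```c
#include <stdint.h>

/* Simplified cycle estimate for a pipelined loop:
 *   cycles = (iterations - 1) * II + latency
 * This ignores memory stalls and other effects, so treat it as a
 * rough model rather than what the hardware will actually do. */
uint64_t pipelined_loop_cycles(uint64_t iterations, uint32_t ii,
                               uint32_t latency)
{
    if (iterations == 0) {
        return 0;
    }
    return (iterations - 1) * (uint64_t)ii + latency;
}
```

For example, with a hypothetical adder latency of 8 cycles and 1000 iterations, the model gives 1007 cycles when II is 1 but 8000 cycles when II equals the adder latency, which is why removing the dependency on the reduction variable matters.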