Re: optimization report not shown - data dependency

Altera_Forum · ‎11-26-2017

Hi all!

I'm using this aoc version:

Intel(R) FPGA SDK for OpenCL(TM), 64-Bit Offline Compiler
Version 17.0.2 Build 602

It doesn't generate the optimization report in the log file.

kernel void serially_execute (global int * restrict A,
                              global int * restrict B,
                              global int * restrict result, 
                              unsigned N) {
  int sum = 0;
  for (unsigned i = 0; i < N; i++) {
    int res;
    for (int j = 0; j < N; j++) {
      sum += A;
    }
    sum += B;
  }
  *result = sum;
}

Another question: why is there a data dependency on sum? Is it because the outer loop reads it while a new value is still being computed in the inner loop?

Also this:

for (int i = 0; i < r; i++) {
  for (int j = i + 1; j < r; j++) {
    fcont = 0;
    lcontsum = 0;

    for (int z = 0; z < (c >> 2); z++) {

      #pragma unroll
      for (int w = 0; w < 4; w++) {
        aux = buff;
        lcont = aux + j;
        lcontsum += lcont;
      }
      front += lcontsum;
    }
    a = fcont;
  }
}
}

Is there any dependency here? The report doesn't say anything except that the inner loop is pipelined.

Altera_Forum · ‎11-26-2017

The optimization report has now been moved from the text log to the HTML report. Check the HTML report for dependency info.

Regarding your second question: since floating-point operations are not single-cycle on the FPGA, any type of floating-point reduction results in a data dependency on the reduction variable. In your case, there is such a dependency on the "front" variable. If the loop on "w" had not been fully unrolled, there would be another dependency on the "lcontsum" variable. On Arria 10, assuming that your values are single-precision and you are using addition/subtraction (but not multiplication), you can use single-cycle accumulation to avoid this. Check "Intel FPGA SDK for OpenCL Programming Guide, Section 5.12" for instructions. In any other case, you can infer a shift register to avoid the dependency. Check "Intel FPGA SDK for OpenCL Best Practices Guide, Section 5.1.5" for instructions and examples.
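For reference, the shift-register idea can be sketched roughly as follows. This is plain C so it can be tried on a host machine; it is not the code from the guide. The function name sum_with_shift_register and the depth of 8 are assumptions made for illustration. In a real kernel the depth should be at least the floating-point adder latency, and the short fixed-length loops would be marked with #pragma unroll.

```c
#include <stddef.h>

/* Assumed depth for illustration only: in a kernel this should be at least
   the latency (in cycles) of the floating-point adder. */
#define SHIFT_REG_DEPTH 8

/* Hypothetical helper: sums n floats using SHIFT_REG_DEPTH interleaved
   partial sums instead of one accumulator. */
float sum_with_shift_register(const float *in, size_t n)
{
    float shift_reg[SHIFT_REG_DEPTH + 1];

    /* Start every partial sum at zero (fully unrolled in a kernel). */
    for (size_t k = 0; k < SHIFT_REG_DEPTH + 1; k++) {
        shift_reg[k] = 0.0f;
    }

    for (size_t i = 0; i < n; i++) {
        /* The addition reads the value written SHIFT_REG_DEPTH iterations
           ago, not the one from the previous iteration. */
        shift_reg[SHIFT_REG_DEPTH] = shift_reg[0] + in[i];

        /* Shift everything down by one position; in an OpenCL kernel this
           loop would carry "#pragma unroll". */
        for (size_t k = 0; k < SHIFT_REG_DEPTH; k++) {
            shift_reg[k] = shift_reg[k + 1];
        }
    }

    /* Final reduction over the partial sums. */
    float total = 0.0f;
    for (size_t k = 0; k < SHIFT_REG_DEPTH; k++) {
        total += shift_reg[k];
    }
    return total;
}
```

Because each addition depends on a value produced several iterations earlier, the loop-carried dependency no longer forces consecutive iterations to wait for each other. Note that the additions happen in a different order than in a plain running sum, so the floating-point result can differ slightly in rounding.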

Altera_Forum · ‎11-26-2017

Thanks mate! :)

Is the dependency on front there because it reads the value of lcontsum, which is being computed in the inner loop?

Altera_Forum · ‎11-27-2017

No. It is because "front += lcontsum;" is equivalent to "front = front + lcontsum;", and the "front" on the right-hand side is the value from the previous iteration. Floating-point addition cannot be performed in one clock cycle, so the new value of "front" takes multiple cycles to compute. This creates a read-after-write dependency on "front" and increases the initiation interval of the loop on "z" to a value equal to the latency of floating-point addition on the FPGA.
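To get a feel for what the higher initiation interval costs, here is a rough back-of-the-envelope model in plain C. It is a simplification, not something the compiler reports in this form. The function name pipelined_loop_cycles is made up for illustration, and the latency of 8 cycles used in the note below is an assumed number; the real adder latency depends on the device and the compiler.

```c
/* Hypothetical helper: approximate cycle count of a pipelined loop.
   A new iteration starts every `ii` cycles and each iteration needs
   `latency` cycles to finish, so the last iteration starts at
   (trip_count - 1) * ii and completes `latency` cycles later. */
unsigned long pipelined_loop_cycles(unsigned long trip_count,
                                    unsigned long ii,
                                    unsigned long latency)
{
    if (trip_count == 0) {
        return 0; /* nothing to execute */
    }
    return (trip_count - 1) * ii + latency;
}
```

With 1000 iterations and an assumed latency of 8 cycles, this gives 1007 cycles at an initiation interval of 1 and 8000 cycles at an initiation interval of 8, which is why removing the dependency on "front" matters.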