Community
cancel
Showing results for 
Search instead for 
Did you mean: 
Highlighted
Beginner
1,572 Views

Kernel compilation failure for OpenCL FPGA.

I have this OpenCL code that I'm compiling using OpenCL compiler 16.0-pro. Here is the code:

 

// // (c) December 19, 2018 Saman Biookaghazadeh @ Arizona State University //   #ifdef INT_PRECISION #define DTYPE int #elif SINGLE_PRECISION #define DTYPE float #elif DOUBLE_PRECISION #pragma OPENCL EXTENSION cl_khr_fp64: enable #define DTYPE double #endif   #include "funcs.h"   channel DTYPE c0;   __kernel void mm_k1 (__global const DTYPE* restrict A, __global const DTYPE* restrict B, __global const DTYPE* restrict C, __global DTYPE* restrict D, const DTYPE alpha, const DTYPE beta #ifdef FPGA_SINGLE ,const int lll) #else ) #endif {   #ifdef GPU   #endif   #ifdef FPGA_SINGLE   for (int i = 0; i < lll; i++) { for (int j = 0; j < lll; j++) { DTYPE temp = 0.0; for (int z = 0; z < lll/BLOCK_SIZE; z++) { DTYPE A_local[BLOCK_SIZE]; DTYPE B_local[BLOCK_SIZE]; DTYPE local_temp = 0.0;   // Coalescing memory read from the memory section "A" #pragma unroll for (int k = 0; k < BLOCK_SIZE; k++) { A_local[k] = A[i*lll+z*BLOCK_SIZE+k]; }   // Coalescing memory read from the memory section "B" #pragma unroll for (int k = 0; k < BLOCK_SIZE; k++) { B_local[k] = B[i*lll+z*BLOCK_SIZE+k]; }   // Accumulating the result of multiplications #pragma unroll for (int k = 0; k < BLOCK_SIZE; k++) { local_temp += A[k] * B[k] * alpha; }   // final accumulation temp += local_temp; } write_channel_altera (c0, temp); } }   #endif   }   __kernel void mm_k2 (__global const DTYPE* restrict A, __global const DTYPE* restrict B, __global const DTYPE* restrict C, __global DTYPE* restrict D, const DTYPE alpha, const DTYPE beta #ifdef FPGA_SINGLE ,const int lll) #else ) #endif {   #ifdef GPU   #endif   #ifdef FPGA_SINGLE for (int i = 0; i < lll; i++) { for (int j = 0; j < lll; j++) { DTYPE temp = read_channel_altera(c0);   #pragma ivdep for (int z = 0; z < lll/BLOCK_SIZE; z++) { DTYPE C_local[BLOCK_SIZE]; DTYPE D_local[BLOCK_SIZE];   // Coalescing memory read from the memory section "A" #pragma unroll for (int k = 0; k < BLOCK_SIZE; k++) { C_local[k] = C[j*lll+z*BLOCK_SIZE+k]; }   // Initializing the memory section "D" #pragma unroll for (int k = 0; k < BLOCK_SIZE; k++) { D_local[k] = 0.0; }   // Accumulating the result of multiplications #pragma unroll for (int k = 0; k < BLOCK_SIZE; k++) { D_local[k] += temp * C_local[k]; }   // final accumulation #pragma unroll for (int k = 0; k < BLOCK_SIZE; k++) { D[j*lll+z*BLOCK_SIZE+k] += D_local[k]; }   }   } } #endif   }

I compile the code using the script:

 

aoc -W -v -DSINGLE_PRECISION=true -DFPGA_SINGLE=true -DBLOCK_SIZE=32 --report --board p385a_sch_ax115 2mm.cl -o bin/mm_BS32.aocx

For some reason, the code fails for the compilation. There is no specific error for the compilation failure. I have added the quartus_sh_compile.log log:

 

Info: ******************************************************************* Info: Running Quartus Prime Shell Info: Version 16.0.2 Build 222 07/20/2016 SJ Pro Edition Info: Copyright (C) 1991-2016 Altera Corporation. All rights reserved. Info: Your use of Altera Corporation's design tools, logic functions Info: and other software and tools, and its AMPP partner logic Info: functions, and any output files from any of the foregoing Info: (including device programming or simulation files), and any Info: associated documentation or information are expressly subject Info: to the terms and conditions of the Altera Program License Info: Subscription Agreement, the Altera Quartus Prime License Agreement, Info: the Altera MegaCore Function License Agreement, or other Info: applicable license agreement, including, without limitation, Info: that your use is for the sole purpose of programming logic Info: devices manufactured by Altera and sold by Altera or its Info: authorized distributors. Please refer to the applicable Info: agreement for further details. Info: Processing started: Wed Dec 19 13:28:43 2018 Info: Command: quartus_sh -t scripts/pre_flow_pr.tcl compile top top Info: Quartus(args): compile top top Info: Using INI file /mnt/saman/shoc-fpga/src/opencl/PolyBench/bin/mm_BS32/quartus.ini Info: Running pre-flow script Info: Project name: top Info: Revision name: top Info: Checking for OpenCL SDK installation, environment should have ALTERAOCLSDKROOT defined Info: ALTERAOCLSDKROOT=/home/users/saman/16.0-pro/hld Info: Compiling import revision -> nothing to be done here! Warning (125092): Tcl Script File board/board.qip not found Info (125063): set_global_assignment -name QIP_FILE board/board.qip Warning (125092): Tcl Script File kernel_system/kernel_system.qip not found Info (125063): set_global_assignment -name QIP_FILE kernel_system/kernel_system.qip Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_pll/xcvr_pll.qip not found Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_pll/xcvr_pll.qip Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_reset_rx/xcvr_reset_rx.qip not found Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_reset_rx/xcvr_reset_rx.qip Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_reset_tx/xcvr_reset_tx.qip not found Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_reset_tx/xcvr_reset_tx.qip Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_sl2_txrx/xcvr_sl2_txrx.qip not found Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_sl2_txrx/xcvr_sl2_txrx.qip Info: Device part name is 10AX115N3F40E2SG Info: Compiling import revision -> nothing to be done here! Info: Compiling import revision -> nothing to be done here! Info: Generating kernel_system.qsys: Info: qsys-generate -syn --output-directory=kernel_system/ --family="Arria 10" --part=10AX115N3F40E2SG kernel_system.qsys Error (23031): Evaluation of Tcl script scripts/pre_flow_pr.tcl unsuccessful Error: Quartus Prime Shell was unsuccessful. 1 error, 6 warnings Error: Peak virtual memory: 1182 megabytes Error: Processing ended: Wed Dec 19 13:29:00 2018 Error: Elapsed time: 00:00:17 Error: Total CPU time (on all processors): 00:01:35 Info: ******************************************************************* Info: Running Quartus Prime Compiler Database Interface Info: Version 16.0.2 Build 222 07/20/2016 SJ Pro Edition Info: Copyright (C) 1991-2016 Altera Corporation. All rights reserved. Info: Your use of Altera Corporation's design tools, logic functions Info: and other software and tools, and its AMPP partner logic Info: functions, and any output files from any of the foregoing Info: (including device programming or simulation files), and any Info: associated documentation or information are expressly subject Info: to the terms and conditions of the Altera Program License Info: Subscription Agreement, the Altera Quartus Prime License Agreement, Info: the Altera MegaCore Function License Agreement, or other Info: applicable license agreement, including, without limitation, Info: that your use is for the sole purpose of programming logic Info: devices manufactured by Altera and sold by Altera or its Info: authorized distributors. Please refer to the applicable Info: agreement for further details. Info: Processing started: Wed Dec 19 13:28:42 2018 Info: Command: quartus_cdb -t import_compile.tcl Info: Using INI file /mnt/saman/shoc-fpga/src/opencl/PolyBench/bin/mm_BS32/quartus.ini Error (23031): Evaluation of Tcl script import_compile.tcl unsuccessful Error: Quartus Prime Compiler Database Interface was unsuccessful. 1 error, 0 warnings Error: Peak virtual memory: 1348 megabytes Error: Processing ended: Wed Dec 19 13:29:00 2018 Error: Elapsed time: 00:00:18 Error: Total CPU time (on all processors): 00:01:35

Does anyone have any idea what is the issue here? I have compiled so many kernel so far, and haven't faced such issue before. Would appreciate if anyone could give me a hint or something.

0 Kudos
3 Replies
Highlighted
25 Views

Hi,

 

What is the output on console any Error hint from there ?

 

Also how much is the available RAM for the system ?

 

Thanks,

Arslan

0 Kudos
Highlighted
Valued Contributor II
25 Views

I don't have 16.0.2 installed but I did a quick test with 16.1.2 on the same Nallatech board and it seems to work fine. I didn't wait until place and route finishes but the process indeed progressed further than it does for you. Since the error is happening after OpenCL to HDL conversion, this is likely not a problem related to your kernel. Maybe there are some files remaining from a previous compile that is interfering with the new one; have you tried fully removing the OpenCL project folder and retrying? This could also be some bug in some Quartus component that has been fixed in 16.1.0/16.1.2. I would strongly recommend using at least 16.1.2 on Arria 10 since 16.0.x performs terribly when it comes to packing math operations into the DSPs on Arria 10.

Highlighted
Beginner
25 Views

Thanks much for the reply. I actually removed the whole OpenCL project folder and still behaves the same. Maybe it's a bug, since I have never had such an issue before. The only difference is, this time I have more nested loops than before.

 

As you said, I will try and see if I can upgrade to 16.1.x version.

 

Thanks

0 Kudos