- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have this OpenCL code that I'm compiling using OpenCL compiler 16.0-pro. Here is the code:
//
// (c) December 19, 2018 Saman Biookaghazadeh @ Arizona State University
//
#ifdef INT_PRECISION
#define DTYPE int
#elif SINGLE_PRECISION
#define DTYPE float
#elif DOUBLE_PRECISION
#pragma OPENCL EXTENSION cl_khr_fp64: enable
#define DTYPE double
#endif
#include "funcs.h"
channel DTYPE c0;
__kernel void mm_k1 (__global const DTYPE* restrict A,
__global const DTYPE* restrict B,
__global const DTYPE* restrict C,
__global DTYPE* restrict D,
const DTYPE alpha,
const DTYPE beta
#ifdef FPGA_SINGLE
,const int lll)
#else
)
#endif
{
#ifdef GPU
#endif
#ifdef FPGA_SINGLE
for (int i = 0; i < lll; i++) {
for (int j = 0; j < lll; j++) {
DTYPE temp = 0.0;
for (int z = 0; z < lll/BLOCK_SIZE; z++) {
DTYPE A_local[BLOCK_SIZE];
DTYPE B_local[BLOCK_SIZE];
DTYPE local_temp = 0.0;
// Coalescing memory read from the memory section "A"
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
A_local[k] = A[i*lll+z*BLOCK_SIZE+k];
}
// Coalescing memory read from the memory section "B"
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
B_local[k] = B[i*lll+z*BLOCK_SIZE+k];
}
// Accumulating the result of multiplications
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
local_temp += A[k] * B[k] * alpha;
}
// final accumulation
temp += local_temp;
}
write_channel_altera (c0, temp);
}
}
#endif
}
__kernel void mm_k2 (__global const DTYPE* restrict A,
__global const DTYPE* restrict B,
__global const DTYPE* restrict C,
__global DTYPE* restrict D,
const DTYPE alpha,
const DTYPE beta
#ifdef FPGA_SINGLE
,const int lll)
#else
)
#endif
{
#ifdef GPU
#endif
#ifdef FPGA_SINGLE
for (int i = 0; i < lll; i++) {
for (int j = 0; j < lll; j++) {
DTYPE temp = read_channel_altera(c0);
#pragma ivdep
for (int z = 0; z < lll/BLOCK_SIZE; z++) {
DTYPE C_local[BLOCK_SIZE];
DTYPE D_local[BLOCK_SIZE];
// Coalescing memory read from the memory section "A"
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
C_local[k] = C[j*lll+z*BLOCK_SIZE+k];
}
// Initializing the memory section "D"
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
D_local[k] = 0.0;
}
// Accumulating the result of multiplications
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
D_local[k] += temp * C_local[k];
}
// final accumulation
#pragma unroll
for (int k = 0; k < BLOCK_SIZE; k++) {
D[j*lll+z*BLOCK_SIZE+k] += D_local[k];
}
}
}
}
#endif
}
I compile the code using the script:
aoc -W -v -DSINGLE_PRECISION=true -DFPGA_SINGLE=true -DBLOCK_SIZE=32 --report --board p385a_sch_ax115 2mm.cl -o bin/mm_BS32.aocx
For some reason, the code fails for the compilation. There is no specific error for the compilation failure. I have added the quartus_sh_compile.log log:
Info: *******************************************************************
Info: Running Quartus Prime Shell
Info: Version 16.0.2 Build 222 07/20/2016 SJ Pro Edition
Info: Copyright (C) 1991-2016 Altera Corporation. All rights reserved.
Info: Your use of Altera Corporation's design tools, logic functions
Info: and other software and tools, and its AMPP partner logic
Info: functions, and any output files from any of the foregoing
Info: (including device programming or simulation files), and any
Info: associated documentation or information are expressly subject
Info: to the terms and conditions of the Altera Program License
Info: Subscription Agreement, the Altera Quartus Prime License Agreement,
Info: the Altera MegaCore Function License Agreement, or other
Info: applicable license agreement, including, without limitation,
Info: that your use is for the sole purpose of programming logic
Info: devices manufactured by Altera and sold by Altera or its
Info: authorized distributors. Please refer to the applicable
Info: agreement for further details.
Info: Processing started: Wed Dec 19 13:28:43 2018
Info: Command: quartus_sh -t scripts/pre_flow_pr.tcl compile top top
Info: Quartus(args): compile top top
Info: Using INI file /mnt/saman/shoc-fpga/src/opencl/PolyBench/bin/mm_BS32/quartus.ini
Info: Running pre-flow script
Info: Project name: top
Info: Revision name: top
Info: Checking for OpenCL SDK installation, environment should have ALTERAOCLSDKROOT defined
Info: ALTERAOCLSDKROOT=/home/users/saman/16.0-pro/hld
Info: Compiling import revision -> nothing to be done here!
Warning (125092): Tcl Script File board/board.qip not found
Info (125063): set_global_assignment -name QIP_FILE board/board.qip
Warning (125092): Tcl Script File kernel_system/kernel_system.qip not found
Info (125063): set_global_assignment -name QIP_FILE kernel_system/kernel_system.qip
Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_pll/xcvr_pll.qip not found
Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_pll/xcvr_pll.qip
Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_reset_rx/xcvr_reset_rx.qip not found
Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_reset_rx/xcvr_reset_rx.qip
Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_reset_tx/xcvr_reset_tx.qip not found
Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_reset_tx/xcvr_reset_tx.qip
Warning (125092): Tcl Script File ip/a10_sl2_txrx/xcvr_sl2_txrx/xcvr_sl2_txrx.qip not found
Info (125063): set_global_assignment -name QIP_FILE ip/a10_sl2_txrx/xcvr_sl2_txrx/xcvr_sl2_txrx.qip
Info: Device part name is 10AX115N3F40E2SG
Info: Compiling import revision -> nothing to be done here!
Info: Compiling import revision -> nothing to be done here!
Info: Generating kernel_system.qsys:
Info: qsys-generate -syn --output-directory=kernel_system/ --family="Arria 10" --part=10AX115N3F40E2SG kernel_system.qsys
Error (23031): Evaluation of Tcl script scripts/pre_flow_pr.tcl unsuccessful
Error: Quartus Prime Shell was unsuccessful. 1 error, 6 warnings
Error: Peak virtual memory: 1182 megabytes
Error: Processing ended: Wed Dec 19 13:29:00 2018
Error: Elapsed time: 00:00:17
Error: Total CPU time (on all processors): 00:01:35
Info: *******************************************************************
Info: Running Quartus Prime Compiler Database Interface
Info: Version 16.0.2 Build 222 07/20/2016 SJ Pro Edition
Info: Copyright (C) 1991-2016 Altera Corporation. All rights reserved.
Info: Your use of Altera Corporation's design tools, logic functions
Info: and other software and tools, and its AMPP partner logic
Info: functions, and any output files from any of the foregoing
Info: (including device programming or simulation files), and any
Info: associated documentation or information are expressly subject
Info: to the terms and conditions of the Altera Program License
Info: Subscription Agreement, the Altera Quartus Prime License Agreement,
Info: the Altera MegaCore Function License Agreement, or other
Info: applicable license agreement, including, without limitation,
Info: that your use is for the sole purpose of programming logic
Info: devices manufactured by Altera and sold by Altera or its
Info: authorized distributors. Please refer to the applicable
Info: agreement for further details.
Info: Processing started: Wed Dec 19 13:28:42 2018
Info: Command: quartus_cdb -t import_compile.tcl
Info: Using INI file /mnt/saman/shoc-fpga/src/opencl/PolyBench/bin/mm_BS32/quartus.ini
Error (23031): Evaluation of Tcl script import_compile.tcl unsuccessful
Error: Quartus Prime Compiler Database Interface was unsuccessful. 1 error, 0 warnings
Error: Peak virtual memory: 1348 megabytes
Error: Processing ended: Wed Dec 19 13:29:00 2018
Error: Elapsed time: 00:00:18
Error: Total CPU time (on all processors): 00:01:35
Does anyone have any idea what is the issue here? I have compiled so many kernel so far, and haven't faced such issue before. Would appreciate if anyone could give me a hint or something.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
What is the output on console any Error hint from there ?
Also how much is the available RAM for the system ?
Thanks,
Arslan
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I don't have 16.0.2 installed but I did a quick test with 16.1.2 on the same Nallatech board and it seems to work fine. I didn't wait until place and route finishes but the process indeed progressed further than it does for you. Since the error is happening after OpenCL to HDL conversion, this is likely not a problem related to your kernel. Maybe there are some files remaining from a previous compile that is interfering with the new one; have you tried fully removing the OpenCL project folder and retrying? This could also be some bug in some Quartus component that has been fixed in 16.1.0/16.1.2. I would strongly recommend using at least 16.1.2 on Arria 10 since 16.0.x performs terribly when it comes to packing math operations into the DSPs on Arria 10.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks much for the reply. I actually removed the whole OpenCL project folder and still behaves the same. Maybe it's a bug, since I have never had such an issue before. The only difference is, this time I have more nested loops than before.
As you said, I will try and see if I can upgrade to 16.1.x version.
Thanks

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page