Altera_Forum
Honored Contributor I
1,100 Views

OpenCL on FPGA Not able to generate hardware for matrix_mult.cl

Hello everyone, 

 

I am trying to compile a very basic matrix_mult kernel for a Nallatech 385A board (the same error also occurs for hello_world). However, I always get the following error:

 

aoc -v -board=p385a_sch_ax115 matrix_mult.cl  

 

 

aoc: Environment checks are completed successfully.
aoc: If necessary for the compile, your BAK files will be cached here: /var/tmp/aocl/
You are now compiling the full flow!!
aoc: Selected target board p385a_sch_ax115
aoc: Running OpenCL parser....
aoc: OpenCL parser completed successfully.
aoc: Optimizing and doing static analysis of code...
aoc: Linking with IP library ...
Checking if memory usage is larger than 100%
Compiler Warning: Vectorized kernel contains loads/stores that cannot be vectorized. This might reduce performance.
aoc: First stage compilation completed successfully.
Compiling for FPGA. This process may take a long time, please be patient.
Error (23031): Evaluation of Tcl script import_compile.tcl unsuccessful
Error: Quartus Prime Compiler Database Interface was unsuccessful. 1 error, 0 warnings
Error: Compiler Error, not able to generate hardware

 

 

This is the output of the quartus_sh_compile.log

 

Internal Error: Sub-system: DCALC, File: /quartus/ddb/dcalc/dcalc_bcm_modules_cache.cpp, Line: 110
Could not load pdb file - /home/opt/intelFPGA_pro/17.1/quartus/common/devinfo/20nm/ddb_nightfury_cc_dcm_h-ss-1p25-100-hs-n_model

 

aoc version:

 

aoc -version
Intel(R) FPGA SDK for OpenCL(TM), 64-Bit Offline Compiler
Version 17.1.0 Build 240
Copyright (C) 2017 Intel Corporation

 

Quartus Prime Pro version: 

Quartus Prime Analysis & Synthesis
Version 17.1.0 Build 240 10/25/2017 SJ Pro Edition
Copyright (C) 2017 Intel Corporation. All rights reserved.
Quartus Prime Compiler Database Interface
Version 17.1.0 Build 240 10/25/2017 SJ Pro Edition
Copyright (C) 2017 Intel Corporation. All rights reserved.

 

aocl version: 

aocl 17.1.0.240 (Intel(R) FPGA SDK for OpenCL(TM), Version 17.1.0 Build 240, Copyright (C) 2017 Intel Corporation)  

 

Also, I have licences for both the Nallatech BSP and Quartus Prime Pro, and both are loaded on my system. I loaded the .dat licence file through the LM_LICENSE_FILE variable and through Quartus Prime Pro.

 

If you have any suggestions regarding this issue, it would be much appreciated. 

 

Thank you in advance.
21 Replies
Altera_Forum
Honored Contributor I
164 Views

Your log looks like a crash in the fitter. Are you using the correct version of Nallatech's BSP (i.e. 17.1.0)? Have you successfully compiled any other kernels? Do you have enough memory on your machine? Compiling kernels for Arria 10 could require up to 48 GB of memory.

Altera_Forum
Honored Contributor I
164 Views

Hello, 

 

I am using the latest BSP version from Nallatech, taken from nalla_aocl_bsp_q17_R001.005.0004.iso.

No, this is my first attempt at compiling a kernel and downloading it to the board.

Also, the machine I am compiling on has 64 GB of RAM, an Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz, and a 1 TB SSD.
Altera_Forum
Honored Contributor I
164 Views

That is the correct BSP version, so it should work. 64 GB memory should also be enough for one compilation. Try another kernel, maybe Altera's hello_world or vector_add example. If both failed with the same error, I recommend reinstalling Quartus and the BSP because something might have gotten corrupted during the installation.

Altera_Forum
Honored Contributor I
164 Views

I am getting the same error for any kernel that I am trying to compile. 

 

Also, I tried to use the pre-compiled .aocx files from the examples.

However, I got the following output, and every time the whole host machine reboots at the last stage.

 

 

./bin/host  

Initializing OpenCL 

Platform: Intel(R) FPGA SDK for OpenCL(TM) 

Using 1 device(s) 

p385a_sch_ax115 : nalla_pcie (aclnalla_pcie0) 

Using AOCX: vector_add.aocx 

Reprogramming device [0] with handle 1 

MMD INFO : [aclnalla_pcie0] Reprogramming device through Flash with RBF file...
Altera_Forum
Honored Contributor I
164 Views

The compilation crash and the machine crash during FPGA reprogramming are two different things. Since you encounter the compilation crash also for other kernels, I recommend doing a clean install of Quartus and the BSP and seeing what happens. Also make sure you are using a supported operating system. If the problem still persists, I recommend contacting Nallatech's support.

 

Machine crashes during FPGA reconfiguration on Arria 10 are not rare; I also experienced it myself multiple times, which eventually forced me to switch to JTAG-based programming. Try "export ACL_PCIE_USE_JTAG_PROGRAMMING=1" before running the kernel and see if that allows you to bypass the crash.
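
The workaround above could be applied roughly as follows. This is only a sketch: the variable name is the one quoted in this post, and ./bin/host is assumed to be the example host binary from the earlier output.

```shell
# Ask the board's MMD layer to reprogram the FPGA over JTAG instead of
# through PCIe/flash (variable name as quoted in the post above).
export ACL_PCIE_USE_JTAG_PROGRAMMING=1

# Run the host program from the same shell so that it inherits the variable.
# The path ./bin/host is an assumption taken from the earlier host output.
if [ -x ./bin/host ]; then
    ./bin/host
else
    echo "host binary not found at ./bin/host; build the example first" >&2
fi
```

The variable only affects processes started from this shell, so it has to be exported again in every new session (or added to a shell profile).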
Altera_Forum
Honored Contributor I
164 Views

Hello,  

 

Thank you for the reply. 

 

I managed to bypass the machine crashing by using the JTAG instead, so I am able to flash through the JTAG successfully. 

 

However, the compilation crash still persists even after a clean installation. Could this be a licensing issue, even though I am not getting any licence-related error?
Altera_Forum
Honored Contributor I
164 Views

 

--- Quote Start ---  

However, the compilation crash still persists even after a clean installation. Could this be a licensing issue, even though I am not getting any licence-related error?

--- Quote End ---  

 

 

I don't think so. Without a valid license, you will not even get past the initial OpenCL to HDL conversion step, let alone fitting and routing.
Altera_Forum
Honored Contributor I
164 Views

What operating system are you using?

Altera_Forum
Honored Contributor I
164 Views

CentOS 7.4

Altera_Forum
Honored Contributor I
164 Views

Is it possible you're running out of hard drive space during the build? How much space is available on the disk where you're building?
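
One quick way to answer this is to check free disk space and total RAM from the build directory before starting aoc. The following is a minimal sketch, assuming a Linux host that provides /proc/meminfo; the 48 GB figure is the upper bound quoted earlier in this thread, not an official requirement.

```shell
# Free space (in GB) on the filesystem that holds the current directory.
# df -P guarantees one line per filesystem; field 4 is "Available" in KB.
avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')
avail_gb=$((avail_kb / 1024 / 1024))
echo "Free disk space in $(pwd): ${avail_gb} GB"

# Total RAM (in GB) as reported by the kernel (Linux only).
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_gb=$((mem_kb / 1024 / 1024))
echo "Total RAM: ${mem_gb} GB"

# 48 GB is the figure mentioned earlier in the thread for Arria 10 compiles.
if [ "$mem_gb" -lt 48 ]; then
    echo "Warning: less than 48 GB of RAM; large Arria 10 compiles may fail" >&2
fi
```

Running the same df command again while the compile is in progress shows whether the build directory fills up along the way.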

Altera_Forum
Honored Contributor I
164 Views

I managed to get a compilation pretty far. It took about an hour and 15 minutes to fail this time for the matrix_mult.cl kernel. 

 

The last error message was: 

 

Info: Command: quartus_cdb -t import_compile.tcl
Info: Using INI file /home/admin/fpga_experiments/nallatech/examples_p385a_sch_ax115/matrix_mult/matrix_mult/bin/matrix_mult/quartus.ini
Info: Checking for OpenCL SDK installation, environment should have INTELFPGAOCLSDKROOT defined
Info: INTELFPGAOCLSDKROOT=/home/admin/intelFPGA_pro/17.1/hld
Info: Successfully completed BAK flow
Info: To reduce compile time on future compiles, you can generate a BAK cache by adding the arguments '--bsp-flow regenerate_cache' to aoc to skip BAK
Info: Retry strategy set to "retry-flat"
Info: Initial preservation set to "final"
Info (125061): Changed top-level design entity name to "top"
Info (125061): Changed top-level design entity name to "kernel_system"
Info (16677): Loading synthesized database
Info (16734): Loading "synthesized" snapshot for partition "root_partition".
Info (16678): Successfully loaded synthesized database: elapsed time is 00:00:03
Info: Performing a fit attempt
Error: Quartus Fitter has failed! Breaking execution...

 

I am attaching also the generated log. 

 

Any suggestions regarding that? 

I have never seen this error message before, nor has my compilation ever gotten this far.

 

P.S. matrix_mult is taken directly from Nallatech's example archive for my board. I didn't modify it.
Altera_Forum
Honored Contributor I
164 Views

You are getting another random crash: 

 

 

--- Quote Start ---  

Internal Error: Sub-system: VPR20KMAIN, File: /quartus/fitter/vpr20k/dap/dap_congestion.cpp, Line: 1525 

Internal Error 

Stack Trace: 

0xbb0d4: vpr_qi_jump_to_exit + 0x6f (fitter_vpr20kmain) 

0x1b6288: vpr_exit_at_line + 0x83 (fitter_vpr20kmain) 

 

 

0x562d37: dap_evaluate_move + 0x77 (fitter_vpr20kmain) 

0x416556: l_mpp_perform_moves_worker(void*) [clone .isra.671] + 0x30f (fitter_vpr20kmain) 

0x416682: l_mpp_worker_thread(void*) + 0x4a (fitter_vpr20kmain) 

0xd09d2: l_thread_start_wrapper(void*) + 0x29 (fitter_vpr20kmain) 

0x5b4c: thr_final_wrapper + 0xc (ccl_thr) 

0x3f21f: msg_thread_wrapper(void* (*)(void*), void*) + 0x62 (ccl_msg) 

0xac5c: mem_thread_wrapper(void* (*)(void*), void*) + 0x5c (ccl_mem) 

0x8b49: err_thread_wrapper(void* (*)(void*), void*) + 0x27 (ccl_err) 

0x5b8f: thr_thread_wrapper + 0x15 (ccl_thr) 

0x5e72: thr_thread_begin + 0x46 (ccl_thr) 

0x7e25: start_thread + 0xc5 (pthread) 

0xf834d: clone + 0x6d (c) 

 

End-trace 

--- Quote End ---  

 

 

Since the crashing point seems to be changing, my guess is that some component in your system must be unstable. This could be the CPU, the memory, the disk, or even the network if you are using a network storage system. Do you have any other machines to use for a compilation test?
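
To check whether the crash point really moves between runs, the internal-error lines can be pulled out of several logs and compared. The sketch below assumes each failed run's quartus_sh_compile.log has been kept; the function name is made up for illustration.

```shell
# Print a count of each distinct "Sub-system / File / Line" signature
# found in the Quartus internal-error lines of the given log files.
# crash_signatures is a hypothetical helper name, not part of any tool.
crash_signatures() {
    grep -h -o 'Internal Error: Sub-system: [^,]*, File: [^,]*, Line: [0-9]*' "$@" \
        | sort | uniq -c
}
```

For example, `crash_signatures run1/quartus_sh_compile.log run2/quartus_sh_compile.log` (directory names are placeholders) prints one line per distinct crash location. Many different locations across runs point toward an unstable system rather than a tool bug triggered by one specific kernel.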
Altera_Forum
Honored Contributor I
164 Views

 

--- Quote Start ---  

You are getting another random crash: 

 

 

 

Since the crashing point seems to be changing, my guess is that some component in your system must be unstable. This could be the CPU, the memory, the disk, or even the network if you are using a network storage system. Do you have any other machines to use for a compilation test? 

--- Quote End ---  

 

 

 

I can try a different machine. However, the issue is that I have a floating licence for the Altera SDK, while the licence for the Nallatech BSP is bound to the MAC address of that machine.

Also, it is the only machine around with 64 GB of RAM.

 

I am currently using CentOS 7.4 which is a supported OS for compilation and runtime according to nallatech.  

Do you think another OS might be more stable for the current workflow? Any suggestions?

 

Thanks again for the replies!
Altera_Forum
Honored Contributor I
164 Views

 

--- Quote Start ---  

I can try a different machine. However, the issue is that I have a floating licence for the Altera SDK, while the licence for the Nallatech BSP is bound to the MAC address of that machine.

--- Quote End ---  

 

Nallatech's BSP license is only for installation of the BSP, it does not bind the BSP to the machine (Altera does not even provide the possibility to do so). After installation, you can copy the BSP folder to any other machine and use it. Though they probably want you to think that you cannot do this. ;) 

 

 

--- Quote Start ---  

I am currently using CentOS 7.4 which is a supported OS for compilation and runtime according to nallatech.  

Do you think another OS might be more stable for the current workflow? Any suggestions?

--- Quote End ---  

 

 

I have used CentOS v7.x for compilation of kernels for Nallatech's 385A board using different BSPs (16.0.2, 16.1.2, 17.1) and never encountered any OS-related problems. Apart from that, as you said, Nallatech's PCI-E driver officially supports CentOS v7.x starting from BSP v17.0, so that should also be fine.
Altera_Forum
Honored Contributor I
164 Views

Hello, 

 

Thanks again for the reply. 

 

I managed to compile and run successfully on the FPGA. The matrix multiplication kernel took 2 hours and 25 minutes to compile.

However, the cause of the problem was very odd: it was the network connection.

 

SOLUTION: 

 

I disconnected the machine from the network (I took out the Ethernet cable) and it worked on the first go.

Previously, I only accessed the machine over SSH (always through a terminal multiplexer, e.g. tmux, byobu, or screen).

 

Has anyone experienced something similar before?

 

 

Also, I discovered that some of the pre-compiled examples provided by Nallatech for my board were not compiled for the BSP version they claim.

Many of them failed to run with CL_INVALID_BINARY, while the rest worked out of the box.
Altera_Forum
Honored Contributor I
164 Views

 

--- Quote Start ---  

I disconnected the machine from the network (I took out the Ethernet cable) and it worked on the first go.

Previously, I only accessed the machine over SSH (always through a terminal multiplexer, e.g. tmux, byobu, or screen).

 

Has anyone experienced something similar before?

--- Quote End ---  

 

 

That sounds very strange. Honestly I wouldn't be quick to judge in this case; your issue could be caused by a recurring transient problem that just didn't happen this time, and could happen again later. The compilation machines I personally use are all remote machines and I just connect to them via SSH, so the problem is certainly not from remote connections.
Altera_Forum
Honored Contributor I
164 Views

 

--- Quote Start ---  

That sounds very strange. Honestly I wouldn't be quick to judge in this case; your issue could be caused by a recurring transient problem that just didn't happen this time, and could happen again later. The compilation machines I personally use are all remote machines and I just connect to them via SSH, so the problem is certainly not from remote connections.

--- Quote End ---  

 

 

Just checked with two more kernels and I can verify that the ethernet causes the random crashes during compilation.  

As long as the machine is offline, compilation worked fine.
Altera_Forum
Honored Contributor I
164 Views

 

--- Quote Start ---  

That sounds very strange. Honestly I wouldn't be quick to judge in this case; your issue could be caused by a recurring transient problem that just didn't happen this time, and could happen again later. The compilation machines I personally use are all remote machines and I just connect to them via SSH, so the problem is certainly not from remote connections.

--- Quote End ---  

 

 

Agreed; there is probably something else about your system that changed. Have you tried rebooting the machine and then building the kernel with and without the ethernet plugged in? If I were you, I would try to write down anything I've done to the machine that may or may not have made the tools work in case whatever change you made is only temporary (and will get undone upon reboot). 

 

I asked earlier but do you have a lot of free disk space on the drive you're building on? If you've been running out of room on your hard drive during builds that would explain the seemingly random crashes (b/c the tool can't create files it needs to).
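
One way to keep the kind of record suggested above is to log every build attempt together with the state of the network link. This is a rough sketch only: the log file name, the interface name eth0, and the function name are placeholders, and it assumes a Linux host that exposes /sys/class/net.

```shell
# Placeholders: adjust the log file and interface name to your system.
BUILD_LOG=${BUILD_LOG:-build_attempts.log}
IFACE=${IFACE:-eth0}

# Run the given command, then append one line with a timestamp, the link
# state of $IFACE at start, the exit status, and the command itself.
log_build_attempt() {
    link=$(cat "/sys/class/net/$IFACE/operstate" 2>/dev/null || echo unknown)
    if "$@"; then
        status=0
    else
        status=$?
    fi
    printf '%s link=%s status=%s cmd=%s\n' \
        "$(date '+%Y-%m-%dT%H:%M:%S')" "$link" "$status" "$*" >> "$BUILD_LOG"
    return "$status"
}
```

A build would then be started as `log_build_attempt aoc -v -board=p385a_sch_ax115 matrix_mult.cl`, once with the cable plugged in and once without. After a few runs the log shows whether failures actually line up with the link being up.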
Altera_Forum
Honored Contributor I
164 Views

 

--- Quote Start ---  

Just checked with two more kernels and I can verify that the ethernet causes the random crashes during compilation.  

As long as the machine is offline, compilation worked fine. 

--- Quote End ---  

 

 

Fair enough. How do you check out your floating license without a connection?
Altera_Forum
Honored Contributor I
48 Views

 

--- Quote Start ---  

Agreed; there is probably something else about your system that changed. Have you tried rebooting the machine and then building the kernel with and without the ethernet plugged in? If I were you, I would try to write down anything I've done to the machine that may or may not have made the tools work in case whatever change you made is only temporary (and will get undone upon reboot). 

 

 

 

I asked earlier but do you have a lot of free disk space on the drive you're building on? If you've been running out of room on your hard drive during builds that would explain the seemingly random crashes (b/c the tool can't create files it needs to). 

--- Quote End ---  

 

 

I rebooted the system and it still works only without the Ethernet cable connected.

 

My machine currently has a 1 TB SSD with 600 GB of free space. I also checked during the successful compilation, and resource utilization is fine for that machine (e.g. about 20 GB of RAM used out of 64, with CPU usage on all cores).