We have an Arria 10 GX FPGA Dev Kit that we've been using for more than a year with the Intel FPGA SDK for OpenCL version 18.0 on Ubuntu 16.04. Recently (last week), the FPGA stopped working with the OpenCL runtime. Running aocl diagnose gives me the following error message:
Found no active device installed on the host machine.
Please make sure to:
1. Set the environment variable AOCL_BOARD_PACKAGE_ROOT to the correct board package.
2. Install the driver from the selected board package.
3. Properly install the device in the host machine.
4. Configure the device with a supported OpenCL design.
5. Reboot the machine if the PCI Express link failed.
DIAGNOSTIC_FAILED
I tried moving from 18.0 to 19.1, but the issue remains. Any thoughts on what the problem might be? Could a kernel update to the OS have caused this issue? (I am currently checking with our IT department to see whether they issued an OS update to the system.)
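One quick way to test the kernel-update theory is to check whether the board package driver module is actually loaded in the running kernel. A minimal sketch, assuming the driver's module name starts with "aclpci" (the a10_ref build posted in this thread produces aclpci_a10_ref_drv.ko); the helper name loaded_aclpci_modules is hypothetical:

```python
import platform
from pathlib import Path
from typing import List


def loaded_aclpci_modules(proc_modules_text: str) -> List[str]:
    """Return loaded module names that start with "aclpci" (assumed prefix).

    Each line of /proc/modules starts with the module name, followed by
    its size, use count and other fields separated by spaces.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("aclpci"):
            names.append(fields[0])
    return names


def main() -> None:
    proc = Path("/proc/modules")
    text = proc.read_text() if proc.exists() else ""
    modules = loaded_aclpci_modules(text)
    print("Running kernel: " + platform.release())
    if modules:
        print("Loaded aclpci module(s): " + ", ".join(modules))
    else:
        # Assumption: after a kernel update, a missing module usually means the
        # driver was built for a different kernel or failed to load.
        print("No aclpci module loaded in this kernel.")


if __name__ == "__main__":
    main()
```

The sketch avoids f-strings so it should also run on the older Python 3 that ships with Ubuntu 16.04.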
Could you provide the following information?
1. Did aocl install pass?
2. Did jtagconfig pass?
3. Did aocl program pass?
4. Does the PC reboot when you program the FPGA?
5. What does the LCD on the device display?
Here are my answers to your questions:
1) I think so; here is the output:
root@hpvmfpga:[/home/aejjeh]: aocl install
Do you want to setup the FCD at directory /opt/Intel/OpenCL/Boards [y/n] y
aocl install: Adding the board package /opt/intelFPGA_pro/19.1/hld/board/a10_ref to the list of installed packages
aocl install: Setting up the FPGA Client Driver (FCD) to the system.
Install the FCD file to /opt/Intel/OpenCL/Boards
Installing the board package driver to the system.
aocl install: Running install from /opt/intelFPGA_pro/19.1/hld/board/a10_ref/linux64/libexec
Looking for kernel source files in /lib/modules/4.15.0-66-generic/build
Using kernel source files from /lib/modules/4.15.0-66-generic/build
Building driver for BSP with name a10_ref
make: Entering directory '/usr/src/linux-headers-4.15.0-66-generic'
CC [M] /tmp/opencl_driver_6kHrDJ/aclpci_queue.o
/tmp/opencl_driver_6kHrDJ/aclpci_queue.c: In function ‘queue_push’:
/tmp/opencl_driver_6kHrDJ/aclpci_queue.c:133:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
void* dest = queue_addr(q, loc);
^
CC [M] /tmp/opencl_driver_6kHrDJ/aclpci.o
CC [M] /tmp/opencl_driver_6kHrDJ/aclpci_fileio.o
CC [M] /tmp/opencl_driver_6kHrDJ/aclpci_dma.o
CC [M] /tmp/opencl_driver_6kHrDJ/aclpci_pr.o
CC [M] /tmp/opencl_driver_6kHrDJ/aclpci_cmd.o
/tmp/opencl_driver_6kHrDJ/aclpci_cmd.c: In function ‘aclpci_exec_cmd’:
/tmp/opencl_driver_6kHrDJ/aclpci_cmd.c:176:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
size_t bytes_copy = strnlen(ACL_BOARD_PKG_NAME, BUF_SIZE) + strnlen(ACL_DRIVER_VERSION, BUF_SIZE) + 2; // 1 for '.', 1 for '\0'
^
LD [M] /tmp/opencl_driver_6kHrDJ/aclpci_a10_ref_drv.o
Building modules, stage 2.
MODPOST 1 modules
CC /tmp/opencl_driver_6kHrDJ/aclpci_a10_ref_drv.mod.o
LD [M] /tmp/opencl_driver_6kHrDJ/aclpci_a10_ref_drv.ko
make: Leaving directory '/usr/src/linux-headers-4.15.0-66-generic'
2) Yes, jtagconfig passes:
root@hpvmfpga:[/home/aejjeh]: jtagconfig
1) USB-BlasterII [2-1.7]
  02E660DD 10AX115H1(.|E2|ES)/10AX115H2/..
  020A40DD 5M(1270ZF324|2210Z)/EPM2210
3) I cannot run aocl program because aocl does not detect the device in the first place.
4) I'm not sure what you mean here. I reboot the machine manually when initializing the board. I use a script to initialize the board, based on the Intel AN 807 document: https://www.intel.com/content/www/us/en/programmable/documentation/tgy1490191698959.html#wmh14902129...
Basically, I set the JTAG clock speed to 6M and then run the following two commands:
quartus_pgm -c 1 -m JTAG -o "p;max5_150.pof@2"
quartus_pgm -c 1 -m JTAG -o "p;top.sof"
After that, I do a soft reboot. Once the reboot was done, I used to run aocl install, and the board would then work.
5) The board is inside the PC chassis, so I cannot see the LCD while the board is plugged into the PCIe slot.
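The initialization steps from answer 4 can be wrapped in a small script so they always run in the same order. This is a hypothetical sketch: the jtagconfig --setparam 1 JtagClock 6M command for the 6M JTAG clock is an assumption based on AN 807 and should be checked against your Quartus version, while the two quartus_pgm commands are the ones quoted above. It only prints the commands unless started with --execute, and it leaves the soft reboot and aocl install to you.

```python
import shlex
import subprocess
import sys
from typing import List, Tuple

# JTAG clock command is an assumption based on AN 807; verify it for your
# Quartus version. The .pof/.sof file names are the ones used in this thread.
INIT_STEPS = [
    ["jtagconfig", "--setparam", "1", "JtagClock", "6M"],
    ["quartus_pgm", "-c", "1", "-m", "JTAG", "-o", "p;max5_150.pof@2"],
    ["quartus_pgm", "-c", "1", "-m", "JTAG", "-o", "p;top.sof"],
]


def run_steps(steps: List[List[str]], execute: bool) -> Tuple[List[str], bool]:
    """Print each command (shell-quoted) and, if execute is True, run it.

    Stops at the first command that exits with a non-zero status.
    Returns the printed command lines and whether all steps succeeded.
    """
    printed = []
    for cmd in steps:
        line = " ".join(shlex.quote(arg) for arg in cmd)
        print(line)
        printed.append(line)
        if execute:
            result = subprocess.run(cmd)
            if result.returncode != 0:
                print("Step failed with exit code {}; stopping.".format(result.returncode))
                return printed, False
    return printed, True


if __name__ == "__main__":
    do_run = "--execute" in sys.argv[1:]
    _, ok = run_steps(INIT_STEPS, execute=do_run)
    if do_run and ok:
        print("Now do a soft reboot, then run aocl install.")
```

Passing argument lists to subprocess.run (no shell) means the semicolons in the -o operations need no quoting; shlex.quote is only used so the printed lines can be pasted into a terminal.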
Can you check the following:
1. Check the USB cable connection.
2. Try swapping the USB cable.
3. Check the switch settings according to https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an807.pdf
4. Run the commands aocl compile-config and aocl link-config.
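The two aocl queries in item 4 can be captured together so the output is easy to paste into a reply. A small sketch; the runner is injectable so the logic can be exercised without the SDK installed, and "OK" here only means the command exited with status 0 and printed something, not that the reported flags are correct. The function names are hypothetical.

```python
import shutil
import subprocess
from typing import Callable, List, Tuple

# The two queries requested in the thread.
AOCL_QUERIES = [["aocl", "compile-config"], ["aocl", "link-config"]]

Runner = Callable[[List[str]], Tuple[int, str]]


def subprocess_runner(cmd: List[str]) -> Tuple[int, str]:
    """Run cmd and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout


def collect(runner: Runner) -> List[Tuple[str, str, str]]:
    """Return (command, status, output) for each query, in order."""
    results = []
    for cmd in AOCL_QUERIES:
        code, out = runner(cmd)
        status = "OK" if code == 0 and out.strip() else "FAILED or empty"
        results.append((" ".join(cmd), status, out.strip()))
    return results


if __name__ == "__main__":
    if shutil.which("aocl") is None:
        print("aocl is not on PATH; set up the SDK environment first.")
    else:
        for name, status, out in collect(subprocess_runner):
            print("== {} [{}]\n{}".format(name, status, out))
```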
The USB cable is working: jtagconfig detects the board, and I can program the board without any problem. Also, as I mentioned, the board was working previously, and I configured all switches and jumpers according to AN 807 back then. I have confirmed that there was a kernel update before the board stopped working.
I understand the problem you mentioned.
Do you have another Arria 10 GX FPGA Dev Kit board?
If so, does the other Arria 10 GX board have the same problem?
As you mentioned earlier, the Linux kernel was updated. The new kernel might not be compatible with the FPGA driver. Did you reinstall the PCIe driver after the kernel update?
Also, you may need to perform these steps:
- Install the driver from the selected board package.
- Properly install the device in the host machine.
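The board-package items from the aocl diagnose message (the AOCL_BOARD_PACKAGE_ROOT variable and the installed driver files) can be sanity-checked without touching the board. A minimal sketch, assuming the FCD files that aocl install writes to /opt/Intel/OpenCL/Boards (the directory shown in the install log in this thread) use a .fcd extension; check_board_setup is a hypothetical helper:

```python
import os
from pathlib import Path
from typing import List, Mapping

# Directory where aocl install placed the FCD files in this thread.
FCD_DIR = Path("/opt/Intel/OpenCL/Boards")


def check_board_setup(env: Mapping[str, str], fcd_dir: Path) -> List[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems = []
    root = env.get("AOCL_BOARD_PACKAGE_ROOT")
    if not root:
        problems.append("AOCL_BOARD_PACKAGE_ROOT is not set")
    elif not Path(root).is_dir():
        problems.append("AOCL_BOARD_PACKAGE_ROOT is not a directory: " + root)
    if not fcd_dir.is_dir():
        problems.append("FCD directory not found: " + str(fcd_dir))
    elif not list(fcd_dir.glob("*.fcd")):  # assumed .fcd extension
        problems.append("no *.fcd files in " + str(fcd_dir))
    return problems


if __name__ == "__main__":
    issues = check_board_setup(os.environ, FCD_DIR)
    print("\n".join(issues) if issues else "Board package and FCD setup look plausible.")
```

Passing the environment and directory in as parameters keeps the check testable on a machine without the SDK.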
To answer your first question: no, we do not have another dev board to try.
As for the Linux kernel update: yes, I ran "aocl install" AFTER the kernel was updated. I posted the output of "aocl install" in one of my previous messages. The device is definitely installed properly; I have not removed it.
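One way to confirm that the reinstall really targeted the running kernel is to compare the kernel release in the saved aocl install output (the "Using kernel source files from /lib/modules/.../build" line) with the output of uname -r. A minimal sketch, assuming the install output was saved to a text file whose path is passed as the first argument; driver_build_kernel is a hypothetical helper:

```python
import platform
import re
import sys
from typing import Optional

# Matches the line aocl install printed in this thread, e.g.
# "Using kernel source files from /lib/modules/4.15.0-66-generic/build"
_KSRC = re.compile(r"Using kernel source files from /lib/modules/([^/\s]+)/build")


def driver_build_kernel(log_text: str) -> Optional[str]:
    """Return the kernel release the driver was built against, or None."""
    match = _KSRC.search(log_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: {} <saved aocl install output>".format(sys.argv[0]))
    else:
        with open(sys.argv[1]) as f:
            built_for = driver_build_kernel(f.read())
        running = platform.release()  # same string as uname -r
        print("driver built for:", built_for)
        print("running kernel:  ", running)
        if built_for != running:
            print("Mismatch: the driver was not built for the running kernel.")
```

A match only shows the module was compiled for the right kernel; it does not prove the module loads, so it complements a check of the loaded modules.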
From an initial assessment, this might be a driver compatibility issue with the new kernel/OS.
To confirm a driver compatibility issue with the new kernel/OS, can you try compiling one of the provided OpenCL examples, such as hello_world, for the emulator? This confirms that there are no issues with the tools themselves.
Also, which Ubuntu (kernel/OS) version were you running before the update, and which version are you running now?
Could you please confirm the following:
- Did you compile the OpenCL example for the emulator successfully?
- Ubuntu version before the update
- Is the Ubuntu version after the update 16.04?
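The before/after versions asked for above can be read from /etc/os-release together with the running kernel. Note that a kernel update within 16.04 normally leaves the Ubuntu release unchanged, so the kernel release string is usually the more informative of the two. A minimal sketch; parse_os_release is a hypothetical helper:

```python
import platform
from pathlib import Path
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines from os-release text.

    Surrounding quotes are stripped; shell escape sequences are not handled.
    """
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


if __name__ == "__main__":
    path = Path("/etc/os-release")
    info = parse_os_release(path.read_text()) if path.exists() else {}
    print("Ubuntu version:", info.get("VERSION", "unknown"))
    print("Kernel release:", platform.release())
```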
From the description, the problem is very likely caused by the kernel version.
Can you revert the kernel to the previous version?
Also, can you tell us the version number of the older kernel you were using?
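Before reverting, it helps to see which kernels are still installed and whether each still has its headers, i.e. the /lib/modules/<release>/build directory that aocl install searched in the log earlier in this thread. A minimal sketch; it only lists kernels, and the older one would then be selected at boot from GRUB's "Advanced options for Ubuntu" menu. The helper names are hypothetical.

```python
import platform
import re
from pathlib import Path
from typing import List, Tuple


def version_key(release: str) -> Tuple[int, ...]:
    """Numeric sort key, e.g. '4.15.0-66-generic' -> (4, 15, 0, 66)."""
    return tuple(int(n) for n in re.findall(r"\d+", release))


def installed_kernels(modules_dir: Path) -> List[Tuple[str, bool]]:
    """Return (release, has_build_dir) pairs, oldest first."""
    if not modules_dir.is_dir():
        return []
    releases = [p.name for p in modules_dir.iterdir() if p.is_dir()]
    releases.sort(key=version_key)
    return [(r, (modules_dir / r / "build").exists()) for r in releases]


if __name__ == "__main__":
    running = platform.release()
    for release, has_build in installed_kernels(Path("/lib/modules")):
        marks = []
        if release == running:
            marks.append("running")
        if not has_build:
            marks.append("no build dir (headers missing?)")
        print("{:<30} {}".format(release, ", ".join(marks)))
```

Sorting numerically matters here: a plain string sort would put 4.15.0 before 4.4.0.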