PSath2
Beginner

DevCloud: OpenCL kernels build after update but host runtime fails on fpga nodes with auto-discovery error

After the software update to the FPGA compile nodes described in https://software.intel.com/en-us/forums/intel-oneapi-base-toolkit/topic/843060

I seem to once again be able to compile kernels, thanks!

However, the OpenCL runtime fails to access the pac_a10 device on the nodes with the "fpga" or "arria10" property.

UPDATE: It appears this may be specific to s001-n084, as I am able to run clinfo successfully on s001-n088 and s001-n086. There is no distinguishing property in pbsnodes to separate out the broken nodes, so a poor workaround may be to manually pick a free node and queue there only if it functions properly.

 

I have tried both my own code and a simple clinfo, and the result is the same auto-discovery error. clinfo output below:

@s001-n084:~$ clinfo
Number of platforms                           3
  Platform Name                               Intel(R) FPGA Emulation Platform for OpenCL(TM)
  Platform Vendor                             Intel(R) Corporation
  Platform Version                            OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 19.2
  Platform Profile                            EMBEDDED_PROFILE
  Platform Extensions                         cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program
  Platform Extensions function suffix         IntelFPGA

  Platform Name                               Intel(R) FPGA SDK for OpenCL(TM)
  Platform Vendor                             Intel(R) Corporation
  Platform Version                            OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 19.3api
  Platform Profile                            EMBEDDED_PROFILE
  Platform Extensions                         cl_khr_byte_addressable_store cles_khr_int64 cl_khr_icd
  Platform Extensions function suffix         IntelFPGA

  Platform Name                               Intel(R) OpenCL
  Platform Vendor                             Intel(R) Corporation
  Platform Version                            OpenCL 2.1 LINUX
  Platform Profile                            FULL_PROFILE
  Platform Extensions                         cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_intel_device_partition_by_names cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer
  Platform Host timer resolution              1ns
  Platform Extensions function suffix         INTEL

  Platform Name                               Intel(R) FPGA Emulation Platform for OpenCL(TM)
Number of devices                             1
  Device Name                                 Intel(R) FPGA Emulation Device
  Device Vendor                               Intel(R) Corporation
  Device Vendor ID                            0x1172
  Device Version                              OpenCL 1.0
  Driver Version                              2019.8.10.0
  Device OpenCL C Version                     OpenCL C 1.0
  Device Type                                 Accelerator
  Device Profile                              EMBEDDED_PROFILE
  Device Available                            Yes
  Compiler Available                          Yes
  Max compute units                           24
  Max clock frequency                         3400MHz
  Max work item dimensions                    3
  Max work item sizes                         67108864x67108864x67108864
  Max work group size                         67108864
  Preferred work group size multiple          128
  Preferred / native vector sizes
    char                                      1 / 32
    short                                     1 / 16
    int                                       1 / 8
    long                                      1 / 4
    half                                      0 / 0        (n/a)
    float                                     1 / 8
    double                                    1 / 4        (n/a)
  Half-precision Floating-point support       (n/a)
  Single-precision Floating-point support     (core)
    Denormals                                 Yes
    Infinity and NANs                         Yes
    Round to nearest                          Yes
    Round to zero                             No
    Round to infinity                         No
    IEEE754-2008 fused multiply-add           No
    Support is emulated in software           No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support     (n/a)
  Address bits                                64, Little-Endian
  Global memory size                          202518421504 (188.6GiB)
  Error Correction support                    No
  Max memory allocation                       50629605376 (47.15GiB)
  Unified memory for Host and Device          Yes
  Minimum alignment for any data type         128 bytes
  Alignment of base address                   1024 bits (128 bytes)
  Global Memory cache type                    Read/Write
  Global Memory cache size                    262144 (256KiB)
  Global Memory cache line size               64 bytes
  Image support                               No
  Local memory type                           Global
  Local memory size                           262144 (256KiB)
  Max number of constant args                 480
  Max constant buffer size                    131072 (128KiB)
  Max size of kernel argument                 3840 (3.75KiB)
  Queue properties
    Out-of-order execution                    Yes
    Profiling                                 Yes
  Profiling timer resolution                  1ns
  Execution capabilities
    Run OpenCL kernels                        Yes
    Run native kernels                        Yes
  IL version                                  SPIR-V_1.0
  Device Extensions                           cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program

  Platform Name                               Intel(R) FPGA SDK for OpenCL(TM)
Number of devices                             1
FAILED to read auto-discovery string at byte 18446744073709551615. Full auto-discovery string value is
acl_hal_mmd.cpp:1426:assert failure: Failed to initialize kernel interface
clinfo: acl_hal_mmd.cpp:1426: int l_try_device(unsigned int, const char*, acl_system_def_t*, acl_mmd_dispatch_t*): Assertion `0' failed.
Aborted
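One way to automate the "pick a node and check that it works" workaround is to run clinfo before doing any real work and look for the failure signature above. The sketch below is a minimal, hypothetical Python helper, assuming clinfo is on PATH on the compute node. The two signature strings are copied from the n084 output above, but the function names are made up for illustration, and a node that passes this check could still have other problems.

```python
import subprocess

# Failure signatures copied from the clinfo output on s001-n084 above.
FAILURE_SIGNATURES = (
    "FAILED to read auto-discovery string",
    "Failed to initialize kernel interface",
)


def clinfo_reports_autodiscovery_failure(output: str) -> bool:
    """Return True if clinfo output contains the BSP auto-discovery failure seen on n084."""
    return any(signature in output for signature in FAILURE_SIGNATURES)


def node_fpga_runtime_ok(timeout_s: float = 60.0) -> bool:
    """Run clinfo on this node and report whether the FPGA runtime looks usable.

    Hypothetical helper: it only checks the exit code and the signatures above.
    """
    try:
        result = subprocess.run(
            ["clinfo"],
            capture_output=True,
            text=True,
            errors="replace",  # tolerate non-UTF-8 bytes in device strings
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # clinfo missing, not executable, or hung: treat the node as unusable.
        return False
    combined = result.stdout + result.stderr
    # On n084 clinfo aborted on an assertion, so a non-zero exit code also counts as failure.
    return result.returncode == 0 and not clinfo_reports_autodiscovery_failure(combined)
```

Calling node_fpga_runtime_ok() at the start of a job script, and exiting early when it returns False, would at least make a job that lands on a broken node fail fast instead of aborting inside the OpenCL runtime.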

aocl diagnose reports that the BSP is installed correctly (output attached as a follow-up comment due to post length).

 

My environment is essentially the default, other than a bash function that overrides python to point to python2 (printenv output attached as a follow-up comment due to post length).

12 Replies
PSath2
Beginner

aocl diagnose output

@s001-n084:~$ aocl diagnose
--------------------------------------------------------------------
ICD System Diagnostics
--------------------------------------------------------------------

Using the following location for ICD installation:
  /etc/OpenCL/vendors

Found 4 icd entry at that location:
  /etc/OpenCL/vendors/Altera.icd
  /etc/OpenCL/vendors/intel-cpu.icd
  /etc/OpenCL/vendors/Intel_FPGA_SSG_Emulator.icd
  /etc/OpenCL/vendors/intel-neo.icd

the following OpenCL libraries are referenced in the icd files:
  libalteracl.so
  libintelocl.so
  libintelocl_emu.so
  libigdrcl.so

checking LD_LIBRARY_PATH for registered libraries:
  libalteracl.so was registered on the system at /opt/intel/inteloneapi/compiler/2021.1-beta03/linux/lib/oclfpga/host/linux64/lib
  libintelocl.so was registered on the system at /opt/intel/inteloneapi/compiler/latest/linux/lib/x64
  libintelocl_emu.so was registered on the system at /opt/intel/inteloneapi/compiler/2021.1-beta03/linux/lib/oclfpga/host/linux64/lib
  libigdrcl.so was registered on the system at /opt/intel/inteloneapi/compiler/latest/linux/lib/oclgpu

Using the following location for fcd installations:
  /opt/Intel/OpenCLFPGA/oneAPI/Boards

Found 1 fcd entry at that location:
  /opt/Intel/OpenCLFPGA/oneAPI/Boards/dcp_bsp.fcd

the following OpenCL libraries are referenced in the fcd files:
  /opt/intel/inteloneapi/compiler/2021.1-beta03/linux/lib/oclfpga/board/intel_a10gx_pac/linux64/lib/libintel_opae_mmd.so

checking LD_LIBRARY_PATH for registered libraries:
  /opt/intel/inteloneapi/compiler/2021.1-beta03/linux/lib/oclfpga/board/intel_a10gx_pac/linux64/lib/libintel_opae_mmd.so was registered on the system.

Number of Platforms = 3
  1. Intel(R) FPGA Emulation Platform for OpenCL(TM) | Intel(R) Corporation | OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 19.2
  2. Intel(R) FPGA SDK for OpenCL(TM) | Intel(R) Corporation | OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 19.3api
  3. Intel(R) OpenCL | Intel(R) Corporation | OpenCL 2.1 LINUX
--------------------------------------------------------------------
ICD diagnostics PASSED
--------------------------------------------------------------------

--------------------------------------------------------------------
BSP Diagnostics
--------------------------------------------------------------------

--------------------------------------------------------------------
Device Name: acl0

BSP Install Location: /opt/intel/inteloneapi/compiler/2021.1-beta03/linux/lib/oclfpga/board/intel_a10gx_pac

Vendor: Intel Corp

Physical Dev Name   Status   Information

pac_ee00000         Passed   Intel PAC Platform (pac_ee00000)
                             PCIe 94:00.0
                             FPGA temperature = 44 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------

Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices

 

PSath2
Beginner

printenv output attached

Lawrence_L_Intel
Employee

Hi Paul

Letting you know this is on our list of things to look at. Note that for the last couple of days we had a problem with the license server, so if any step in your flow runs a Quartus FPGA compile under the hood, that step could have failed and caused downstream problems. Can you verify whether you are still having issues?

Thanks

Larry

PSath2
Beginner

We can compile OpenCL kernels fine on the fpga_compile nodes thanks to that update. We can also compile and run host code on several of the "fpga" nodes. (So far 86 and 88 are known good; I haven't yet run into a situation where I couldn't get onto one of those two to look at the others.)

 

Upon further testing, the issue noted above appears to be specific to s001-n084 (when testing it this morning, we were also unable to find the OpenCL headers automatically there, as we can on nodes 86 and 88). Perhaps n084 could be taken offline until it is fixed, so that we can continue to queue for any node with the "fpga" property without landing on 84, rather than having to manually pick the known-good nodes?

Lawrence_L_Intel
Employee

Hi Paul

Can you try sourcing this setup script: /data/intel_fpga/devcloudLoginToolSetup.sh

 

Then run tools_setup

and select the devstack for Arria 10 or Stratix 10.

 

We haven't been testing the oneAPI version. Note that kernel downloads only work on n137, n138 and n139, or on n189.

 

Let me know if this works.

 

Thanks

Larry

 

PSath2
Beginner

Hi Larry,

 

Apologies for the delayed response; we had another critical deadline that took my full attention last week.

 

As far as the original forum topic goes, we were able to compile successfully on the main queue's fpga_compile nodes and then run those .aocx implementations on nodes 88 and 86 without any issues (using standard OpenCL C and C++ host code).

 

As for the beta queue, I've put our .cl implementations in the compile queue for s001-n137 using option #5 (Arria 10 stack). We get the error below fairly early on, but the build seems to be proceeding nonetheless; I will keep you updated.

....

aoc: Selected default target board pac_a10

Inconsistency detected by ld.so: dl-close.c: 811: _dl_close: Assertion `map->l_init_called' failed!

aoc: Running OpenCL parser....

...

As an aside, aocl diagnose shows that the 0th and 2nd PAC A10 devices have status "Passed", while the 1st (acl1) shows "Uninitialized". If we can get kernels compiled and running on at least two cards in the beta queue, that would give us an interesting data point: one of our test codes currently has to swap two cl_programs back and forth because of BRAM constraints on the single-device nodes in the default queue, and on two devices it would only need to move the data back and forth rather than reconfiguring the Arrias :)
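To see which PAC cards on a multi-device node are usable before planning a two-device run, the per-device status column of aocl diagnose can be parsed. This is a minimal sketch, assuming the output repeats a "Device Name: aclN" entry followed by the "Physical Dev Name / Status / Information" table for every device, as in the single-device diagnose output earlier in this thread. The function names are hypothetical, and the layout may differ between BSP versions.

```python
import re
from typing import Dict, List

# "Device Name:" may be followed by the name on the same line or on the next one.
DEVICE_RE = re.compile(r"Device Name:\s*(acl\d+)")
# First row under the "Physical Dev Name / Status / Information" header.
STATUS_RE = re.compile(r"Physical Dev Name\s+Status\s+Information\s+(\S+)\s+(\S+)")


def device_statuses(diagnose_output: str) -> Dict[str, str]:
    """Map each logical device (acl0, acl1, ...) to the status aocl diagnose printed for it."""
    statuses: Dict[str, str] = {}
    matches = list(DEVICE_RE.finditer(diagnose_output))
    for i, match in enumerate(matches):
        # A device's section runs until the next "Device Name:" entry (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(diagnose_output)
        row = STATUS_RE.search(diagnose_output, match.end(), end)
        statuses[match.group(1)] = row.group(2) if row else "Unknown"
    return statuses


def usable_devices(diagnose_output: str) -> List[str]:
    """Return the devices whose status is "Passed", in the order aocl diagnose listed them."""
    return [name for name, status in device_statuses(diagnose_output).items() if status == "Passed"]
```

If the beta-queue node's output follows this assumed layout, the statuses described above (acl0 and acl2 "Passed", acl1 "Uninitialized") would give ['acl0', 'acl2'] as the two candidate cards.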

 

PSath2
Beginner

Update:

 

All 3 kernel files grind for some time before eventually failing in a10_partial_reconfig/flow.tcl with the error below. (These builds were all performed on s001-n137.)

Checking if memory usage is larger than 100%
remove outer_zero_and_others.1.bc
remove area_src.json
remove loops.json
remove summary.json
remove lmv.json
remove mav.json
remove info.json
remove warnings.json
remove area.html
remove area.json
remove outer_zero_and_others.bc
/glob/development-tools/versions/fpgasupportstack/a10/1.2/inteldevstack/intelFPGA_pro/hld/linux64/bin/system_integrator --bsp-flow green_top /glob/development-tools/versions/fpgasupportstack/a10/1.2/inteldevstack/a10_gx_pac_ias_1_2_pv/opencl/opencl_bsp/hardware/pac_a10/board_spec.xml "outer_zero_and_others.bc.xml" none kernel_system.tcl
#aoc: First stage compilation completed successfully.
Compiling for FPGA. This process may take a long time, please be patient.
qsys-script --quartus-project=dcp --script=kernel_system.tcl -Xmx512M -XX:+UseSerialGC
echo bash build/run.sh
Error (213009): File name "output_files/afu_fit.green_region.pmsf" does not exist or can't be read
Error: Quartus Prime Convert_programming_file was unsuccessful. 1 error, 0 warnings
Error (23031): Evaluation of Tcl script a10_partial_reconfig/flow.tcl unsuccessful
Error: Quartus Prime Shell was unsuccessful. 7 errors, 3092 warnings
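When a build log ends in a cascade of errors like this, it helps to pull out the numbered Quartus messages in order. Here the first one, 213009 about the missing .pmsf file, is the underlying failure, and 23031 is the partial-reconfiguration flow script reporting that it failed. The sketch below is a minimal, hypothetical Python helper, assuming messages follow the "Error (NNNNN): text" form seen above, one per line. Its hint table holds only the diagnosis given in the reply below (an expired Ethernet IP license with Acceleration Stack 1.2).

```python
import re
from typing import List, Optional, Tuple

# Numbered Quartus messages look like: Error (213009): File name "..." does not exist ...
QUARTUS_ERROR_RE = re.compile(r"Error \((\d+)\): (.+)")

# Only the cause diagnosed in this thread; extend as other message IDs are understood.
KNOWN_CAUSES = {
    "213009": "missing .pmsf; in this thread traced to an expired Ethernet IP "
              "license in Intel Acceleration Stack 1.2",
}


def quartus_errors(log_text: str) -> List[Tuple[str, str]]:
    """Return (message ID, message text) for each numbered Quartus error, in log order."""
    errors = []
    for line in log_text.splitlines():
        match = QUARTUS_ERROR_RE.search(line)
        if match:
            errors.append((match.group(1), match.group(2).strip()))
    return errors


def first_error_summary(log_text: str) -> Optional[str]:
    """Describe the earliest numbered error, plus a hint if its ID is in KNOWN_CAUSES."""
    errors = quartus_errors(log_text)
    if not errors:
        return None
    msg_id, text = errors[0]
    summary = f"Error ({msg_id}): {text}"
    if msg_id in KNOWN_CAUSES:
        summary += f" [hint: {KNOWN_CAUSES[msg_id]}]"
    return summary
```

Un-numbered lines such as "Error: Quartus Prime Shell was unsuccessful" are deliberately skipped, since they only summarize the numbered errors that precede them.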

 

MEIYAN_L_Intel
Employee

Hi,

From the error:

Error (213009): File name "output_files/afu_fit.green_region.pmsf" does not exist or can't be read.

It is the same error as https://forums.intel.com/s/question/0D50P00004ZMIykSAH/error-213009-file-name-outputfilesafufitgreen... which is a license issue with Intel Acceleration Stack v1.2.

This error happens because the license for the Ethernet IP has expired.

I will check with the developers about this issue on DevCloud.

Thanks

Lawrence_L_Intel
Employee

Patch 1.03DCP is in place.

See the GitHub site for details: https://github.com/intel/FPGA-Devcloud

source /data/intel_fpga/devcloudLoginToolSetup.sh

 

The normal flow using tools_setup should now compile properly for you; the /opt path is not needed. Please confirm.

Thanks

Larry

agond2
Beginner

Hi Larry,

Confirming that I was able to compile and run OpenCL kernels on nodes s001-n137, n138 and n139. Thanks a lot for the patch.

On node s001-n189, which has a Stratix 10 FPGA, I ran the tools_setup command and selected option 6 (Stratix 10 development stack), but I get the error "Error: Compiler Error, not able to generate hardware".

 

-Atharva

Quartus_sh_compile.log on node s001-n189:

This is the run.sh script. ERROR: packager tool failed to run. Check installation. Aborting compilation!

 

 

 

agond2
Beginner

Hi Larry,

 

I was able to compile and run the OpenCL vector add example on the Stratix 10 node (s001-n189) after this update:

https://github.com/intel/FPGA-Devcloud/commit/8df1be2e0854cb06bb8b6fdf4cd1226d7cdc690c

 

Confirming that I was able to compile and run OpenCL kernels on both the Stratix 10 node (n189) and the Arria 10 nodes (n137-n139).

Thanks for adding the patches.

-Atharva

MEIYAN_L_Intel
Employee

Hi,

Thank you for your update.

Thanks