PAC installed in Artesyn MC1600 chassis with Intel(R) Xeon(R) CPU D-1567 @ 2.10GHz running CentOS 7.5.
fpgainfo fme:
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** FME ******//
Object Id : 0xEF00000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
Prior to running the inference, this bitstream was programmed:
aocl program acl0 /opt/intel/openvino/bitstreams/a10_dcp_bitstreams/2019R1_RC_FP11_ResNet_SqueezeNet_VGG.aocx
The classification_sample and benchmark apps run without issue with the target device set to CPU. Both applications hang when attempting to run on the FPGA (with -d HETERO:FPGA,CPU). Inference on the FPGA usually completes successfully with a single iteration (-ni 1) but consistently hangs with a higher number of iterations.
# ./classification_sample -d HETERO:FPGA,CPU -ni 10 -i /opt/intel/openvino/deployment_tools/demo/car.png -m /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
[ INFO ] InferenceEngine:
API version ............ 1.6
Build .................. custom_releases/2019/R1.1_28dfbfdd28954c4dfd2f94403dd8dfc1f411038b
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ] /opt/intel/openvino/deployment_tools/demo/car.png
[ INFO ] Loading plugin
API version ............ 1.6
Build .................. heteroPlugin
Description ....... heteroPlugin
[ INFO ] Loading network files:
/root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
/root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (10 iterations)
# ./benchmark_app -d HETERO:FPGA,CPU -i /opt/intel/openvino/deployment_tools/demo/car.png -m /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
[ INFO ] InferenceEngine:
API version ............ 1.6
Build .................. custom_releases/2019/R1.1_28dfbfdd28954c4dfd2f94403dd8dfc1f411038b
[Step 1/8] Parsing and validation of input args
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ] /opt/intel/openvino/deployment_tools/demo/car.png
Progress: [....................] 100.00% done
[Step 2/8] Loading plugin
[ INFO ]
API version ............ 1.6
Build .................. heteroPlugin
Description ....... heteroPlugin
Progress: [....................] 100.00% done
[Step 3/8] Read IR network
[ INFO ] Loading network files
[ INFO ] Network batch size: 1, precision: FP32
Progress: [....................] 100.00% done
[Step 4/8] Configure input & output of the model
[ INFO ] Preparing output blobs
Progress: [....................] 100.00% done
[Step 5/8] Loading model to the plugin
Progress: [....................] 100.00% done
[Step 6/8] Create infer requests and fill input blobs with images
[ INFO ] Infer Request 0 created
[ INFO ] Network Input dimensions (NCHW): 1 3 227 227
[ INFO ] Prepare image /opt/intel/openvino/deployment_tools/demo/car.png
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Infer Request 1 created
[ INFO ] Network Input dimensions (NCHW): 1 3 227 227
[ INFO ] Prepare image /opt/intel/openvino/deployment_tools/demo/car.png
[ WARNING ] Image is resized from (787, 259) to (227, 227)
Progress: [....................] 100.00% done
[Step 7/8] Start inference asynchronously (120000.00 ms duration, 2 inference requests in parallel)
Progress: [ ] 0.00% done
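To reproduce the hang without watching the terminal, a wrapper along these lines might help. It is a minimal sketch, assuming GNU coreutils `timeout` is installed; the function name `run_with_limit` and any time limit you pass are illustrative choices, not something from this thread.

```shell
# Run a command under a wall-clock limit and classify the outcome.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
# Prints PASS, HANG, or FAIL(<exit code>) on stdout.
# The command's own stdout is redirected to stderr so that the
# classification is the only thing a caller captures.
run_with_limit() {
    rwl_limit=$1
    shift
    if timeout "$rwl_limit" "$@" >&2; then
        echo "PASS"
    else
        rwl_rc=$?
        # timeout(1) exits with status 124 when the limit was reached.
        if [ "$rwl_rc" -eq 124 ]; then
            echo "HANG"
        else
            echo "FAIL($rwl_rc)"
        fi
    fi
}
```

For example, `run_with_limit 60 ./classification_sample -d HETERO:FPGA,CPU -ni 10 -i /opt/intel/openvino/deployment_tools/demo/car.png -m /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml` would print HANG if the sample is still running after 60 seconds (the 60-second limit is an assumption; pick one comfortably above the normal run time).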
Hi @mkont1
Could you elaborate on what "hangs" means here? Can you recover with Ctrl+C, or do you need to reboot?
As a sanity check:
Does a cold reset (power cycle) of the server resolve the issue?
After every reboot or in every new terminal:
Make sure that you have initialized the card.
Make sure that you have set the hugepages. Allocate twenty 2 MB hugepages per card.
Did the PAC pass fpgabist? You may refer to the link below (keyword "Running FPGA Diagnostics"):
https://www.intel.com/content/www/us/en/programmable/documentation/iyu1522005567196.html
Did the PAC pass aocl diagnose acl0? You may refer to: https://www.intel.com/content/www/us/en/programmable/documentation/fvf1521490619217.html#zru1523293789016
Could you run the command below? I want to check that you have the correct OPAE version.
rpm -qa | grep opae
Does this fail only with 2019R1_RC_FP11_ResNet_SqueezeNet_VGG, or does it fail with other AOCX files as well?
Could you try a different aocx with a lower FP precision?
In summary, please run the tests suggested above first:
reboot --> initialize --> set hugepages --> fpgabist --> aocl diagnose acl0 --> try another aocx --> try a lower FP precision.
If the failure persists:
Please provide the OS and kernel versions and all the results you see from the above tests.
cat /etc/*elease
uname -r
Thanks
Can recover with Ctrl+C.
Issue persists after power cycle.
Hugepages set with:
sudo sh -c "echo 20 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"
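Since the setting above is written to sysfs at runtime and has to be redone after a reboot, it may be worth confirming the value before each test run. The following is a small sketch; the function name `check_hugepages` and its optional second argument (an alternate file to read, used only for testing) are my own additions.

```shell
# Report whether at least NEEDED 2 MB hugepages are configured.
# Usage: check_hugepages NEEDED [FILE]
# FILE defaults to the sysfs entry written by the command above.
check_hugepages() {
    chp_needed=$1
    chp_file=${2:-/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages}
    if ! chp_have=$(cat "$chp_file" 2>/dev/null); then
        echo "UNKNOWN: cannot read $chp_file"
        return 2
    fi
    if [ "$chp_have" -ge "$chp_needed" ]; then
        echo "OK: $chp_have hugepages configured (need $chp_needed)"
    else
        echo "LOW: $chp_have hugepages configured (need $chp_needed)"
        return 1
    fi
}
```

Calling `check_hugepages 20` matches the "20 per card" suggestion for a single card; with more than one card the number would need scaling accordingly.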
Output of fpgabist:
# sudo fpgabist $OPAE_PLATFORM_ROOT/hw/samples/nlb_mode_3/bin/nlb_mode_3.gbs
==========================================================
Beginning FPGA Built-In Self-Test
==========================================================
Device: bus = 6, device = , func =
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: External reset
Power-on-reset
//****** FME ******//
Object Id : 0xF000000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** PORT ******//
Object Id : 0xEF00000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
Accelerator Id : 18b79ffa-2ee5-4aa0-96ef-4230dafacb5f
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** TEMP ******//
Object Id : 0xF000000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
(11) FPGA Core TEMP : 58.00 °C
(12) Board TEMP : 47.00 °C
(14) QSFP TEMP : No reading (reading state unavailable)
(15) Core Supply Temp : 65.28 °C
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** POWER ******//
Object Id : 0xF000000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
( 0) Total Input Power : 28.50 Watts
( 1) PCIe 12V Current : 2.47 Amps
( 2) PCIe 12V Voltage : 11.20 Volts
( 3) 1.2V Voltage : 1.22 Volts
( 4) 1.2V Current : 2.66 Amps
( 5) 1.8V Voltage : 1.83 Volts
( 6) 1.8V Current : 2.73 Amps
( 7) 3.3V Mgmt Voltage : 3.34 Volts
( 8) 3.3V Current : 0.54 Amps
( 9) FPGA Core Voltage : 0.91 Volts
(10) FPGA Core Current : 13.11 Amps
(13) QSFP P3V3 : No reading (reading state unavailable)
(16) Core Supply Temp Input : 0.50 Volts
(17) VCCR Voltage : 1.04 Volts
(18) VCCT Voltage : 1.04 Volts
(19) VCCR Current : 1.12 Amps
(20) VCCT Current : 0.12 Amps
(21) VPP Voltage : 2.53 Volts
(22) VTT Voltage : 0.59 Volts
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** PORT ERRORS ******//
Object Id : 0xEF00000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
Accelerator Id : 18b79ffa-2ee5-4aa0-96ef-4230dafacb5f
First Error : 0x0
First Malformed Req : 0xFFFFFFFFFFFFFFFF
Errors : 0x0
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** FME ERRORS ******//
Object Id : 0xF000000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x7FFF00030201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
First Error : 0x0
Next Error : 0x0
Errors : 0x0
PCIe1 Errors : 0x0
Nonfatal Errors : 0x0
Inject Error : 0x0
Catfatal Errors : 0x0
PCIe0 Errors : 0x0
Running mode: nlb_3
Attempting Partial Reconfiguration:
Reading bitstream
Looking for slot
Found slot
Programming bitstream
Writing bitstream
Done
Running fpgadiag read test...
Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss Eviction 'Clocks(@200 MHz)' Rd_Bandwidth Wr_Bandwidth
1024 544035292 0 0 0 0 0 0 1000011426 6.964 GB/s 0.000 GB/s
VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count
0 0 0 0 0 0
Running fpgadiag write test...
Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss Eviction 'Clocks(@200 MHz)' Rd_Bandwidth Wr_Bandwidth
1024 0 762732 0 0 0 0 0 1000018957 0.000 GB/s 0.010 GB/s
VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count
0 0 0 0 0 0
Running fpgadiag trput test...
Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss Eviction 'Clocks(@200 MHz)' Rd_Bandwidth Wr_Bandwidth
1024 488225340 489909832 0 0 0 0 0 1000023141 6.249 GB/s 6.271 GB/s
VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count
0 0 0 0 0 0
Finished Executing NLB (FPGA DIAG)Tests
Built-in Self-Test Completed.
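As a cross-check on the fpgadiag tables above, the bandwidth columns can be recomputed from the counters. This is only a sketch of the arithmetic, under the assumptions that each count is one 64-byte cache line, that the clock runs at the 200 MHz shown in the column header, and that 1 GB means 10^9 bytes; the helper name `nlb_bandwidth_gbs` is made up for illustration.

```shell
# Recompute an NLB bandwidth figure from a request counter and the clock count.
# Usage: nlb_bandwidth_gbs COUNT CLOCKS
# Assumes 64-byte cache lines, a 200 MHz clock, and 1 GB = 10^9 bytes.
nlb_bandwidth_gbs() {
    awk -v count="$1" -v clocks="$2" 'BEGIN {
        seconds = clocks / 200000000      # elapsed time at 200 MHz
        bytes   = count * 64              # one cache line per request
        printf "%.3f\n", bytes / seconds / 1000000000
    }'
}
```

With the numbers from the log, `nlb_bandwidth_gbs 544035292 1000011426` gives 6.964 (the read test) and `nlb_bandwidth_gbs 762732 1000018957` gives 0.010 (the write test), so the reported figures follow from the counters under these assumptions. The write-only test issued far fewer writes (762732) than the trput test did (489909832) over roughly the same number of clocks.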
aocl diagnose:
# aocl diagnose
--------------------------------------------------------------------
Device Name:
acl0
BSP Install Location:
/root/intelrtestack/a10_gx_pac_ias_1_2_pv/opencl/opencl_bsp
Vendor: Intel Corp
Physical Dev Name Status Information
pac_a10_ef00000 Passed PAC Arria 10 Platform (pac_a10_ef00000)
PCIe 06:00.0
FPGA temperature = 61 degrees C.
DIAGNOSTIC_PASSED
--------------------------------------------------------------------
Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices
aocl diagnose acl0 gets stuck (recoverable with Ctrl+C):
# aocl diagnose acl0
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_ef00000)
Using Device from vendor: Intel Corp
clGetDeviceInfo CL_DEVICE_GLOBAL_MEM_SIZE = 8589934592
clGetDeviceInfo CL_DEVICE_MAX_MEM_ALLOC_SIZE = 8589934592
Allocated 8589934592 bytes
Actual maximum buffer size = 8589934592 bytes
Writing 8192 MB to global memory ...
Allocated 1073741824 Bytes host buffer for large transfers
Write speed: 6917.17 MB/s [6912.93 -> 6919.78]
Reading and verifying 8192 MB from global memory ...
Read speed: 6648.18 MB/s [6541.27 -> 6688.25]
Successfully wrote and readback 8192 MB buffer
Poll(interrupt) timeout
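When repeating this test after each change (another aocx, a power cycle, and so on), it can be handy to classify a captured log automatically. The sketch below only looks for the two marker strings that appear in the outputs in this thread; the function name `classify_diagnose` is hypothetical, and other failure messages are not covered.

```shell
# Classify captured "aocl diagnose" output read from stdin.
# Prints TIMEOUT if the interrupt-poll timeout message is present,
# PASSED if DIAGNOSTIC_PASSED is present, otherwise INCONCLUSIVE.
classify_diagnose() {
    awk '
        /Poll\(interrupt\) timeout/ { timed_out = 1 }
        /DIAGNOSTIC_PASSED/         { passed = 1 }
        END {
            if (timed_out)   print "TIMEOUT"
            else if (passed) print "PASSED"
            else             print "INCONCLUSIVE"
        }'
}
```

Because the command itself gets stuck, it would need a time limit when piped in, for example `timeout 300 aocl diagnose acl0 2>&1 | classify_diagnose` (the 300-second limit is an assumption).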
rpm -qa | grep opae:
# rpm -qa | grep opae
opae-libs-1.1.2-1.x86_64
opae-tools-1.1.2-1.x86_64
opae-intel-fpga-driver-1.1.2-1.x86_64
opae-tools-extra-1.1.2-1.x86_64
opae-devel-1.1.2-1.x86_64
opae-ase-1.1.2-1.x86_64
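The reply above does not say which OPAE version is expected, so the only thing that can be checked mechanically here is that all installed OPAE packages report the same version-release. A rough sketch follows; the function name `opae_versions` is made up, and it assumes package names of the form shown in the listing above.

```shell
# Read "rpm -qa | grep opae" output on stdin and print each distinct
# version-release string once (for example "1.1.2-1").
opae_versions() {
    sed -n 's/^opae-.*-\([0-9][0-9.]*-[0-9][0-9]*\)\.[^.]*$/\1/p' | sort -u
}
```

Running `rpm -qa | grep opae | opae_versions` prints one line when every package is at the same version-release; more than one line would point to a mixed install.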
OS and kernel versions:
# cat /etc/*elease
Board : PCIECARD
Release : Distro OS
Version : 2.0.2
Build-Date : 24 January 2019
Kernel-Arch : x86_64
Linux-Distribution : CentOS.7.5.1804
CentOS Linux release 7.5.1804 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Board : PCIECARD
Release : PCIe Manager
Version : 2.0.2
Build-Date : 11 December 2018
Kernel-Arch : x86_64
Kernel-Version : 3.10.0-862.11.6.1.el7
Linux-Distribution : CentOS.7.5.1804
CentOS Linux release 7.5.1804 (Core)
CentOS Linux release 7.5.1804 (Core)
# uname -r
3.10.0-862.11.6.1.el7.x86_64
The issue persists with 2019R1_RC_FP16_ResNet_SqueezeNet_VGG.aocx. I don't have an aocx with a lower FP precision than FP11.
Hi @mkont1
Could you perform a quick test?
The demo cannot run with the default batch size on the FPGA. Change the batch size to more than 1 (e.g., -b 10).
Tried this with the benchmark_app. It didn't help.
Hi @JwChin
It seems better with "-b 10 -niter 100" but still gets stuck. Most of the time the run gets stuck below 10% done. On one run it got up to 78% done and then got stuck.
[Step 7/8] Start inference asynchronously (100 async inference executions, 2 inference requests in parallel)
Progress: [. ] 7.92% done
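Since the stall point varies from run to run, recording it for each attempt may show whether there is a pattern. Below is a minimal sketch that pulls the last reported percentage out of a captured benchmark_app log; it assumes progress lines of the form shown above, that GNU grep (`-o`) is available, and that progress updates may be separated by carriage returns. The name `last_progress` is hypothetical.

```shell
# Print the last "NN.NN% done" value found in a log read from stdin.
# Carriage returns are turned into newlines first, because progress bars
# are often redrawn in place on a single terminal line.
last_progress() {
    tr '\r' '\n' | grep -o '[0-9][0-9.]*% done' | tail -n 1 | sed 's/% done$//'
}
```

For instance, after saving a run's output to a file (say run.log, a hypothetical name), `last_progress < run.log` prints the percentage at which that run stopped, or nothing if no progress line was found.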