Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, FPGA AI Suite, Software Stack, and Reference Designs

benchmark and classification_sample apps hang on starting inference when running with -d HETERO:FPGA,CPU.

mkont1
Beginner

The PAC is installed in an Artesyn MC1600 chassis with an Intel(R) Xeon(R) CPU D-1567 @ 2.10GHz running CentOS 7.5.

fpgainfo fme:

Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** FME ******//
Object Id : 0xEF00000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6

Before running inference, this bitstream was programmed:

aocl program acl0 /opt/intel/openvino/bitstreams/a10_dcp_bitstreams/2019R1_RC_FP11_ResNet_SqueezeNet_VGG.aocx

classification_sample and benchmark_app run without issue with the target device set to CPU. Both applications hang when attempting to run on the FPGA (with -d HETERO:FPGA,CPU). Inference on the FPGA usually completes successfully with a single iteration (-ni 1) but consistently hangs with a higher number of iterations.

 

# ./classification_sample -d HETERO:FPGA,CPU -ni 10 -i /opt/intel/openvino/deployment_tools/demo/car.png -m /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
[ INFO ] InferenceEngine:
    API version ............ 1.6
    Build .................. custom_releases/2019/R1.1_28dfbfdd28954c4dfd2f94403dd8dfc1f411038b
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ]     /opt/intel/openvino/deployment_tools/demo/car.png
[ INFO ] Loading plugin

    API version ............ 1.6
    Build .................. heteroPlugin
    Description ....... heteroPlugin
[ INFO ] Loading network files:
    /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
    /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (10 iterations)

# ./benchmark_app -d HETERO:FPGA,CPU -i /opt/intel/openvino/deployment_tools/demo/car.png -m /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
[ INFO ] InferenceEngine:
    API version ............ 1.6
    Build .................. custom_releases/2019/R1.1_28dfbfdd28954c4dfd2f94403dd8dfc1f411038b

[Step 1/8] Parsing and validation of input args
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ]     /opt/intel/openvino/deployment_tools/demo/car.png
Progress: [....................] 100.00% done

[Step 2/8] Loading plugin
[ INFO ] API version ............ 1.6
    Build .................. heteroPlugin
    Description ....... heteroPlugin
Progress: [....................] 100.00% done

[Step 3/8] Read IR network
[ INFO ] Loading network files
[ INFO ] Network batch size: 1, precision: FP32
Progress: [....................] 100.00% done

[Step 4/8] Configure input & output of the model
[ INFO ] Preparing output blobs
Progress: [....................] 100.00% done

[Step 5/8] Loading model to the plugin
Progress: [....................] 100.00% done

[Step 6/8] Create infer requests and fill input blobs with images
[ INFO ] Infer Request 0 created
[ INFO ] Network Input dimensions (NCHW): 1 3 227 227
[ INFO ] Prepare image /opt/intel/openvino/deployment_tools/demo/car.png
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Infer Request 1 created
[ INFO ] Network Input dimensions (NCHW): 1 3 227 227
[ INFO ] Prepare image /opt/intel/openvino/deployment_tools/demo/car.png
[ WARNING ] Image is resized from (787, 259) to (227, 227)
Progress: [....................] 100.00% done

[Step 7/8] Start inference asynchronously (120000.00 ms duration, 2 inference requests in parallel)
Progress: [ ] 0.00% done

JonWay_C_Intel
Employee

Hi @mkont1,

Could you elaborate on what "hangs" means here? Can you recover with Ctrl+C, or do you need to reboot?

As a sanity check:

Does a cold reset (power cycle) of the server resolve the issue?

After every reboot or in every new terminal:

Make sure that you have initialized the card.

Make sure that you have set up hugepages: allocate twenty 2 MB hugepages per card.
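A minimal sketch of the hugepage step, assuming the card count is known. It writes to the same sysfs file used in the command posted later in this thread; the function name set_hugepages and its optional path argument are hypothetical, added only so the logic can be tried against a scratch file. Writing the real sysfs file requires root.

```shell
#!/usr/bin/env bash
# Sketch: reserve 20 x 2 MB hugepages per PAC card.
# Assumption (hypothetical helper): the caller passes the number of cards;
# the sysfs path can be overridden so the logic can be tried on a scratch file.

PAGES_PER_CARD=20
HUGEPAGES_SYSFS_DEFAULT=/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

# set_hugepages NUM_CARDS [SYSFS_FILE]
# Writes NUM_CARDS * PAGES_PER_CARD to SYSFS_FILE, then prints the value read
# back, since the kernel may grant fewer pages than were requested.
set_hugepages() {
    local cards="$1"
    local target="${2:-$HUGEPAGES_SYSFS_DEFAULT}"
    local wanted=$(( cards * PAGES_PER_CARD ))
    echo "$wanted" > "$target" || return 1
    cat "$target"
}
```

On a one-card system, running set_hugepages 1 as root requests 20 pages. The value is read back rather than assumed because nr_hugepages can end up lower than requested when memory is fragmented, and the setting does not survive a reboot.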

 

Did the PAC pass fpgabist? You can refer to the link below (search for "Running FPGA Diagnostics"):

https://www.intel.com/content/www/us/en/programmable/documentation/iyu1522005567196.html

Did the PAC pass aocl diagnose acl0? You can refer to: https://www.intel.com/content/www/us/en/programmable/documentation/fvf1521490619217.html#zru1523293789016

Could you run the command below? I want to check that you have the correct OPAE version.

rpm -qa | grep opae
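To read the rpm output at a glance, a small filter can reduce the package list to its distinct version numbers: more than one line means the OPAE packages are mixed. This is a sketch assuming standard NAME-VERSION-RELEASE.ARCH package names; the function name opae_versions is hypothetical, and it does not know which OPAE version a given OpenVINO release expects.

```shell
#!/usr/bin/env bash
# Sketch: print each distinct OPAE package version, one per line.
# Reads `rpm -qa` style package names on stdin and assumes the usual
# NAME-VERSION-RELEASE.ARCH form (e.g. opae-libs-1.1.2-1.x86_64).
opae_versions() {
    grep '^opae-' \
      | sed -E 's/^opae-[a-z-]+-([0-9][0-9.]*)-[^-]+$/\1/' \
      | sort -u
}
```

For example, rpm -qa | opae_versions. A package whose name contains digits would not match the pattern and would be printed unchanged, which at least makes it visible.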

 

Does this fail only with 2019R1_RC_FP11_ResNet_SqueezeNet_VGG, or does it fail with other .aocx files as well?

Could you try switching to another .aocx with a lower FP?

In summary, please run the tests I suggested above first, in this order:

reboot --> initialize --> set hugepages --> fpgabist --> aocl diagnose acl0 --> try another .aocx --> try a lower FP
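The checks in this sequence can be scripted so that a step which hangs (as aocl diagnose acl0 does later in this thread) is reported instead of blocking the terminal. A sketch assuming GNU coreutils timeout, which exits with 124 when the time limit is reached, or 137 if it has to fall back to SIGKILL via -k; run_step and the time limits are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run one diagnostic step under a time limit and classify the result.
# Assumption: GNU coreutils `timeout` (exit 124 on timeout, 137 after the
# -k fallback sends SIGKILL). The helper name is hypothetical.

# run_step NAME SECONDS CMD [ARGS...]
# Prints "NAME: PASS", "NAME: FAIL (exit N)" or "NAME: HANG (> SECONDS s)".
# Returns 0 only on PASS, so steps can be chained with &&.
run_step() {
    local name="$1" limit="$2"
    shift 2
    # Send SIGTERM after $limit seconds, then SIGKILL 10 seconds later.
    timeout -k 10 "$limit" "$@"
    local rc=$?
    case "$rc" in
        0)       echo "$name: PASS" ;;
        124|137) echo "$name: HANG (> ${limit} s)" ;;
        *)       echo "$name: FAIL (exit $rc)" ;;
    esac
    [ "$rc" -eq 0 ]
}
```

For example: run_step fpgabist 600 fpgabist "$OPAE_PLATFORM_ROOT/hw/samples/nlb_mode_3/bin/nlb_mode_3.gbs" && run_step diagnose 300 aocl diagnose acl0. Chaining with && stops at the first step that does not pass, which matches the "in this order" advice.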

 

If the failure persists:

Please provide your OS and kernel versions and all the results from the tests above:

 

cat /etc/*elease

uname -r
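To answer a request like this in one pass, the outputs can be collected into a single file for the reply. A sketch: collect_report is hypothetical, the command list is taken from this thread, and aocl diagnose acl0 is wrapped in timeout (an assumption about a sensible limit) so a hang does not stall the whole report.

```shell
#!/usr/bin/env bash
# Sketch: collect the requested diagnostics into one text file.
# Commands that fail or are not installed are noted instead of aborting.

# collect_report OUTFILE
collect_report() {
    local out="$1"
    local cmds=(
        "cat /etc/*elease"
        "uname -r"
        "rpm -qa | grep opae"
        "fpgainfo fme"
        "aocl diagnose"
        "timeout 300 aocl diagnose acl0"
    )
    : > "$out"   # start with an empty file
    local c
    for c in "${cmds[@]}"; do
        {
            echo "== $c =="
            bash -c "$c" 2>&1 || echo "(command failed, timed out, or not found)"
            echo
        } >> "$out"
    done
}
```

For example, collect_report pac_report.txt, then attach pac_report.txt to the reply.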

 

Thanks

mkont1
Beginner

I can recover with Ctrl+C.

The issue persists after a power cycle.

Hugepages are set with:

sudo sh -c "echo 20 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"
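To confirm that the echo above actually reserved the pages, the kernel's HugePages_Total and HugePages_Free counters in /proc/meminfo can be checked. A sketch; the function names are hypothetical, and the optional file argument exists only so they can be tried on a saved copy of meminfo.

```shell
#!/usr/bin/env bash
# Sketch: read the hugepage counters that the kernel reports in /proc/meminfo.

# hugepages_status [MEMINFO_FILE] -> prints "total=N free=M"
hugepages_status() {
    local f="${1:-/proc/meminfo}"
    awk '/^HugePages_Total:/ {t=$2} /^HugePages_Free:/ {fr=$2}
         END {printf "total=%s free=%s\n", t, fr}' "$f"
}

# check_hugepages MIN [MEMINFO_FILE]: succeeds if at least MIN pages are reserved.
check_hugepages() {
    local min="$1" total
    total=$(awk '/^HugePages_Total:/ {print $2}' "${2:-/proc/meminfo}")
    [ "${total:-0}" -ge "$min" ]
}
```

For example, check_hugepages 20 || echo "hugepages not set". Since the sysfs write lasts only until the next reboot, this is worth re-running after each boot.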

 

Output of fpgabist:

# sudo fpgabist $OPAE_PLATFORM_ROOT/hw/samples/nlb_mode_3/bin/nlb_mode_3.gbs
==========================================================
Beginning FPGA Built-In Self-Test
==========================================================
Device: bus = 6, device = , func =
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: External reset Power-on-reset
//****** FME ******//
Object Id : 0xF000000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** PORT ******//
Object Id : 0xEF00000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
Accelerator Id : 18b79ffa-2ee5-4aa0-96ef-4230dafacb5f
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** TEMP ******//
Object Id : 0xF000000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
(11) FPGA Core TEMP : 58.00 °C
(12) Board TEMP : 47.00 °C
(14) QSFP TEMP : No reading (reading state unavailable)
(15) Core Supply Temp : 65.28 °C
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** POWER ******//
Object Id : 0xF000000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
( 0) Total Input Power : 28.50 Watts
( 1) PCIe 12V Current : 2.47 Amps
( 2) PCIe 12V Voltage : 11.20 Volts
( 3) 1.2V Voltage : 1.22 Volts
( 4) 1.2V Current : 2.66 Amps
( 5) 1.8V Voltage : 1.83 Volts
( 6) 1.8V Current : 2.73 Amps
( 7) 3.3V Mgmt Voltage : 3.34 Volts
( 8) 3.3V Current : 0.54 Amps
( 9) FPGA Core Voltage : 0.91 Volts
(10) FPGA Core Current : 13.11 Amps
(13) QSFP P3V3 : No reading (reading state unavailable)
(16) Core Supply Temp Input : 0.50 Volts
(17) VCCR Voltage : 1.04 Volts
(18) VCCT Voltage : 1.04 Volts
(19) VCCR Current : 1.12 Amps
(20) VCCT Current : 0.12 Amps
(21) VPP Voltage : 2.53 Volts
(22) VTT Voltage : 0.59 Volts
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** PORT ERRORS ******//
Object Id : 0xEF00000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
Accelerator Id : 18b79ffa-2ee5-4aa0-96ef-4230dafacb5f
First Error : 0x0
First Malformed Req : 0xFFFFFFFFFFFFFFFF
Errors : 0x0
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** FME ERRORS ******//
Object Id : 0xF000000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x7FFF00030201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
First Error : 0x0
Next Error : 0x0
Errors : 0x0
PCIe1 Errors : 0x0
Nonfatal Errors : 0x0
Inject Error : 0x0
Catfatal Errors : 0x0
PCIe0 Errors : 0x0
Running mode: nlb_3
Attempting Partial Reconfiguration:
Reading bitstream
Looking for slot
Found slot
Programming bitstream
Writing bitstream
Done
Running fpgadiag read test...
Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss Eviction 'Clocks(@200 MHz)' Rd_Bandwidth Wr_Bandwidth
1024 544035292 0 0 0 0 0 0 1000011426 6.964 GB/s 0.000 GB/s
VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count
0 0 0 0 0 0
Running fpgadiag write test...
Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss Eviction 'Clocks(@200 MHz)' Rd_Bandwidth Wr_Bandwidth
1024 0 762732 0 0 0 0 0 1000018957 0.000 GB/s 0.010 GB/s
VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count
0 0 0 0 0 0
Running fpgadiag trput test...
Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss Eviction 'Clocks(@200 MHz)' Rd_Bandwidth Wr_Bandwidth
1024 488225340 489909832 0 0 0 0 0 1000023141 6.249 GB/s 6.271 GB/s
VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count
0 0 0 0 0 0
Finished Executing NLB (FPGA DIAG)Tests
Built-in Self-Test Completed.

aocl diagnose:

# aocl diagnose
--------------------------------------------------------------------
Device Name: acl0

BSP Install Location: /root/intelrtestack/a10_gx_pac_ias_1_2_pv/opencl/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status   Information

pac_a10_ef00000     Passed   PAC Arria 10 Platform (pac_a10_ef00000)
                             PCIe 06:00.0
                             FPGA temperature = 61 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------

Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices

aocl diagnose acl0 gets stuck (I can recover with Ctrl+C):

# aocl diagnose acl0
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_ef00000)
Using Device from vendor: Intel Corp
clGetDeviceInfo CL_DEVICE_GLOBAL_MEM_SIZE = 8589934592
clGetDeviceInfo CL_DEVICE_MAX_MEM_ALLOC_SIZE = 8589934592
Allocated 8589934592 bytes
Actual maximum buffer size = 8589934592 bytes
Writing 8192 MB to global memory ...
Allocated 1073741824 Bytes host buffer for large transfers
Write speed: 6917.17 MB/s [6912.93 -> 6919.78]
Reading and verifying 8192 MB from global memory ...
Read speed: 6648.18 MB/s [6541.27 -> 6688.25]
Successfully wrote and readback 8192 MB buffer

Poll(interrupt) timeout

rpm -qa | grep opae:

# rpm -qa | grep opae
opae-libs-1.1.2-1.x86_64
opae-tools-1.1.2-1.x86_64
opae-intel-fpga-driver-1.1.2-1.x86_64
opae-tools-extra-1.1.2-1.x86_64
opae-devel-1.1.2-1.x86_64
opae-ase-1.1.2-1.x86_64

OS and kernel versions:

# cat /etc/*elease
Board : PCIECARD
Release : Distro OS
Version : 2.0.2
Build-Date : 24 January 2019
Kernel-Arch : x86_64
Linux-Distribution : CentOS.7.5.1804
CentOS Linux release 7.5.1804 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Board : PCIECARD
Release : PCIe Manager
Version : 2.0.2
Build-Date : 11 December 2018
Kernel-Arch : x86_64
Kernel-Version : 3.10.0-862.11.6.1.el7
Linux-Distribution : CentOS.7.5.1804
CentOS Linux release 7.5.1804 (Core)
CentOS Linux release 7.5.1804 (Core)

# uname -r
3.10.0-862.11.6.1.el7.x86_64

The issue persists with 2019R1_RC_FP16_ResNet_SqueezeNet_VGG.aocx. I don't have an .aocx with a lower FP than FP11.

 

 

 

JonWay_C_Intel
Employee

Hi @mkont1,

Could you perform a quick test?

The demo cannot run with the default batch size when running on the FPGA. Please change the batch size to more than 1 (e.g., -b 10).
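Rather than trying one -b value at a time, a short loop can run benchmark_app over several batch sizes with a time limit and record which ones hang. A sketch assuming GNU coreutils timeout; the flags -d, -b, -niter, -i and -m are the ones used in this thread, while sweep_batches, the per-batch log files, and the fixed 100 iterations (borrowed from the follow-up suggestion) are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: run benchmark_app once per batch size and classify each run.
# Assumptions: GNU `timeout` (124 on timeout, 137 after the -k SIGKILL);
# hypothetical helper name; logs go to $TMPDIR (or /tmp) per batch size.

# sweep_batches APP MODEL INPUT SECONDS BATCH...
# Prints one "b=N PASS|FAIL|HANG" line per batch size.
sweep_batches() {
    local app="$1" model="$2" input="$3" limit="$4"
    shift 4
    local b rc log
    for b in "$@"; do
        log="${TMPDIR:-/tmp}/sweep_b${b}.log"
        timeout -k 10 "$limit" "$app" -d HETERO:FPGA,CPU -b "$b" -niter 100 \
            -i "$input" -m "$model" > "$log" 2>&1
        rc=$?
        case "$rc" in
            0)       echo "b=$b PASS" ;;
            124|137) echo "b=$b HANG" ;;
            *)       echo "b=$b FAIL" ;;
        esac
    done
}
```

For example: sweep_batches ./benchmark_app /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml /opt/intel/openvino/deployment_tools/demo/car.png 600 1 2 10. Each run's log is kept, so the last "Progress:" line shows how far a hung run got.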

mkont1
Beginner

I tried this with benchmark_app. It didn't help.

JonWay_C_Intel
Employee

Hi @mkont1,

Could you try -b 10 AND -niter 100?

mkont1
Beginner

Hi @JwChin,

It seems better with "-b 10 -niter 100", but it still gets stuck. Most of the time the run gets stuck below 10% done; on one run it reached 78% done and then got stuck.

[Step 7/8] Start inference asynchronously (100 async inference executions, 2 inference requests in parallel)

Progress: [.                  ] 7.92% done

JonWay_C_Intel
Employee

Hi @mkont1,

 

I have sent you a private message.
