
benchmark and classification_sample apps hang on starting inference when running with -d HETERO:FPGA,CPU.

mkont1
Beginner

PAC installed in Artesyn MC1600 chassis with Intel(R) Xeon(R) CPU D-1567 @ 2.10GHz running CentOS 7.5.

 

fpgainfo fme:

Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** FME ******//
Object Id : 0xEF00000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6

 

 

Prior to running the inference, this bitstream was programmed:

aocl program acl0 /opt/intel/openvino/bitstreams/a10_dcp_bitstreams/2019R1_RC_FP11_ResNet_SqueezeNet_VGG.aocx

 

 

The classification_sample and benchmark apps run without issue with the target device set to CPU. Both applications hang when attempting to run on the FPGA (with -d HETERO:FPGA,CPU). Inference on the FPGA usually completes successfully with a single iteration (-ni 1) but consistently hangs with a higher number of iterations.

 

# ./classification_sample -d HETERO:FPGA,CPU -ni 10 -i /opt/intel/openvino/deployment_tools/demo/car.png -m /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
[ INFO ] InferenceEngine:
API version ............ 1.6
Build .................. custom_releases/2019/R1.1_28dfbfdd28954c4dfd2f94403dd8dfc1f411038b
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ] /opt/intel/openvino/deployment_tools/demo/car.png
[ INFO ] Loading plugin
API version ............ 1.6
Build .................. heteroPlugin
Description ....... heteroPlugin
[ INFO ] Loading network files:
/root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
/root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (10 iterations)

# ./benchmark_app -d HETERO:FPGA,CPU -i /opt/intel/openvino/deployment_tools/demo/car.png -m /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
[ INFO ] InferenceEngine:
API version ............ 1.6
Build .................. custom_releases/2019/R1.1_28dfbfdd28954c4dfd2f94403dd8dfc1f411038b

[Step 1/8] Parsing and validation of input args
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ] /opt/intel/openvino/deployment_tools/demo/car.png
Progress: [....................] 100.00% done

[Step 2/8] Loading plugin
[ INFO ]
API version ............ 1.6
Build .................. heteroPlugin
Description ....... heteroPlugin
Progress: [....................] 100.00% done

[Step 3/8] Read IR network
[ INFO ] Loading network files
[ INFO ] Network batch size: 1, precision: FP32
Progress: [....................] 100.00% done

[Step 4/8] Configure input & output of the model
[ INFO ] Preparing output blobs
Progress: [....................] 100.00% done

[Step 5/8] Loading model to the plugin
Progress: [....................] 100.00% done

[Step 6/8] Create infer requests and fill input blobs with images
[ INFO ] Infer Request 0 created
[ INFO ] Network Input dimensions (NCHW): 1 3 227 227
[ INFO ] Prepare image /opt/intel/openvino/deployment_tools/demo/car.png
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Infer Request 1 created
[ INFO ] Network Input dimensions (NCHW): 1 3 227 227
[ INFO ] Prepare image /opt/intel/openvino/deployment_tools/demo/car.png
[ WARNING ] Image is resized from (787, 259) to (227, 227)
Progress: [....................] 100.00% done

[Step 7/8] Start inference asynchronously (120000.00 ms duration, 2 inference requests in parallel)
Progress: [ ] 0.00% done

 

JonWay_C_Intel
Employee

Hi @mkont1,

Could you elaborate on what "hangs" means here? Can you recover with Ctrl+C, or do you need to reboot?

 

As a sanity check, does a cold reset (power cycle) of the server resolve the issue?

 

After every reboot and in every new terminal:

Make sure that you have initialized the card.

Make sure that you have set the hugepages: allocate twenty 2 MB hugepages per card.
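The hugepage requirement can be checked before each run with something like the sketch below. It assumes the standard Linux sysfs counter for 2 MB hugepages; the function name `check_hugepages` and its optional file argument are illustrative only and are not part of OPAE or OpenVINO.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that at least N 2 MB hugepages are allocated.
# $1 = required page count (default 20, the per-card figure quoted above)
# $2 = counter file to read (defaults to the standard sysfs path)
check_hugepages() {
    local want="${1:-20}"
    local file="${2:-/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages}"
    local have
    if ! have="$(cat "$file" 2>/dev/null)"; then
        echo "cannot read $file" >&2
        return 2
    fi
    if [ "$have" -ge "$want" ]; then
        echo "OK: $have hugepages allocated (need $want)"
    else
        echo "LOW: $have hugepages allocated (need $want)"
        return 1
    fi
}
```

Calling `check_hugepages 20` in a fresh terminal returns non-zero when the allocation did not survive a reboot, which is a cue to set it again.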

 

Did the PAC pass fpgabist? See the link below (search for "Running FPGA Diagnostics"):

https://www.intel.com/content/www/us/en/programmable/documentation/iyu1522005567196.html

 

Did the PAC pass the aocl diagnose acl0? You may refer to: https://www.intel.com/content/www/us/en/programmable/documentation/fvf1521490619217.html#zru15232937...

 

Could you run the command below? I want to check that you have the correct OPAE version.

rpm -qa | grep opae

 

Does this fail only with 2019R1_RC_FP11_ResNet_SqueezeNet_VGG, or does it fail with other AOCX files as well?

Could you try a different aocx with a lower FP precision?

 

In summary, please run the tests suggested above first, in this order:

reboot --> initialize --> set hugepages --> fpgabist --> aocl diagnose acl0 --> try another aocx --> try a lower FP precision.
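Because any of these steps may itself hang, it can help to run each one under a time limit. The sketch below wraps a command with the coreutils `timeout` utility (which exits with status 124 when the limit is reached). The function name `run_step`, the log file location, and the time limits in the commented example are assumptions for illustration, not part of the Intel tooling.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run one diagnostic step under a time limit so that a
# hang is reported instead of blocking the rest of the checklist.
# $1 = step label, $2 = limit in seconds, remaining args = command to run
run_step() {
    local name="$1" limit="$2"
    shift 2
    local rc=0
    # Output goes to a per-step log; the location is an arbitrary choice.
    timeout "$limit" "$@" >"${TMPDIR:-/tmp}/step_${name}.log" 2>&1 || rc=$?
    case "$rc" in
        0)   echo "$name: PASS" ;;
        124) echo "$name: HANG (no exit within ${limit}s)" ;;
        *)   echo "$name: FAIL (exit code $rc)" ;;
    esac
    return "$rc"
}

# Example sequence using commands named in this thread (limits are guesses):
# run_step fpgabist 600 sudo fpgabist "$OPAE_PLATFORM_ROOT/hw/samples/nlb_mode_3/bin/nlb_mode_3.gbs"
# run_step diagnose 300 aocl diagnose acl0
```

One caveat of this design: a command that itself exits with status 124 would be misreported as a hang, so the per-step log is worth checking.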

 

If the failure persists:

Please provide the OS and kernel versions and all the results from the tests above.

 

cat /etc/*elease

uname -r

 

Thanks

mkont1
Beginner

Can recover with Ctrl+C.

 

Issue persists after power cycle.

Hugepages set with:

sudo sh -c "echo 20 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"

 

Output of fpgabist:

# sudo fpgabist $OPAE_PLATFORM_ROOT/hw/samples/nlb_mode_3/bin/nlb_mode_3.gbs
==========================================================
Beginning FPGA Built-In Self-Test
==========================================================
Device: bus = 6, device = , func =
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: External reset Power-on-reset
//****** FME ******//
Object Id : 0xF000000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** PORT ******//
Object Id : 0xEF00000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
Accelerator Id : 18b79ffa-2ee5-4aa0-96ef-4230dafacb5f
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** TEMP ******//
Object Id : 0xF000000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
(11) FPGA Core TEMP : 58.00 °C
(12) Board TEMP : 47.00 °C
(14) QSFP TEMP : No reading (reading state unavailable)
(15) Core Supply Temp : 65.28 °C
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** POWER ******//
Object Id : 0xF000000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
( 0) Total Input Power : 28.50 Watts
( 1) PCIe 12V Current : 2.47 Amps
( 2) PCIe 12V Voltage : 11.20 Volts
( 3) 1.2V Voltage : 1.22 Volts
( 4) 1.2V Current : 2.66 Amps
( 5) 1.8V Voltage : 1.83 Volts
( 6) 1.8V Current : 2.73 Amps
( 7) 3.3V Mgmt Voltage : 3.34 Volts
( 8) 3.3V Current : 0.54 Amps
( 9) FPGA Core Voltage : 0.91 Volts
(10) FPGA Core Current : 13.11 Amps
(13) QSFP P3V3 : No reading (reading state unavailable)
(16) Core Supply Temp Input : 0.50 Volts
(17) VCCR Voltage : 1.04 Volts
(18) VCCT Voltage : 1.04 Volts
(19) VCCR Current : 1.12 Amps
(20) VCCT Current : 0.12 Amps
(21) VPP Voltage : 2.53 Volts
(22) VTT Voltage : 0.59 Volts
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** PORT ERRORS ******//
Object Id : 0xEF00000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x30201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
Accelerator Id : 18b79ffa-2ee5-4aa0-96ef-4230dafacb5f
First Error : 0x0
First Malformed Req : 0xFFFFFFFFFFFFFFFF
Errors : 0x0
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** FME ERRORS ******//
Object Id : 0xF000000
PCIe s:b:d:f : 0000:06:00:0
Device Id : 0x09C4
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x123000200000185
Bitstream Version : 0x7FFF00030201
Pr Interface Id : 69528db6-eb31-577a-8c36-68f9faa081f6
First Error : 0x0
Next Error : 0x0
Errors : 0x0
PCIe1 Errors : 0x0
Nonfatal Errors : 0x0
Inject Error : 0x0
Catfatal Errors : 0x0
PCIe0 Errors : 0x0
Running mode: nlb_3
Attempting Partial Reconfiguration:
Reading bitstream
Looking for slot
Found slot
Programming bitstream
Writing bitstream
Done

Running fpgadiag read test...

Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss Eviction 'Clocks(@200 MHz)' Rd_Bandwidth Wr_Bandwidth
1024 544035292 0 0 0 0 0 0 1000011426 6.964 GB/s 0.000 GB/s

VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count
0 0 0 0 0 0

Running fpgadiag write test...

Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss Eviction 'Clocks(@200 MHz)' Rd_Bandwidth Wr_Bandwidth
1024 0 762732 0 0 0 0 0 1000018957 0.000 GB/s 0.010 GB/s

VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count
0 0 0 0 0 0

Running fpgadiag trput test...

Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss Eviction 'Clocks(@200 MHz)' Rd_Bandwidth Wr_Bandwidth
1024 488225340 489909832 0 0 0 0 0 1000023141 6.249 GB/s 6.271 GB/s

VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count
0 0 0 0 0 0

Finished Executing NLB (FPGA DIAG)Tests

Built-in Self-Test Completed.

aocl diagnose:

# aocl diagnose
--------------------------------------------------------------------
Device Name: acl0

BSP Install Location: /root/intelrtestack/a10_gx_pac_ias_1_2_pv/opencl/opencl_bsp

Vendor: Intel Corp

Physical Dev Name   Status   Information

pac_a10_ef00000     Passed   PAC Arria 10 Platform (pac_a10_ef00000)
                             PCIe 06:00.0
                             FPGA temperature = 61 degrees C.

DIAGNOSTIC_PASSED
--------------------------------------------------------------------

Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices

aocl diagnose acl0 gets stuck (recoverable with Ctrl+C):

# aocl diagnose acl0
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_ef00000)
Using Device from vendor: Intel Corp
clGetDeviceInfo CL_DEVICE_GLOBAL_MEM_SIZE = 8589934592
clGetDeviceInfo CL_DEVICE_MAX_MEM_ALLOC_SIZE = 8589934592
Allocated 8589934592 bytes
Actual maximum buffer size = 8589934592 bytes
Writing 8192 MB to global memory ...
Allocated 1073741824 Bytes host buffer for large transfers
Write speed: 6917.17 MB/s [6912.93 -> 6919.78]
Reading and verifying 8192 MB from global memory ...
Read speed: 6648.18 MB/s [6541.27 -> 6688.25]
Successfully wrote and readback 8192 MB buffer

Poll(interrupt) timeout

rpm -qa | grep opae:

# rpm -qa | grep opae
opae-libs-1.1.2-1.x86_64
opae-tools-1.1.2-1.x86_64
opae-intel-fpga-driver-1.1.2-1.x86_64
opae-tools-extra-1.1.2-1.x86_64
opae-devel-1.1.2-1.x86_64
opae-ase-1.1.2-1.x86_64

OS and kernel versions:

# cat /etc/*elease

Board : PCIECARD
Release : Distro OS
Version : 2.0.2
Build-Date : 24 January 2019
Kernel-Arch : x86_64
Linux-Distribution : CentOS.7.5.1804
CentOS Linux release 7.5.1804 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Board : PCIECARD
Release : PCIe Manager
Version : 2.0.2
Build-Date : 11 December 2018
Kernel-Arch : x86_64
Kernel-Version : 3.10.0-862.11.6.1.el7
Linux-Distribution : CentOS.7.5.1804
CentOS Linux release 7.5.1804 (Core)
CentOS Linux release 7.5.1804 (Core)

# uname -r
3.10.0-862.11.6.1.el7.x86_64

The issue persists with 2019R1_RC_FP16_ResNet_SqueezeNet_VGG.aocx. I don't have an aocx with a lower FP precision than 11.

 

 

 

JonWay_C_Intel
Employee

Hi @mkont1,

Would you run a quick test?

The demo cannot run with the default batch size on the FPGA. Please set the batch size to more than 1 (e.g., -b 10).

mkont1
Beginner

Tried this with the benchmark_app. It didn't help.

JonWay_C_Intel
Employee

Hi @mkont1,

Would you try -b 10 and -niter 100 together?

mkont1
Beginner

Hi @JwChin,

It seems better with "-b 10 -niter 100" but still gets stuck. Most of the time the run gets stuck below 10% done. On one run it reached 78% done before getting stuck.

 

[Step 7/8] Start inference asynchronously (100 async inference executions, 2 inference requests in parallel)

Progress: [.                  ] 7.92% done
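Since the hang is intermittent, repeating the run several times under a time limit gives a rough hang rate to compare across settings. This is a minimal sketch, assuming the coreutils `timeout` utility is available; the function name `count_hangs` and the limits in the commented example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command N times, each under a time limit, and
# count how many runs had to be killed by timeout (exit status 124).
# $1 = number of runs, $2 = limit in seconds, remaining args = command to run
count_hangs() {
    local runs="$1" limit="$2"
    shift 2
    local i=0 hangs=0 rc
    while [ "$i" -lt "$runs" ]; do
        rc=0
        timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -eq 124 ]; then
            hangs=$((hangs + 1))
        fi
        i=$((i + 1))
    done
    echo "$hangs/$runs runs hung"
}

# Example with the command from this thread (10 runs, 300 s limit are guesses):
# count_hangs 10 300 ./benchmark_app -d HETERO:FPGA,CPU -b 10 -niter 100 \
#     -i /opt/intel/openvino/deployment_tools/demo/car.png \
#     -m /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
```

Comparing the reported rate for different -b and -niter values, or for different bitstreams, shows whether a change actually helps or a run merely got lucky.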

JonWay_C_Intel
Employee

Hi @mkont1,

 

I have sent you a private message.
