I am experiencing repeated crashes with the Sync Benchmark application on an Elkhart Lake GPU (it works fine on CPU). Below are the details and steps to reproduce the issue, along with some debug logs.
I changed the sync_benchmark run duration from the default 10 seconds to 100 seconds, and the hang in clWaitForEvents occurs consistently.
Environment:
- OS: Ubuntu 22.04
- OpenVINO Version: 2024.2
- Compute Runtime/OpenCL Version: 24.22.29735.20
- Model Used: person-detection-retail-0002 (FP16-INT8)
The Sync Benchmark application consistently hangs and eventually crashes when attempting to run inference on the specified model. The application is executed with the following command:
./sync_benchmark /home/test/intel/person-detection-retail-0002/FP16-INT8/person-detection-retail-0002.xml GPU
Kernel Messages (dmesg): Multiple entries of GPU hangs, for instance:
i915 xxxxx [drm] GPU HANG: ecode 11:1:8ed9fff3, in sync_benchmark [4524]
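For reference, the hang records (and their error codes) can be pulled out of a saved dmesg capture with something like this. This is just a sketch; it assumes the kernel log was saved beforehand with `dmesg > dmesg.txt`:

```shell
# Extract i915 GPU HANG records from a saved dmesg capture and
# print only the error code from each, for attaching to a report.
# Assumes: dmesg > dmesg.txt was run first.
grep -E 'i915.*GPU HANG' dmesg.txt | sed -E 's/.*ecode ([0-9a-f:]+).*/ecode \1/'
```

The full matching lines (without the `sed` step) are usually what driver developers want, since they include the offending process and PID.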
I have tried the following kernel/i915 parameters (note that the error occurs both with and without these flags):
i915.enable_hangcheck=0
i915.request_timeout_ms=200000
intel_idle.max_cstate=1
i915.enable_dc=0
ahci.mobile_lpm_policy=1
i915.enable_psr2_sel_fetch=0
i915.enable_psr=0
vt.handoff=7
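For completeness, these parameters were applied via the kernel command line in GRUB, along these lines (an excerpt only; your existing defaults may differ):

```shell
# /etc/default/grub (excerpt) -- append the flags to the kernel command line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.enable_hangcheck=0 i915.request_timeout_ms=200000 intel_idle.max_cstate=1 i915.enable_dc=0 i915.enable_psr=0"
# Then regenerate the GRUB config and reboot:
#   sudo update-grub && sudo reboot
# Confirm the flags took effect after reboot:
#   cat /proc/cmdline
```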
Any insights or solutions to address these crashes would be highly appreciated. I have attached the relevant outputs and configurations for reference.
Hi Prateek,
Thank you for your patience. I wanted to share that we ran a longer test and still cannot observe the issue. We ran benchmark_app in throughput mode (nstreams = 2, nireq = 4) for 5M iterations, and it completed successfully without error (the test took ~8 days). Under this load, the GPU remains 100% utilized.
If you have any other information that could help us reproduce the issue, kindly share it. I'd suggest trying the same OpenVINO version (2024.6) and compute-runtime version (24.35.30872.22) on your end to see whether the issue resolves. Hope this helps.
$ benchmark_app -m intel/person-detection-retail-0002/FP16-INT8/person-detection-retail-0002.xml -d GPU -niter 5000000 -hint throughput
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.6.0-17404-4c0f47d2335-releases/2024/6
[ INFO ]
[ INFO ] Device info:
[ INFO ] GPU
[ INFO ] Build ................................. 2024.6.0-17404-4c0f47d2335-releases/2024/6
[ INFO ]
[ INFO ]
[...]
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: PVANet + R-FCN
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ] PERF_COUNT: False
[ INFO ] ENABLE_CPU_PINNING: False
[ INFO ] MODEL_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_HOST_TASK_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_QUEUE_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_QUEUE_THROTTLE: Priority.MEDIUM
[ INFO ] GPU_ENABLE_LOOP_UNROLLING: True
[ INFO ] GPU_DISABLE_WINOGRAD_CONVOLUTION: False
[ INFO ] CACHE_DIR:
[ INFO ] CACHE_MODE: CacheMode.OPTIMIZE_SPEED
[ INFO ] PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ] COMPILATION_NUM_THREADS: 4
[ INFO ] NUM_STREAMS: 2
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float16'>
[ INFO ] DYNAMIC_QUANTIZATION_GROUP_SIZE: 32
[ INFO ] ACTIVATIONS_SCALE_FACTOR: 0.0
[ INFO ] DEVICE_ID: 0
[ INFO ] EXECUTION_DEVICES: ['GPU.0']
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'data'!. This input will be filled with random values!
[ WARNING ] No input files were given for input 'im_info'!. This input will be filled with random values!
[ INFO ] Fill input 'data' with random values
[ INFO ] Fill input 'im_info' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 5000000 iterations)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 194.33 ms
[Step 11/11] Dumping statistics report
[ INFO ] Execution Devices:['GPU.0']
[ INFO ] Count: 5000000 iterations
[ INFO ] Duration: 689364352.20 ms
[ INFO ] Latency:
[ INFO ] Median: 551.15 ms
[ INFO ] Average: 551.16 ms
[ INFO ] Min: 282.63 ms
[ INFO ] Max: 690.56 ms
[ INFO ] Throughput: 7.25 FPS
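As a quick sanity check on the reported numbers: throughput is just iterations divided by duration in seconds, and a one-liner reproduces the 7.25 FPS figure above:

```shell
# Throughput = iterations / duration(s): 5,000,000 iters over 689,364,352.20 ms
awk 'BEGIN { printf "%.2f FPS\n", 5000000 / (689364352.20 / 1000) }'
# prints: 7.25 FPS
```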
Hi Prateek,
It has been a while since we heard from you on this topic; I hope you were able to resolve the issue on your end based on the shared information. Note that we are closing this case, and this thread will no longer be monitored. If you need any additional information from Intel, please submit a new question.
Best Regards,
Luis
