I am experiencing repeated crashes with the Sync Benchmark application on an Elkhart Lake GPU (it works fine on CPU). Below are the details and steps to reproduce the issue, along with some debug logs.
I changed the sync_benchmark run duration from the default 10 seconds to 100 seconds, and the hang in clWaitForEvents occurs consistently.
Environment:
- OS: Ubuntu 22.04
- OpenVINO Version: 2024.2
- Compute Runtime/OpenCL Version: 24.22.29735.20
- Model Used: person-detection-retail-0002 (FP16-INT8)
The Sync Benchmark application consistently hangs and eventually crashes when attempting to run inference on the specified model. The application is executed with the following command:
./sync_benchmark /home/test/intel/person-detection-retail-0002/FP16-INT8/person-detection-retail-0002.xml GPU
Kernel Messages (dmesg): Multiple entries of GPU hangs, for instance:
i915 xxxxx [drm] GPU HANG: ecode 11:1:8ed9fff3, in sync_benchmark [4524]
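For reference, the hang records (and their error codes) can be pulled out of a saved dmesg capture with something like this. This is just a sketch; it assumes the kernel log was saved beforehand with `dmesg > dmesg.txt`:

```shell
# Extract i915 GPU HANG records from a saved dmesg capture and
# print only the error code from each, for attaching to a report.
# Assumes: dmesg > dmesg.txt was run first.
grep -E 'i915.*GPU HANG' dmesg.txt | sed -E 's/.*ecode ([0-9a-f:]+).*/ecode \1/'
```

The full matching lines (without the `sed` step) are usually what driver developers want, since they include the offending process and PID.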
I have tried the following kernel/i915 parameters (note that the error occurs both with and without these flags):
i915.enable_hangcheck=0
i915.request_timeout_ms=200000
intel_idle.max_cstate=1
i915.enable_dc=0
ahci.mobile_lpm_policy=1
i915.enable_psr2_sel_fetch=0
i915.enable_psr=0
vt.handoff=7
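For completeness, these parameters were applied via the kernel command line in GRUB, along these lines (an excerpt only; your existing defaults may differ):

```shell
# /etc/default/grub (excerpt) -- append the flags to the kernel command line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.enable_hangcheck=0 i915.request_timeout_ms=200000 intel_idle.max_cstate=1 i915.enable_dc=0 i915.enable_psr=0"
# Then regenerate the GRUB config and reboot:
#   sudo update-grub && sudo reboot
# Confirm the flags took effect after reboot:
#   cat /proc/cmdline
```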
Any insights or solutions to address these crashes would be highly appreciated. I have attached the relevant outputs and configurations for reference.
Hi Prateek,
Thank you for your patience. I wanted to share that we ran a longer test and still cannot observe the issue. We ran benchmark_app in throughput mode (nstreams = 2, nireq = 4) for 5M iterations, and it completed successfully without error (the test took ~8 days). Under this load, the GPU remains 100% utilized.
If you have any other information that could help us reproduce the issue, kindly share it. I'd suggest trying the same OpenVINO version (2024.6) and compute-runtime version (24.35.30872.22) on your end to see whether the issue resolves. Hope this helps.
$ benchmark_app -m intel/person-detection-retail-0002/FP16-INT8/person-detection-retail-0002.xml -d GPU -niter 5000000 -hint throughput
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.6.0-17404-4c0f47d2335-releases/2024/6
[ INFO ]
[ INFO ] Device info:
[ INFO ] GPU
[ INFO ] Build ................................. 2024.6.0-17404-4c0f47d2335-releases/2024/6
[ INFO ]
[ INFO ]
[...]
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: PVANet + R-FCN
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ] PERF_COUNT: False
[ INFO ] ENABLE_CPU_PINNING: False
[ INFO ] MODEL_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_HOST_TASK_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_QUEUE_PRIORITY: Priority.MEDIUM
[ INFO ] GPU_QUEUE_THROTTLE: Priority.MEDIUM
[ INFO ] GPU_ENABLE_LOOP_UNROLLING: True
[ INFO ] GPU_DISABLE_WINOGRAD_CONVOLUTION: False
[ INFO ] CACHE_DIR:
[ INFO ] CACHE_MODE: CacheMode.OPTIMIZE_SPEED
[ INFO ] PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ] EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ] COMPILATION_NUM_THREADS: 4
[ INFO ] NUM_STREAMS: 2
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[ INFO ] INFERENCE_PRECISION_HINT: <Type: 'float16'>
[ INFO ] DYNAMIC_QUANTIZATION_GROUP_SIZE: 32
[ INFO ] ACTIVATIONS_SCALE_FACTOR: 0.0
[ INFO ] DEVICE_ID: 0
[ INFO ] EXECUTION_DEVICES: ['GPU.0']
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given for input 'data'!. This input will be filled with random values!
[ WARNING ] No input files were given for input 'im_info'!. This input will be filled with random values!
[ INFO ] Fill input 'data' with random values
[ INFO ] Fill input 'im_info' with random values
[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests, limits: 5000000 iterations)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 194.33 ms
[Step 11/11] Dumping statistics report
[ INFO ] Execution Devices:['GPU.0']
[ INFO ] Count: 5000000 iterations
[ INFO ] Duration: 689364352.20 ms
[ INFO ] Latency:
[ INFO ] Median: 551.15 ms
[ INFO ] Average: 551.16 ms
[ INFO ] Min: 282.63 ms
[ INFO ] Max: 690.56 ms
[ INFO ] Throughput: 7.25 FPS
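As a quick sanity check on the reported numbers: throughput is just iterations divided by duration in seconds, and a one-liner reproduces the 7.25 FPS figure above:

```shell
# Throughput = iterations / duration(s): 5,000,000 iters over 689,364,352.20 ms
awk 'BEGIN { printf "%.2f FPS\n", 5000000 / (689364352.20 / 1000) }'
# prints: 7.25 FPS
```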
Hi Prateek,
It has been a while since we heard from you on this topic; I hope you were able to resolve the issue on your end based on the shared information. Note that we are closing this case, and this thread will no longer be monitored. If you need any additional information from Intel, please submit a new question.
Best Regards,
Luis
