Intel® Xeon® Processor and Server Products

Gen 5 server freezes after running cameras with DL Streamer

vijay223
Beginner
Please see the details below. They describe a server deployed at a client location.

The server freezes at random times. While it is frozen, SSH does not work either.
 
[Screenshot: vijay223_0-1750410895077.png]

[Screenshot: vijay223_1-1750410916979.png]

BIOS version - 2.50
OS Version - Ubuntu 24.04.2 LTS / Ubuntu 22.04 LTS / Arch Linux (tried all three)
Kernel Version - 6.8.0-53-generic / 5.15 / 6.7.6-arch-1 (one kernel per OS, respectively)
Intel DL Streamer Version - 2023.03 / 2025.01 (tried both)
OEM - HPE
HP Model - Server DL 380 GEN11 8 SFF NCCTO-HP
Processor Model - INTEL(R) XEON(R) GOLD 6548Y+
CPU -
  Core: 128
  Vendor ID: GenuineIntel
  Model name: INTEL(R) XEON(R) GOLD 6548Y+
  Model: 207
RAM - 512 GB

 
Please let us know if you have any questions. This is a production system.

After a reboot, the server returns to normal.

We use this server to run 50 cameras with a detection model on DL Streamer. We could not find any issues on the application side.

The OEM says there are no issues on their side, and we do not understand what the problem is.

Here are some logs.

Our vision models are FP32, not AMX-based.

The same setup (same ISO, same Docker image) runs perfectly on a Gen 4 machine.
 
Thanks,
Vijay
8 Replies
vish1
Employee

Hello vijay223,


Greetings!!


We would like to inform you that your case has been routed to the concerned team for further review.

They will be contacting you soon to assist you with the next steps.


Best Regards,

Vishal Shet P

Intel Customer Support Technician


Wan_Intel
Moderator

Hi vijay223,

Thank you for reaching out to us.


We will escalate this case to the relevant team and provide an update here as soon as possible.



Regards,

Wan


Wan_Intel
Moderator

Hi vijay223,

Thank you for your patience. We have received feedback from the relevant team.


For your information, we do not have access to a server setup with 50 cameras to replicate and test the exact scenario. Additionally, please note that the Intel® DL Streamer has reached End of Life (EOL), and there is no ongoing support or updates for it. Given these constraints, our ability to reproduce and debug this issue is limited. However, here is some information for you to consider when debugging the issue.


  • Check Hardware Thermals and Power Events: Even if the OEM says the hardware is OK, high sustained workloads (like 50 concurrent video streams with inference) can trigger thermal throttling or power protection. After a reboot, you can check the kernel log with journalctl -k and dmesg (look for: THERMAL, Watchdog, powercap, mce). Note that dmesg only shows the current boot; journalctl -k -b -1 shows the previous boot if the journal is persistent.


  • Enable and Check Watchdog Configuration: A freeze at the system level can also indicate a triggered watchdog, especially on HPE servers.
    1. Check whether a watchdog is enabled in the BIOS or the OS.
    2. If you use the systemd-based watchdog, monitor whether the system is failing to ping the hardware watchdog timer.


  • NUMA and Memory Pressure: Running 50 inference pipelines can put stress on NUMA boundaries and memory management. You can run numactl --hardware to view the NUMA topology, and consider pinning Intel® DL Streamer pipeline threads to specific sockets.


  • GPU / iGPU / VPU Instability: Intel® DL Streamer often uses an Intel® iGPU or Intel® VPU for offloading. If you are running on a platform with Intel® iGPUs or discrete Intel® VPUs, GPU-level errors (hangs, resource overcommit) could cause a hard lock.
    1. Check /var/log/syslog or dmesg for signs of GPU hangs (i915, xe, gpu hang detected).
    2. Use intel_gpu_top or intel_gpu_frequency to monitor load.
    3. Try reducing the number of parallel streams (e.g., from 50 → 20) to check if the issue is load-induced.


  • Intel® DL Streamer Configuration and Offload Paths: If you're using GStreamer with inference plugins, consider:
    1. Use queue elements between heavy components to prevent pipeline blocking.
    2. Use vaapi only if the iGPU supports it and no resource leak exists.
    3. Monitor per-pipeline CPU and memory usage with ps or pidstat.
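
The log checks suggested in the list above can be sketched as a small Python filter. This is only an illustration, not an Intel or HPE tool: the function name scan_kernel_log and the keyword list are assumptions built from the terms mentioned in this reply, and the sample lines are made up to show the matching. Real input would be the saved output of a command such as journalctl -k -b -1.

```python
import re
from collections import defaultdict

# Keywords taken from the suggestions above; extend the list as needed.
KEYWORDS = ["thermal", "watchdog", "powercap", "mce", "i915", "xe", "gpu hang"]


def scan_kernel_log(lines, keywords=KEYWORDS):
    """Group log lines by the keywords they contain.

    Matching is case-insensitive and on whole words, so a short
    keyword such as "xe" does not match inside "xhci_hcd".
    """
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    hits = defaultdict(list)
    for line in lines:
        for kw, pattern in patterns.items():
            if pattern.search(line):
                hits[kw].append(line.rstrip("\n"))
    return dict(hits)


# Made-up sample lines, only to demonstrate the matching.
sample = [
    "kernel: mce: [Hardware Error]: Machine check events logged",
    "kernel: thermal: package temperature above threshold (made-up example)",
    "kernel: watchdog: BUG: soft lockup - CPU#12 stuck for 26s!",
    "kernel: usb 1-1: new high-speed USB device number 2 using xhci_hcd",
]

report = scan_kernel_log(sample)
for keyword, matched in report.items():
    print(f"{keyword}: {len(matched)} line(s)")
```

To run it on real data, one could save the previous boot's kernel log to a file (for example prev_boot.log, a hypothetical name) and pass the open file object as lines. Whole-word matching will miss messages that only contain a keyword as part of a longer identifier, so the keyword list may need tuning for a given system.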



Hope this information helps.



Regards,

Wan



vijay223
Beginner

Hi @Wan_Intel ,

 

Thanks for the response. Below are the tests we have done:

 

The system hangs even with 1 or 2 cameras.

We also ran the CPU at 100% with large matrix operations and did not observe any issues for more than 4 days. This test fully exercised AMX and AVX-512.

The freeze happens when we are reading from the actual cameras.

No GPU / iGPU / VPU is involved in this case.

We will check the BIOS settings.

Please suggest next steps.


Thanks,

Vijay

 

Wan_Intel
Moderator

Hi vijay223,

Thank you for your information.


I will check with the relevant team again, and we will provide an update here as soon as possible. Thank you.



Regards,

Wan


Hari_B_Intel
Moderator

Dear Vijay223,


Thank you for your continued patience.

I wanted to let you know that I’ve received the escalation regarding the system freeze issue when running DL Streamer on your Gen 5 server. We’re currently working to sort out access to hardware similar to your setup; however, at this moment, we do not have the exact server or camera configuration available on our end.

To assist our investigation, could you kindly provide a simplified set of steps to help us reproduce the issue, ideally in a minimal environment? Even if it can be replicated using a single camera or in a simulated setup without actual camera input, that would be very helpful for us to proceed.

Once we’re able to reliably reproduce the issue, we can further analyze and isolate the root cause more effectively.


Looking forward to your guidance on this. Please let us know if you need any clarification from our side.


Wan_Intel
Moderator

Hi vijay223,

Just wanted to follow up and see if you can provide a simplified set of steps to help us reproduce the issue, ideally in a minimal environment. Once we are able to reproduce the issue, we can further analyze and isolate the root cause.



Regards,

Wan


Wan_Intel
Moderator

Hi vijay223,

Thank you for your question.


Please submit a new question if you need additional information, as this thread will no longer be monitored.



Regards,

Wan

