How to Build Efficient Visual Analytics Pipelines That Leverage the iGPU and Free Up Your CPU
Author: Vibhu Bithar, Lead Platform Architect, Health & Cities Division, Intel
Date: November 2025
The Challenge of Real-Time Video Analytics
Picture this: you’ve got multiple live RTSP camera feeds streaming 24/7, and you need to detect objects in real time. Every frame counts, and your CPU is running hot, maxed out trying to keep up.
Meanwhile, your integrated GPU (iGPU) is barely breaking a sweat. Why? Because decoding, preprocessing, and inference are often handled inefficiently on the CPU alone, leaving the iGPU underutilized.
That’s where Intel® Deep Learning Streamer (DL Streamer) comes in. It’s a powerful framework for orchestrating optimized real-time computer vision pipelines that use all the components of your processor, including the iGPU, the NPU, and dedicated hardware decoders. It moves the specialized heavy lifting that AI requires onto the iGPU so your CPU stays available for other tasks.
What Is Intel® Deep Learning Streamer (DL Streamer)?
DL Streamer is an open-source, GStreamer-based framework for building real-time video analytics applications on Intel® hardware.
It breaks the complex tasks of video decoding, inference, and post-processing down into modular pipeline components. DL Streamer integrates tightly with the OpenVINO™ toolkit and supports hardware-accelerated video decode/encode via VA-API, preprocessing via OpenCV / DPC++, and optimized inference on Intel® CPUs, iGPUs, xPUs, and NPUs. (GitHub)
Key Benefits
- Hardware acceleration: Optimized for Intel® Core™, Xeon®, Arc™, and Data Center GPU Flex Series devices [1]
- Seamless integration: Works with the OpenVINO™ toolkit for efficient inference
- Scalable design: Supports multiple simultaneous RTSP feeds
- Flexible architecture: Modular GStreamer elements for custom AI pipelines
In short, DL Streamer helps developers focus on outcomes, not boilerplate code.
Why DL Streamer Matters
Modern AI workloads demand real-time performance, low latency, and efficient hardware utilization, especially at the edge. DL Streamer helps achieve this by moving decoding, preprocessing, and inference workloads onto the iGPU.
Why You Should Care
- Efficiency: Offload CPU-heavy tasks to iGPU
- Performance: Reduce CPU usage and memory transfers
- Simplicity: Build complex pipelines with just a few GStreamer elements
- Scalability: Support multiple camera feeds with minimal tuning
The Common Problem: CPU Overload Despite Having an iGPU
Let’s explore how a simple RTSP object detection pipeline evolves, from CPU-bound to fully iGPU accelerated.
Pipeline 1: CPU-Only (Baseline)
rtspsrc location=rtsp://XX.XX.XX.XX:554/yourcam1 ! decodebin ! videorate ! videoconvert ! video/x-raw,format=BGR,framerate=20/1 ! videoscale ! video/x-raw,width=640,height=480 ! gvadetect device=CPU model-instance-id=detect1 inference-interval=1 model=/home/models/object_detection/ITS_CL_FP16/openvino.xml ! autovideosink=true
In this version, the entire decoding and inference workload runs on the CPU. It works, but your CPU does everything. Expect high utilization and latency.
Pipeline 2: Inference on GPU
rtspsrc location=rtsp://XX.XX.XX.XX:554/yourcam1 latency=15 ! decodebin ! videorate ! videoconvert ! video/x-raw,format=BGR,framerate=20/1 ! videoscale ! video/x-raw,width=640,height=480 ! gvadetect device=GPU model-instance-id=detect1 inference-interval=1 model=/home/models/object_detection/ITS_CL_FP16/openvino.xml ! autovideosink=true
Inference now runs on the iGPU, but decoding still runs on the CPU. This creates unnecessary back-and-forth between CPU and GPU memory, reducing the performance gains you’d expect.
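To make the hand-off easier to spot, here is a minimal Python sketch that lists the stages of a pipeline description that are software elements working on system (CPU) memory. It is a rough lint, not part of DL Streamer: the names `CPU_SIDE_ELEMENTS`, `element_names`, and `cpu_side_elements` are illustrative, the set only covers the software elements that appear in this article, and the sink stage is left out of the example string.

```python
# Rough lint for gst-launch style descriptions (illustrative, not a DL Streamer API).
# Assumption: only the software elements used in this article are listed here.
CPU_SIDE_ELEMENTS = {"videoconvert", "videoscale"}


def element_names(description: str) -> list[str]:
    """Return the first token of every '!'-separated stage (element name or caps string)."""
    names = []
    for stage in description.split("!"):
        tokens = stage.split()
        if tokens:
            names.append(tokens[0])
    return names


def cpu_side_elements(description: str) -> list[str]:
    """List stages that process frames in system memory on the CPU."""
    return [name for name in element_names(description) if name in CPU_SIDE_ELEMENTS]


# Pipeline 2 from this article, up to the inference stage (sink omitted).
PIPELINE_2 = (
    "rtspsrc location=rtsp://XX.XX.XX.XX:554/yourcam1 latency=15 ! decodebin ! videorate ! "
    "videoconvert ! video/x-raw,format=BGR,framerate=20/1 ! videoscale ! "
    "video/x-raw,width=640,height=480 ! gvadetect device=GPU model-instance-id=detect1 "
    "inference-interval=1 model=/home/models/object_detection/ITS_CL_FP16/openvino.xml"
)

if __name__ == "__main__":
    print(cpu_side_elements(PIPELINE_2))  # ['videoconvert', 'videoscale']
```

Every stage this reports sits between the decoder and `gvadetect device=GPU`, which is where frames have to cross from CPU memory to GPU memory.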
Pipeline 3: Full GPU Acceleration (Decode + Preprocess + Inference)
rtspsrc location=rtsp://XX.XX.XX.XX:554/yourcam1 ! decodebin3 ! videorate ! video/x-raw(memory:VAMemory),framerate=15/1 ! vapostproc ! video/x-raw(memory:VAMemory),width=1920,height=1080 ! gvadetect device=GPU model-instance-id=detect1 inference-interval=1 model=/home/models/object_detection/ITS_CL_FP16/openvino.xml pre-process-backend=va-surface-sharing ! autovideosink=true
In this optimized version, the entire RTSP flow (decode, preprocess, and inference) runs on the GPU, minimizing latency and freeing your CPU almost entirely.
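Because the optimized pipeline is just a text description, scaling it to several cameras can be sketched as string templating. The Python below is a minimal sketch under stated assumptions: `build_gpu_pipeline`, `build_multi_stream`, and `launch_command` are hypothetical helper names, the second camera URL is a placeholder, and the sink is written as a plain `autovideosink`. It reuses only the elements and properties from Pipeline 3, and it gives every `gvadetect` the same `model-instance-id` so that branches running in one process can share a single loaded model.

```python
def build_gpu_pipeline(rtsp_url: str, model_path: str,
                       instance_id: str = "detect1",
                       sink: str = "autovideosink") -> str:
    """Build one Pipeline 3 style branch (decode, preprocess, inference on GPU)."""
    stages = [
        f"rtspsrc location={rtsp_url}",
        "decodebin3",
        "videorate",
        "video/x-raw(memory:VAMemory),framerate=15/1",
        "vapostproc",
        "video/x-raw(memory:VAMemory),width=1920,height=1080",
        f"gvadetect device=GPU model-instance-id={instance_id} inference-interval=1 "
        f"model={model_path} pre-process-backend=va-surface-sharing",
        sink,  # assumption: sink without extra properties
    ]
    return " ! ".join(stages)


def build_multi_stream(rtsp_urls: list[str], model_path: str) -> str:
    """Join one branch per camera; gst-launch accepts several unlinked branches."""
    return " ".join(build_gpu_pipeline(url, model_path) for url in rtsp_urls)


def launch_command(description: str) -> list[str]:
    """Argument list for subprocess.run; no shell, so the caps parentheses need no quoting."""
    return ["gst-launch-1.0", *description.split()]


if __name__ == "__main__":
    # Placeholder URLs in the article's style; yourcam2 is hypothetical.
    cameras = [f"rtsp://XX.XX.XX.XX:554/yourcam{i}" for i in (1, 2)]
    model = "/home/models/object_detection/ITS_CL_FP16/openvino.xml"
    print(launch_command(build_multi_stream(cameras, model)))
```

Passing the result of `launch_command` to `subprocess.run` starts all branches in one process. How many streams an iGPU sustains depends on the model, the resolution, and the hardware, so treat the stream count as something to measure rather than assume.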
Key DL Streamer Components That Make It Work (RTSP Feed in Focus)
Each DL Streamer element plays a unique role in enabling smooth, hardware-accelerated RTSP video analytics.
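As a quick reference, the Python sketch below walks through the stages of Pipeline 3 and attaches a one-line role to each. The role descriptions are summarized from general GStreamer and DL Streamer documentation rather than taken from this article, and `ELEMENT_ROLES` and `describe_stages` are illustrative names; the sink is written as a plain `autovideosink`.

```python
# One-line roles for the elements in Pipeline 3 (summary, not official wording).
ELEMENT_ROLES = {
    "rtspsrc": "receives the RTSP stream from the camera",
    "decodebin3": "auto-plugs a decoder; can pick a VA-API hardware decoder when available",
    "videorate": "drops or duplicates frames to reach the requested frame rate",
    "vapostproc": "VA-API post-processing on the GPU (scaling, color conversion)",
    "gvadetect": "runs object detection inference through OpenVINO",
    "autovideosink": "selects a suitable video sink and displays the frames",
}


def describe_stages(description: str) -> list[tuple[str, str]]:
    """Return (stage name, role) pairs for a gst-launch style description."""
    rows = []
    for stage in description.split("!"):
        tokens = stage.split()
        if not tokens:
            continue
        name = tokens[0]
        if name.startswith("video/x-raw"):
            role = "caps filter: pins frame rate, size, or memory type between elements"
        else:
            role = ELEMENT_ROLES.get(name, "not described in this sketch")
        rows.append((name, role))
    return rows


PIPELINE_3 = (
    "rtspsrc location=rtsp://XX.XX.XX.XX:554/yourcam1 ! decodebin3 ! videorate ! "
    "video/x-raw(memory:VAMemory),framerate=15/1 ! vapostproc ! "
    "video/x-raw(memory:VAMemory),width=1920,height=1080 ! "
    "gvadetect device=GPU model-instance-id=detect1 inference-interval=1 "
    "model=/home/models/object_detection/ITS_CL_FP16/openvino.xml "
    "pre-process-backend=va-surface-sharing ! autovideosink"
)

if __name__ == "__main__":
    for name, role in describe_stages(PIPELINE_3):
        print(f"{name}: {role}")
```

Two details carry most of the weight: the `memory:VAMemory` caps keep frames in GPU memory between stages, and `pre-process-backend=va-surface-sharing` lets `gvadetect` consume those surfaces directly instead of copying them back to system memory.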
The Result: A Truly Accelerated RTSP Pipeline
By moving decoding, preprocessing, and inference fully onto the iGPU, you minimize CPU overhead, reduce power consumption, and achieve true real-time performance, even with multiple camera streams.
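To check the effect on your own system, one simple approach is to compare overall CPU utilization while each of the three pipelines runs. The Python below is a minimal, Linux-only sketch that reads the aggregate `cpu` line of `/proc/stat` twice and reports the busy fraction in between. The helper names are hypothetical, the five-second window is an arbitrary choice, and no figures from this article are implied.

```python
import time


def read_cpu_times(stat_text: str) -> tuple[int, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into (busy, total) ticks."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user, nice, system, idle, iowait, irq, softirq, steal
            # (guest time is already included in user, so it is not added again)
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            total = sum(values)
            return total - idle, total
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization(before: tuple[int, int], after: tuple[int, int]) -> float:
    """Busy fraction (0.0 to 1.0) between two read_cpu_times samples."""
    busy = after[0] - before[0]
    total = after[1] - before[1]
    return busy / total if total > 0 else 0.0


def sample_cpu(interval_s: float = 5.0, path: str = "/proc/stat") -> float:
    """Measure system-wide CPU utilization over interval_s seconds."""
    with open(path) as f:
        first = read_cpu_times(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = read_cpu_times(f.read())
    return cpu_utilization(first, second)


if __name__ == "__main__":
    print(f"CPU busy: {sample_cpu() * 100:.1f}%")
```

Run it once per pipeline while the stream is active and keep the camera, model, and resolution fixed so the comparison is fair.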
Why It Matters for Edge and Transportation Use Cases
At the edge, where systems operate in harsh, power- and temperature-constrained environments, every watt, millisecond, and CPU cycle counts. Whether it’s a roadside cabinet, transit hub, or industrial site, compute resources are limited, and reliability is non-negotiable. DL Streamer enables you to fully leverage Intel’s integrated GPU for decoding, preprocessing, and inference directly on the device, eliminating unnecessary CPU load and memory transfers. The result is higher power efficiency, lower thermal stress, and real-time performance without a costly discrete GPU or cloud dependency. In short, DL Streamer helps bring scalable, efficient, and resilient AI to where it matters most: the edge.
Why You Should Care
- Unlock iGPU performance with minimal effort
- Simplify development with DL Streamer’s modular, plug-and-play design
- Reduce CPU load and power usage dramatically
- Scale video analytics across multiple streams seamlessly
DL Streamer bridges the gap between your vision data and real-world, hardware-optimized AI performance.
Learn More
- ISB Video & AI Cities – Metro AI Suite
- DL Streamer Getting Started Guide
- Deep Learning Streamer Introduction
[1] Attribution: Facts and specifications referenced from the official Intel® DL Streamer GitHub repository and documentation, including release notes and README files as of 2025. DL Streamer is optimized for Intel® Core™, Xeon®, Arc™, and Data Center GPU Flex Series devices through integration with OpenVINO™ Toolkit and hardware-accelerated GStreamer components.