

Accelerating Media Analytics with Intel® DL Streamer Pipeline Server featuring OpenVINO inference with Intel® RealSense

 

Author: Pradeep Sakhamoori

Pradeep Sakhamoori is an AI Frameworks and Solutions Architect in Intel's Internet of Things Group. He is a passionate technology enthusiast who enjoys solving problems and helping customers build innovative solutions with the Intel software and hardware portfolio. His core areas of focus are IoT, data science, and machine learning.

Introduction:

Intel® Distribution of OpenVINO™

AI inference applies capabilities learned after training a neural network to yield results. The Intel Distribution of OpenVINO toolkit enables you to optimize, tune, and run comprehensive AI inference using the included Model Optimizer and runtime and development tools; a minimal usage sketch follows the feature list below.

The toolkit includes:

  • A Model Optimizer to convert models from popular frameworks such as Caffe*, TensorFlow*, Open Neural Network Exchange (ONNX*), and Kaldi
  • An inference engine that supports heterogeneous execution across computer vision accelerators from Intel, including CPUs, GPUs, FPGAs, and the Neural Compute Stick 2 (NCS2)
  • Common API for heterogeneous Intel® hardware
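
To make the "common API" concrete, here is a minimal usage sketch of loading an optimized IR model and running inference with the OpenVINO Python API (the 2021.x-era IECore interface; the model and image paths below are placeholders, not from the original post):

from openvino.inference_engine import IECore  # OpenVINO 2021.x Python API
import cv2 as cv
import numpy as np

ie = IECore()
# Hypothetical IR files produced by the Model Optimizer
net = ie.read_network(model="yolo-v2-tf.xml", weights="yolo-v2-tf.bin")
input_blob = next(iter(net.input_info))
# Pick the target device: "CPU", "GPU", or "MYRIAD" (NCS2)
exec_net = ie.load_network(network=net, device_name="CPU")

image = cv.imread("input.jpg")
n, c, h, w = net.input_info[input_blob].input_data.shape
blob = cv.resize(image, (w, h)).transpose(2, 0, 1)[np.newaxis, ...]  # HWC -> NCHW
results = exec_net.infer({input_blob: blob})

The same script runs unchanged on any supported device by swapping the device_name string, which is the heterogeneous-execution point made above.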

Intel® Deep Learning Streamer (Intel® DL Streamer):

Intel® DL Streamer is a streaming media analytics framework, based on the GStreamer* multimedia framework, for creating complex media analytics pipelines.

Intel® DL Streamer makes media analytics easy:

  • Write less code and get amazing performance
  • Quickly develop, optimize, benchmark, and deploy video & audio analytics pipelines in the Cloud and at the Edge
  • Analyze video and audio streams, create actionable results, capture results, and send them to the cloud
  • Leverage the efficiency and computational power of Intel hardware platforms

See the Intel® DL Streamer documentation website for information on how to download, install, and use it.

Media analytics involves analyzing audio and video streams to detect, classify, track, identify, and count objects, events, and people. The analyzed results can be used to take actions, coordinate events, identify patterns, and gain insights across multiple domains.

Media analytics pipelines transform media streams into insights through audio/video processing, inference, and analytics operations across multiple IP blocks.
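
As a hedged illustration of what such a pipeline looks like in practice (the file and model paths here are placeholders), a detection pipeline can be assembled from standard GStreamer elements plus the DL Streamer gvadetect and gvawatermark elements and launched from Python:

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)
# decode -> inference (gvadetect) -> result overlay (gvawatermark) -> sink
pipeline = Gst.parse_launch(
    "filesrc location=video.mp4 ! decodebin ! videoconvert "
    "! gvadetect model=model.xml device=CPU ! gvawatermark "
    "! videoconvert ! fakesink")
pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)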

Solution overview:

The solution uses Intel® DL Streamer to enable a no-code experience for media analytics workloads, with the Intel® Distribution of OpenVINO™ toolkit providing optimized, accelerated inference on Intel architecture, including CPUs, GPUs, VPUs, and FPGAs. In this blog, we show how to use Intel open-source tools and SDKs to run YoloV2 inference for object detection, and how to run inference with OpenVINO Open Model Zoo models with no code changes. Below are the Intel tools and SDKs used, and the reasons for using them.

  • Intel® Distribution of OpenVINO™ toolkit: enables you to optimize, tune, and run comprehensive AI inference using the included Model Optimizer and runtime and development tools
  • Intel® Deep Learning Streamer: an analytics framework for creating and deploying complex media analytics pipelines across Intel® architecture from edge to cloud
  • Intel® RealSense™ technology: a product range of depth and tracking technologies designed to give machines and devices depth-perception capabilities (developer guide)
  • Intel® Open Model Zoo: an open-source repository of pre-trained models provided by Intel, with support for public models (compatible open-source models provided by third parties)
  • Intel® DL Streamer Pipeline Server: a Python package and microservice for deploying optimized media analytics pipelines (GitHub)

Architecture:

[Architecture diagram]

 

Compilation and Demo Steps:

1. Build DL Streamer Pipeline Server image:

o Clone the DL Streamer Pipeline Server repository:
git clone https://github.com/dlstreamer/pipeline-server

o Update "models_list/models.list.yml" with the YoloV2 entry (YAML):
- model: yolo-v2-tf
  alias: object_detection
  version: yolo_v2
  precision: [FP16, FP32]

o Add the line below at the end of "requirements.txt" to install the Intel RealSense Python SDK:
pyrealsense2

o Run Docker build with
$./docker/build.sh

2. Docker run:

o $./docker/run.sh --dev -v ~/.Xauthority:/root/.Xauthority -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY

Application code:

Initialization and argument parser

import argparse
import json, os, time
import pyrealsense2 as rs
import numpy as np
from queue import Queue
import cv2 as cv
import gi

gi.require_version('Gst', '1.0')
from gi.repository import Gst
from gstgva.util import gst_buffer_data
from vaserving.gstreamer_app_source import GvaFrameData
from vaserving.vaserving import VAServing

source_dir = os.path.abspath(os.path.join(os.path.dirname(__file__)))

def parse_args(args=None, program_name="App Source and Destination Sample"):

    parser = argparse.ArgumentParser(prog=program_name, fromfile_prefix_chars='@',
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    parser.add_argument("--uri", action="store",
                        dest="input_uri",
                        required=False,
                        default="file:///home/pipeline-server/samples/classroom.mp4")

    parser.add_argument("--mode", action="store", dest="source_mode",
                        required=False, choices=["pull", "push"], default="pull")

    parser.add_argument("--pipeline", action="store", dest="pipeline",
                        required=False, default="object_detection")

    parser.add_argument("--version", action="store", dest="pipeline_version",
                        required=False, default="yolo_v2")

    parser.add_argument("--parameters", action="store",
                        dest="parameters",
                        required=False,
                        default=None)

    if isinstance(args, dict):
        args = ["--{}={}".format(key, value)
                for key, value in args.items() if value]

    return parser.parse_args(args)
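
Note that parse_args accepts either the usual CLI-style argument list or a plain dict, which the isinstance check converts into --key=value flags. For example:

# Equivalent programmatic invocations of the parser above
args = parse_args(["--pipeline=object_detection", "--version=yolo_v2"])
args = parse_args({"pipeline": "object_detection", "version": "yolo_v2"})
print(args.pipeline, args.pipeline_version)  # object_detection yolo_v2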

Block 1: Reading RealSense RGB frames and initializing Video Analytics Serving pipeline

if __name__ == "__main__":
    args = parse_args()

    # Queue objects for input and output frames
    detect_input = Queue()
    detect_output = Queue()

    # Initializing RealSense pipeline and configuration object
    pipe = rs.pipeline()
    config = rs.config()
    rgb_frame = None
    rgb_image = None

    # Configuring stream resolution to VGA and frame rate to 30 fps
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

    # Setting RealSense profile and color alignment object
    profile = pipe.start(config)
    align = rs.align(rs.stream.color)

    # Prep RealSense camera
    rgb_sensor = profile.get_device().query_sensors()[1]
    rgb_sensor.set_option(rs.option.auto_exposure_priority, True)
    frames = pipe.wait_for_frames()

    # Skipping first 10 frames to adjust exposure
    for i in range(10):
        frames = pipe.wait_for_frames()
    aligned_frames = align.process(frames)
    rgbimg = cv.cvtColor(np.array(frames.get_color_frame().get_data()), cv.COLOR_BGR2RGB)

    # Initializing VA Serving
    VAServing.start({'log_level': 'DEBUG', "ignore_init_errors": True, 'enable-rtsp': False})
    parameters = None

    # Reading input arguments for VA Serving pipeline and model details
    if args.parameters:
        parameters = json.loads(args.parameters)

    # Start object detection pipeline
    # It will wait until it receives frames via the detect_input queue
    detect_pipeline = VAServing.pipeline(args.pipeline, args.pipeline_version)
    detect_pipeline.start(source={"type": "application",
                                  "class": "GStreamerAppSource",
                                  "input": detect_input,
                                  "mode": args.source_mode},
                          destination={"type": "application",
                                       "class": "GStreamerAppDestination",
                                       "output": detect_output,
                                       "mode": "frames"},
                          parameters=parameters)

Block 2: Decoded RealSense frames are passed to the Pipeline Server pipeline to perform YoloV2 object detection inference with OpenVINO.

    sequence_number = 0
    result_count = 0
    end_of_stream = False
    format_ = Gst.Caps.from_string("video/x-raw,format=BGR,width=640,height=480")

    # Decode and feed the VA Serving pipeline for OpenVINO inference
    while not end_of_stream:
        frames = pipe.wait_for_frames()
        aligned_frames = align.process(frames)

        rgb_frame = aligned_frames.get_color_frame()
        rgb_image = cv.cvtColor(np.array(rgb_frame.get_data()), cv.COLOR_BGR2RGB)

        if len(rgb_image):
            new_sample = GvaFrameData(bytes(rgb_frame.get_data()),
                                      format_,
                                      message={'sequence_number': sequence_number,
                                               'timestamp': time.time()})
            detect_input.put(new_sample)
            sequence_number += 1
            print("Frame Counter: ", sequence_number, end='\r')
        else:
            detect_input.put(None)

        while not detect_output.empty():
            results = detect_output.get()
            if results:
                result_count += 1
            else:
                end_of_stream = True
                break

            if results.video_frame:
                regions = list(results.video_frame.regions())
                messages = list(results.video_frame.messages())
                timestamp = json.loads(messages[0])

                print("Frame: sequence_number:{} timestamp:{}".format(
                    timestamp["sequence_number"], timestamp["timestamp"]))
                print(detect_pipeline.status())
                if not regions:
                    print("Nothing detected")

                for region in regions:
                    print("\tDetection: Region = {}, Label = {}".format(
                        region.rect(), region.label()))
                    object_id = region.object_id()
                    if object_id:
                        print("\tTracking: object_id = {}".format(object_id))
                    tensors = list(region.tensors())
                    for tensor in tensors:
                        if not tensor.is_detection():
                            layer_name = tensor["layer_name"]
                            label = tensor["label"]
                            print("\tClassification: {} = {}".format(layer_name, label))
                print()

    print("Received {} results".format(result_count))
    VAServing.stop()

Pipeline: Create a file pipeline.json under /home/pipeline-server/pipelines/object_detection/yolo_v2/ (inside the dlstreamer-pipeline-server-gstreamer:latest container):

{
    "type": "GStreamer",
    "template": ["{auto_source} ",
                 " ! gvadetect model={models[object_detection][yolo_v2][network]} name=detection device=CPU batch-size=1",
                 " ! appsink name=destination"],
    "description": "Object detection yolo-v2-tf",
    "parameters": {
        "type": "object",
        "properties": {
            "detection-model-instance-id": {
                "element": {
                    "name": "detection",
                    "property": "model-instance-id"
                },
                "type": "string"
            }
        }
    }
}
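
The parameters section above exposes the gvadetect element's model-instance-id property as a pipeline parameter named detection-model-instance-id. The application's --parameters flag accepts a JSON string whose keys match these declared parameters; for example (the instance-id value here is hypothetical):

# Passed on the command line as:
#   --parameters '{"detection-model-instance-id": "yolo_v2_shared"}'
# The app does json.loads(args.parameters) and hands the dict to
# detect_pipeline.start(..., parameters=parameters).
parameters = json.loads('{"detection-model-instance-id": "yolo_v2_shared"}')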

Edge device Configuration:

  • Ubuntu 18.04 or higher
  • Docker
  • Intel NUC with Core i7 processor
  • Intel RealSense D435 depth camera

Demo and Setup:

Intel NUC with Core i7 and Intel RealSense D435 depth camera

[Photo: Intel NUC and RealSense D435 setup]

Starting Pipeline Server with Docker Run:

[Screenshot: Docker run output]

Starting the Object Detection (YoloV2) Application:

[Screenshot: application startup]

Model Definition and Labels:

[Screenshot: model definition JSON with labels]

Pipeline configuration (JSON):

[Screenshot: pipeline.json]

Inference results (from the Intel RealSense RGB input stream):

[Screenshot: sample inference logs]

Sample rendered output (rendering is turned off in the default configuration):

[Image: rendered inference output showing a detected coffee mug]

 

Conclusion:

In this post, we presented a solution that accelerates end-to-end AI inference pipelines and reduces time to deployment by using Intel® DL Streamer Pipeline Server with the Intel® Distribution of OpenVINO™ toolkit on Intel® IoT edge devices. The solution automates a no-code OpenVINO inference deployment pipeline using Open Model Zoo models and configuration files. We demonstrated YoloV2 inference on an Intel® NUC edge device with a Core i7 processor, using an Intel® RealSense D435 depth camera.

Notices & Disclaimers

Performance varies by use, configuration, and other factors. Learn more at www.Intel.com/PerformanceIndex

No product or component can be absolutely secure.

Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.

Your costs and results may vary.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel technologies may require enabled hardware, software, or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.