Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Multi-stage detectors data map

shekarneo
Beginner

I have the dlstreamer pipeline below, and I am using person detection and face detection models. I am attempting to associate person bounding boxes with their corresponding faces. However, dlstreamer provides separate lists for persons and faces, making it challenging to determine which face belongs to each person.

Could you please assist me in figuring out how to match the faces with their respective persons?

Below, you can find the pipeline and the complete code.

filesrc location=t1.jpg ! decodebin \
                                ! videoscale ! video/x-raw,width=640,height=640 \
                                ! videoconvert ! capsfilter caps=video/x-raw,format=BGR \
                                ! queue ! gvadetect model=../person_head_detection/models/yolov5_80cls/yolov5s_openvino_model/FP32/yolov5s.xml model-proc=../person_head_detection/models/model_proc/yolo-v5_80.json device=CPU threshold=0.8 name=gvadetect_yolov5 \
                                ! queue ! gvatrack tracking-type=short-term-imageless \
                                ! queue ! gvadetect model=models/face-detection-0204/FP32/face-detection-0204.xml model-proc=models/model_proc/face-detection-0204.json device=CPU threshold=0.8 inference-region=roi-list name=gvadetect_face \
                                ! queue ! gvawatermark name=gvawatermark \
                                ! queue ! videoconvert n-threads=4 ! fpsdisplaysink sync=false

import logging
import os
import sys

import cv2
import gi

gi.require_version("Gst", "1.0")
gi.require_version("GstVideo", "1.0")
from gi.repository import Gst, GLib
from gstgva import VideoFrame, util

Gst.init(sys.argv)

def bus_call(bus, message, loop):
    t = message.type
    if t == Gst.MessageType.EOS:
        sys.stdout.write("End-of-stream\n")
        loop.quit()
    elif t == Gst.MessageType.WARNING:
        err, debug = message.parse_warning()
        sys.stderr.write("Warning: %s: %s\n" % (err, debug))
    elif t == Gst.MessageType.ERROR:
        err, debug = message.parse_error()
        sys.stderr.write("Error: %s: %s\n" % (err, debug))
        loop.quit()
    return True


class Pipeline:
    def __init__(self):
        self.pipeline_string = f"filesrc location=t1.jpg ! decodebin \
                                ! videoscale ! video/x-raw,width=640,height=640 \
                                ! videoconvert ! capsfilter caps=video/x-raw,format=BGR \
                                ! queue ! gvadetect model=../person_head_detection/models/yolov5_80cls/yolov5s_openvino_model/FP32/yolov5s.xml model-proc=../person_head_detection/models/model_proc/yolo-v5_80.json device=CPU threshold=0.8 name=gvadetect_yolov5 \
                                ! queue ! gvatrack tracking-type=short-term-imageless \
                                ! queue ! gvadetect model=models/face-detection-0204/FP32/face-detection-0204.xml model-proc=models/model_proc/face-detection-0204.json device=CPU threshold=0.8 inference-region=roi-list name=gvadetect_face \
                                ! queue ! gvawatermark name=gvawatermark \
                                ! queue ! videoconvert n-threads=4 ! fpsdisplaysink sync=false"
        self.detected_frames = {}
        self.frame_number=0
        self.save_dir = "crop_data"
        #                                ! queue ! gvametaconvert ! gvametapublish file-format=json-lines file-path=output.json \


    def frame_callback(self, frame):
        self.frame_number += 1
        ts = frame._VideoFrame__buffer.pts  # buffer timestamp (accesses VideoFrame's private buffer)
        # Track the most recently seen person ROI. DLStreamer emits persons and
        # faces in one flat list, so this only pairs a face with the *last*
        # person seen, not necessarily the person it actually belongs to.
        bbox = None
        obj_id = None
        p_conf = None
        with frame.data() as img:
            for roi in frame.regions():
                print(f"roi: {roi.label()}-{roi.object_id()}")

                if roi.label() == "person":
                    bbox = roi.rect()
                    obj_id = roi.object_id()
                    p_conf = roi.confidence()
                elif roi.label() == "face":
                    if bbox is None:
                        continue  # no person seen yet in this frame; skip
                    f_conf = roi.confidence()
                    x, y, w, h = bbox
                    x, y, w, h = int(x), int(y), int(w), int(h)
                    if h > 2 * w:  # heuristic: only keep tall (standing) person boxes
                        crop = img[y:y + h, x:x + w]
                        os.makedirs(f"{self.save_dir}/{obj_id}", exist_ok=True)
                        cv2.imwrite(
                            f"{self.save_dir}/{obj_id}/person_{self.frame_number}_{p_conf}_{f_conf}.jpg",
                            crop,
                        )
        return True


    def watermark_pad_probe_callback(self, pad, info):
        # Fires once per buffer on gvawatermark's src pad; wrapping the buffer
        # in a VideoFrame exposes the attached detection metadata (ROIs).
        with util.GST_PAD_PROBE_INFO_BUFFER(info) as buffer:
            caps = pad.get_current_caps()
            frame = VideoFrame(buffer, caps=caps)
            self.frame_callback(frame)
        return Gst.PadProbeReturn.OK


    def run(self):
        pipeline = Gst.parse_launch(self.pipeline_string)
        loop = GLib.MainLoop()
        bus = pipeline.get_bus()
        bus.add_signal_watch()
        bus.connect("message", bus_call, loop)


        gvawatermark = pipeline.get_by_name("gvawatermark")
        if gvawatermark:
            pad = gvawatermark.get_static_pad("src")
            pad.add_probe(Gst.PadProbeType.BUFFER, self.watermark_pad_probe_callback)


        pipeline.set_state(Gst.State.PLAYING)
        try:
            loop.run()
        except Exception:
            logging.exception("Pipeline main loop failed")
            loop.quit()

        pipeline.set_state(Gst.State.NULL)

if __name__ == "__main__":
    Pipeline().run()
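
For clarity, the association logic I am trying to achieve looks roughly like the sketch below. It is a purely geometric fallback (match each face to the person box that contains most of it); the rects would come from roi.rect() as in frame_callback above, and all names here are illustrative, not working DLStreamer code:

def containment(face, person):
    # Fraction of the face rectangle's area that lies inside the person
    # rectangle; both are (x, y, w, h) tuples.
    fx, fy, fw, fh = face
    px, py, pw, ph = person
    ix = max(0, min(fx + fw, px + pw) - max(fx, px))
    iy = max(0, min(fy + fh, py + ph) - max(fy, py))
    return (ix * iy) / float(fw * fh) if fw * fh else 0.0

def match_faces_to_persons(faces, persons, min_overlap=0.5):
    # Greedy assignment: each face index maps to the person index whose box
    # contains the largest fraction of it, if at least min_overlap.
    matches = {}
    for i, face in enumerate(faces):
        scores = [containment(face, p) for p in persons]
        if scores and max(scores) >= min_overlap:
            matches[i] = scores.index(max(scores))
    return matches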

5 Replies
Aznie_Intel
Moderator

Hi Shekarneo,

Thanks for reaching out.

You may refer to face_detection_and_classification.sh for the pipeline elements used in the Face Detection And Classification Sample. The sample output visualizes the video with bounding boxes around detected faces, facial landmark points, and text with classification results (age/gender, emotion) for each detected face, or prints out the FPS if you set SINK_ELEMENT = fps.

Hope this helps.

Regards,
Aznie


shekarneo
Beginner

Hi Aznie,

That works when the primary stage is a detector and the secondary stage is a classifier, but the same approach does not work when both the primary and the secondary stages are detectors, i.e.:

gvadetect (person detector) -> gvatrack -> gvadetect (face detector)
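
In reduced form, the failing pattern is the sketch below (model paths omitted; only the element chain matters):

filesrc location=... ! decodebin \
    ! gvadetect model=person-detection.xml \
    ! gvatrack tracking-type=short-term-imageless \
    ! gvadetect model=face-detection.xml inference-region=roi-list \
    ! gvawatermark ! videoconvert ! autovideosink sync=false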

Aznie_Intel
Moderator

Hi Shekarneo,

This is expected, since the Face Detection And Classification Sample builds its GStreamer pipeline from the following elements (a pipeline sketch follows the list):

  • gvadetect for face detection based on the OpenVINO™ Toolkit Inference Engine
  • gvaclassify, inserted into the pipeline three times, for face classification with three DL models (age-gender, emotion, landmark points)
  • gvawatermark for bounding box and label visualization
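
For illustration only, a pipeline of that shape might look like the following sketch (the model file names here are placeholders, not the sample's actual paths):

filesrc location=input.mp4 ! decodebin \
    ! gvadetect model=face-detection.xml device=CPU \
    ! queue ! gvaclassify model=age-gender-recognition.xml object-class=face \
    ! queue ! gvaclassify model=emotions-recognition.xml object-class=face \
    ! queue ! gvaclassify model=facial-landmarks.xml object-class=face \
    ! queue ! gvawatermark ! videoconvert ! autovideosink sync=false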

The gvatrack element is used in the Vehicle and Pedestrian Tracking Sample for its object tracking capabilities.

Regards,
Aznie


Aznie_Intel
Moderator


Hi Shekarneo,

This thread will no longer be monitored since we have provided the requested information. If you need any additional information from Intel, please submit a new question.

Regards,
Aznie


PrincePatel
Beginner

@Aznie_Intel I am facing the same issue here: https://github.com/dlstreamer/dlstreamer/issues/430

DLStreamer resolved this in the 2022.1 release, as pointed out here: https://github.com/dlstreamer/dlstreamer/issues/206#issuecomment-1188346806

But it is an issue again now. Can we resolve this?
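
For reference, the association I would expect after that fix looks roughly like the sketch below; it assumes gvadetect with inference-region=roi-list fills in the parent_id field of each face's GstVideoRegionOfInterestMeta (which is exactly what seems broken), with frame being a gstgva VideoFrame as in the code above:

# Assumed behaviour per dlstreamer issue #206: secondary detections carry
# the primary ROI's id in parent_id.
persons = {}
faces = []
for roi in frame.regions():
    meta = roi.meta()  # underlying GstVideoRegionOfInterestMeta
    if roi.label() == "person":
        persons[meta.id] = roi
    elif roi.label() == "face":
        faces.append(roi)

for face in faces:
    parent = persons.get(face.meta().parent_id)
    if parent is not None:
        print(f"face {face.meta().id} belongs to person {parent.object_id()}")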
