Intel® Distribution of OpenVINO™ Toolkit
Community assistance for the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

Multi-stage detectors data map

shekarneo
Beginner

I have the dlstreamer pipeline below, and I am using person detection and face detection models. I am attempting to associate person bounding boxes with their corresponding faces. However, dlstreamer provides separate lists for persons and faces, making it challenging to determine which face belongs to each person.

Could you please assist me in figuring out how to match the faces with their respective persons?

Below, you can find the pipeline and the complete code.

filesrc location=t1.jpg ! decodebin \
                                ! videoscale ! video/x-raw,width=640,height=640 \
                                ! videoconvert ! capsfilter caps=video/x-raw,format=BGR \
                                ! queue ! gvadetect model=../person_head_detection/models/yolov5_80cls/yolov5s_openvino_model/FP32/yolov5s.xml model-proc=../person_head_detection/models/model_proc/yolo-v5_80.json device=CPU threshold=0.8 name=gvadetect_yolov5 \
                                ! queue ! gvatrack tracking-type=short-term-imageless \
                                ! queue ! gvadetect model=models/face-detection-0204/FP32/face-detection-0204.xml model-proc=models/model_proc/face-detection-0204.json device=CPU threshold=0.8 inference-region=roi-list name=gvadetect_face \
                                ! queue ! gvawatermark name=gvawatermark \
                                ! queue ! videoconvert n-threads=4 ! fpsdisplaysink sync=false

import logging
import os
import sys

import cv2
import gi

gi.require_version("Gst", "1.0")
gi.require_version("GstVideo", "1.0")
from gi.repository import Gst, GLib
from gstgva import VideoFrame, util

Gst.init(sys.argv)

def bus_call(bus, message, loop):
    t = message.type
    if t == Gst.MessageType.EOS:
        sys.stdout.write("End-of-stream\n")
        loop.quit()
    elif t == Gst.MessageType.WARNING:
        err, debug = message.parse_warning()
        sys.stderr.write("Warning: %s: %s\n" % (err, debug))
    elif t == Gst.MessageType.ERROR:
        err, debug = message.parse_error()
        sys.stderr.write("Error: %s: %s\n" % (err, debug))
        loop.quit()
    return True


class Pipeline:
    def __init__(self):
        self.pipeline_string = f"filesrc location=t1.jpg ! decodebin \
                                ! videoscale ! video/x-raw,width=640,height=640 \
                                ! videoconvert ! capsfilter caps=video/x-raw,format=BGR \
                                ! queue ! gvadetect model=../person_head_detection/models/yolov5_80cls/yolov5s_openvino_model/FP32/yolov5s.xml model-proc=../person_head_detection/models/model_proc/yolo-v5_80.json device=CPU threshold=0.8 name=gvadetect_yolov5 \
                                ! queue ! gvatrack tracking-type=short-term-imageless \
                                ! queue ! gvadetect model=models/face-detection-0204/FP32/face-detection-0204.xml model-proc=models/model_proc/face-detection-0204.json device=CPU threshold=0.8 inference-region=roi-list name=gvadetect_face \
                                ! queue ! gvawatermark name=gvawatermark \
                                ! queue ! videoconvert n-threads=4 ! fpsdisplaysink sync=false"
        self.detected_frames = {}
        self.frame_number=0
        self.save_dir = "crop_data"
        #                                ! queue ! gvametaconvert ! gvametapublish file-format=json-lines file-path=output.json \


    def frame_callback(self, frame):
        self.frame_number += 1
        ts = frame._VideoFrame__buffer.pts  # buffer timestamp (accesses VideoFrame's private buffer)
        # Track the most recently seen person ROI. DLStreamer emits persons and
        # faces in one flat list, so this only pairs a face with the *last*
        # person seen, not necessarily the person it actually belongs to.
        bbox = None
        obj_id = None
        p_conf = None
        with frame.data() as img:
            for roi in frame.regions():
                print(f"roi: {roi.label()}-{roi.object_id()}")

                if roi.label() == "person":
                    bbox = roi.rect()
                    obj_id = roi.object_id()
                    p_conf = roi.confidence()
                elif roi.label() == "face":
                    if bbox is None:
                        continue  # no person seen yet in this frame; skip
                    f_conf = roi.confidence()
                    x, y, w, h = bbox
                    x, y, w, h = int(x), int(y), int(w), int(h)
                    if h > 2 * w:  # heuristic: only keep tall (standing) person boxes
                        crop = img[y:y + h, x:x + w]
                        os.makedirs(f"{self.save_dir}/{obj_id}", exist_ok=True)
                        cv2.imwrite(
                            f"{self.save_dir}/{obj_id}/person_{self.frame_number}_{p_conf}_{f_conf}.jpg",
                            crop,
                        )
        return True


    def watermark_pad_probe_callback(self, pad, info):
        # Fires once per buffer on gvawatermark's src pad; wrapping the buffer
        # in a VideoFrame exposes the attached detection metadata (ROIs).
        with util.GST_PAD_PROBE_INFO_BUFFER(info) as buffer:
            caps = pad.get_current_caps()
            frame = VideoFrame(buffer, caps=caps)
            self.frame_callback(frame)
        return Gst.PadProbeReturn.OK


    def run(self):
        pipeline = Gst.parse_launch(self.pipeline_string)
        loop = GLib.MainLoop()
        bus = pipeline.get_bus()
        bus.add_signal_watch()
        bus.connect("message", bus_call, loop)


        gvawatermark = pipeline.get_by_name("gvawatermark")
        if gvawatermark:
            pad = gvawatermark.get_static_pad("src")
            pad.add_probe(Gst.PadProbeType.BUFFER, self.watermark_pad_probe_callback)


        pipeline.set_state(Gst.State.PLAYING)
        try:
            loop.run()
        except Exception:
            logging.exception("Pipeline main loop failed")
            loop.quit()

        pipeline.set_state(Gst.State.NULL)

if __name__ == "__main__":
    Pipeline().run()
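
For clarity, the association logic I am trying to achieve looks roughly like the sketch below. It is a purely geometric fallback (match each face to the person box that contains most of it); the rects would come from roi.rect() as in frame_callback above, and all names here are illustrative, not working DLStreamer code:

def containment(face, person):
    # Fraction of the face rectangle's area that lies inside the person
    # rectangle; both are (x, y, w, h) tuples.
    fx, fy, fw, fh = face
    px, py, pw, ph = person
    ix = max(0, min(fx + fw, px + pw) - max(fx, px))
    iy = max(0, min(fy + fh, py + ph) - max(fy, py))
    return (ix * iy) / float(fw * fh) if fw * fh else 0.0

def match_faces_to_persons(faces, persons, min_overlap=0.5):
    # Greedy assignment: each face index maps to the person index whose box
    # contains the largest fraction of it, if at least min_overlap.
    matches = {}
    for i, face in enumerate(faces):
        scores = [containment(face, p) for p in persons]
        if scores and max(scores) >= min_overlap:
            matches[i] = scores.index(max(scores))
    return matches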

5 Replies
Aznie_Intel
Moderator

Hi Shekarneo,

Thanks for reaching out.

You may refer to face_detection_and_classification.sh for the pipeline elements used in the Face Detection And Classification Sample. The sample output visualizes the video with bounding boxes around detected faces, facial landmark points, and text with classification results (age/gender, emotion) for each detected face, or prints out the FPS if you set SINK_ELEMENT = fps.

Hope this helps.

Regards,
Aznie


shekarneo
Beginner

Hi Aznie,

That works when the primary stage is a detector and the secondary stage is a classifier, but the same approach does not work when both the primary and the secondary stages are detectors, i.e.:

gvadetect (person detector) -> gvatrack -> gvadetect (face detector)
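
In reduced form, the failing pattern is the sketch below (model paths omitted; only the element chain matters):

filesrc location=... ! decodebin \
    ! gvadetect model=person-detection.xml \
    ! gvatrack tracking-type=short-term-imageless \
    ! gvadetect model=face-detection.xml inference-region=roi-list \
    ! gvawatermark ! videoconvert ! autovideosink sync=false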

Aznie_Intel
Moderator

Hi Shekarneo,

This is expected, since the Face Detection And Classification Sample builds its GStreamer pipeline from the following elements (a pipeline sketch follows the list):

  • gvadetect for face detection based on the OpenVINO™ Toolkit Inference Engine
  • gvaclassify, inserted into the pipeline three times, for face classification with three DL models (age-gender, emotion, landmark points)
  • gvawatermark for bounding box and label visualization
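
For illustration only, a pipeline of that shape might look like the following sketch (the model file names here are placeholders, not the sample's actual paths):

filesrc location=input.mp4 ! decodebin \
    ! gvadetect model=face-detection.xml device=CPU \
    ! queue ! gvaclassify model=age-gender-recognition.xml object-class=face \
    ! queue ! gvaclassify model=emotions-recognition.xml object-class=face \
    ! queue ! gvaclassify model=facial-landmarks.xml object-class=face \
    ! queue ! gvawatermark ! videoconvert ! autovideosink sync=false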

The gvatrack element is used in the Vehicle and Pedestrian Tracking Sample for its object tracking capabilities.

Regards,
Aznie


Aznie_Intel
Moderator


Hi Shekarneo,

This thread will no longer be monitored since we have provided the requested information. If you need any additional information from Intel, please submit a new question.

Regards,
Aznie


PrincePatel
Beginner

@Aznie_Intel I am facing the same issue here: https://github.com/dlstreamer/dlstreamer/issues/430

DLStreamer resolved this in the 2022.1 release, as pointed out here: https://github.com/dlstreamer/dlstreamer/issues/206#issuecomment-1188346806

But it is an issue again now. Can we resolve this?
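
For reference, the association I would expect after that fix looks roughly like the sketch below; it assumes gvadetect with inference-region=roi-list fills in the parent_id field of each face's GstVideoRegionOfInterestMeta (which is exactly what seems broken), with frame being a gstgva VideoFrame as in the code above:

# Assumed behaviour per dlstreamer issue #206: secondary detections carry
# the primary ROI's id in parent_id.
persons = {}
faces = []
for roi in frame.regions():
    meta = roi.meta()  # underlying GstVideoRegionOfInterestMeta
    if roi.label() == "person":
        persons[meta.id] = roi
    elif roi.label() == "face":
        faces.append(roi)

for face in faces:
    parent = persons.get(face.meta().parent_id)
    if parent is not None:
        print(f"face {face.meta().id} belongs to person {parent.object_id()}")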
