Hi there, I'm working on a person attribute model whose output is multiple labels, each with its own confidence score. I'm using the DeepMAR model with a ResNet50 backbone. I exported the trained model from PTH to OpenVINO format and ran it with this pipeline:
```
GST_DEBUG=4 gst-launch-1.0 filesrc location=/home/hungtrieu07/dev/video_data/1740731094984.mp4 ! decodebin ! videoconvert ! \
gvadetect model=ov_exported_model/person_vehicle/FP16.xml device=CPU ! \
gvawatermark ! \
gvaclassify model=ov_exported_model/person-attr/model.xml model-proc=ov_exported_model/person-attr/model-proc.json device=CPU ! \
gvametaconvert format=json ! gvawatermark ! videoconvert ! autovideosink sync=false
```
The model-proc file of the classification model:
```
{
  "json_schema_version": "2.2.0",
  "input_preproc": [
    {
      "layer_name": "input",
      "preprocessor": "opencv",
      "format": "image",
      "resize": "fixed-size",
      "width": 224,
      "height": 224,
      "mean": [123.675, 116.28, 103.53],
      "std": [58.395, 57.12, 57.375],
      "reverse_channels": true
    }
  ],
  "output_postproc": [
    {
      "converter": "tensor_to_label",
      "output_layers": ["output"],
      "threshold": 0.5,
      "labels": [
        "accessoryHat", "hairLong", "hairShort", "upperBodyShortSleeve", "upperBodyBlack",
        "upperBodyBlue", "upperBodyBrown", "upperBodyGreen", "upperBodyGrey", "upperBodyOrange",
        "upperBodyPink", "upperBodyPurple", "upperBodyRed", "upperBodyWhite", "upperBodyYellow",
        "upperBodyLongSleeve", "lowerBodyShorts", "lowerBodyShortSkirt", "lowerBodyBlack",
        "lowerBodyBlue", "lowerBodyBrown", "lowerBodyGreen", "lowerBodyGrey", "lowerBodyOrange",
        "lowerBodyPink", "lowerBodyPurple", "lowerBodyRed", "lowerBodyWhite", "lowerBodyYellow",
        "lowerBodyLongSkirt", "footwearLeatherShoes", "footwearSandals", "footwearShoes",
        "footwearSneaker", "carryingBackpack", "carryingMessengerBag", "carryingLuggageCase",
        "carryingSuitcase", "personalLess30", "personalLess45", "personalLess60",
        "personalLarger60", "personalLess15", "personalMale", "personalFemale"
      ]
    }
  ]
}
```
My problem is that the above pipeline returns only one label, whereas I need it to return multiple labels. I can see this in the output when using `GST_DEBUG=4`:
```
0:00:09.274745034 305321 0x72ab04004030 INFO jsonconverter jsonconverter.cpp:347:to_json:<gvametaconvert0> JSON message: {"objects":[{"classification_layer_name:output":{"confidence":1.0,"label":"hairShort","label_id":2,"model":{"name":"main_graph"}},"detection":{"bounding_box":{"x_max":0.9985297828693653,"x_min":0.9103134767757872,"y_max":0.858617223182577,"y_min":0.3948156415643638},"confidence":0.6648092865943909,"label":"person","label_id":3},"h":501,"region_id":562,"roi_type":"person","w":169,"x":1748,"y":426}],"resolution":{"height":1080,"width":1920},"timestamp":3867000000}
```
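To make the symptom easier to check, here is a minimal sketch that lists every classification entry attached to each object in a `gvametaconvert` JSON message. It assumes the key layout seen in the log line above (`classification_layer_name:<layer>` next to `detection`); the function name `classification_results` is hypothetical, and the message is an abbreviated copy of the logged one with the bounding box omitted.

```python
import json

# Abbreviated copy of the JSON message from the log above (bounding box omitted).
message = '''{"objects":[{"classification_layer_name:output":{"confidence":1.0,"label":"hairShort","label_id":2,"model":{"name":"main_graph"}},"detection":{"confidence":0.6648092865943909,"label":"person","label_id":3},"h":501,"region_id":562,"roi_type":"person","w":169,"x":1748,"y":426}],"resolution":{"height":1080,"width":1920},"timestamp":3867000000}'''

# Key prefix observed in the logged message; other pipelines may use different keys.
PREFIX = "classification_layer_name:"


def classification_results(raw: str) -> list[tuple[str, str, str, float]]:
    """Return (roi_type, layer, label, confidence) for every classification entry."""
    results = []
    for obj in json.loads(raw).get("objects", []):
        for key, value in obj.items():
            if key.startswith(PREFIX):
                layer = key[len(PREFIX):]
                results.append(
                    (obj.get("roi_type", ""), layer, value["label"], value["confidence"])
                )
    return results


results = classification_results(message)
for row in results:
    print(row)
```

For the message above, this yields a single entry per person, which matches the problem described: one label where several are expected.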
I tested a Python script that follows the same pipeline and got good results:
```
import cv2
import os
import numpy as np
from ultralytics import YOLO
from openvino.runtime import Core
# -----------------------------
# Load Models
# -----------------------------
# 1. Detection model
detection_model = YOLO("best.pt", task="detect")
# 2. DeepMAR attribute model (converted to OpenVINO)
ie = Core()
# Replace with your DeepMAR model’s XML and BIN files
attr_model_xml = "ov_exported_model/person-attr/model.xml"
attr_model_bin = "ov_exported_model/person-attr/model.bin"
attr_model = ie.read_model(model=attr_model_xml, weights=attr_model_bin)
compiled_attr_model = ie.compile_model(model=attr_model, device_name="CPU")
input_layer_attr = compiled_attr_model.inputs[0]
output_layer_attr = compiled_attr_model.outputs[0]
# List of attribute names from your training dataset.
att_list = [
"accessoryHat",
"hairLong",
"hairShort",
"upperBodyShortSleeve",
"upperBodyBlack",
"upperBodyBlue",
"upperBodyBrown",
"upperBodyGreen",
"upperBodyGrey",
"upperBodyOrange",
"upperBodyPink",
"upperBodyPurple",
"upperBodyRed",
"upperBodyWhite",
"upperBodyYellow",
"upperBodyLongSleeve",
"lowerBodyShorts",
"lowerBodyShortSkirt",
"lowerBodyBlack",
"lowerBodyBlue",
"lowerBodyBrown",
"lowerBodyGreen",
"lowerBodyGrey",
"lowerBodyOrange",
"lowerBodyPink",
"lowerBodyPurple",
"lowerBodyRed",
"lowerBodyWhite",
"lowerBodyYellow",
"lowerBodyLongSkirt",
"footwearLeatherShoes",
"footwearSandals",
"footwearShoes",
"footwearSneaker",
"carryingBackpack",
"carryingMessengerBag",
"carryingLuggageCase",
"carryingSuitcase",
"personalLess30",
"personalLess45",
"personalLess60",
"personalLarger60",
"personalLess15",
"personalMale",
"personalFemale",
]
# -----------------------------
# Video Input Setup & Output Directory
# -----------------------------
# video_path = "/home/hungtrieu07/dev/video_data/1740730604383.mp4" # or 0 for webcam
# video_path = "/home/hungtrieu07/dev/video_data/1740730890898.mp4"
# video_path = "/home/hungtrieu07/dev/video_data/1740730950413.mp4"
video_path = "/home/hungtrieu07/dev/video_data/1740731094984.mp4"
# video_path = "/home/hungtrieu07/dev/video_data/1740731190026.mp4"
# video_path = "/home/hungtrieu07/dev/video_data/test_video.avi"
cap = cv2.VideoCapture(video_path)
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(cap.get(cv2.CAP_PROP_FPS)) or 30
# Create an output directory if saving frames/images
output_dir = "output"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
frame_count = 0
# -----------------------------
# Preprocessing Settings for DeepMAR
# -----------------------------
# These are DeepMAR’s normalization parameters.
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
# -----------------------------
# Main Loop
# -----------------------------
while True:
    ret, frame = cap.read()
    if not ret:
        print("Video processing complete.")
        break
    frame_count += 1
    # Run YOLO detection on the frame
    detection_results = detection_model(frame)
    # Base annotated frame for display
    # annotated_frame = detection_results[0].plot()
    # Process each detection result
    for result in detection_results:
        for box in result.boxes:
            # Get bounding box coordinates (x1, y1, x2, y2)
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            # Check if the detection is a person (adjust if necessary)
            if int(box.cls) == 3:
                # Crop the detected person from the original frame
                person_crop = frame[y1:y2, x1:x2]
                if person_crop.size == 0:
                    continue  # Skip empty crops
                # Resize the cropped image to 224x224 (width x height)
                person_crop_resized = cv2.resize(person_crop, (224, 224))
                # Preprocess for DeepMAR:
                # Convert from BGR (OpenCV) to RGB
                person_rgb = cv2.cvtColor(person_crop_resized, cv2.COLOR_BGR2RGB)
                # Convert to float and scale to [0, 1]
                person_float = person_rgb.astype(np.float32) / 255.0
                # Normalize using DeepMAR’s mean and std
                person_norm = (person_float - mean) / std
                # Rearrange from HWC to CHW format
                input_tensor = np.transpose(person_norm, (2, 0, 1))
                # Add batch dimension: (1, C, H, W)
                input_tensor = np.expand_dims(input_tensor, axis=0)
                # Run attribute inference with OpenVINO
                attr_probs = compiled_attr_model([input_tensor])[output_layer_attr][0]
                # The exported model already applies sigmoid, so these are probabilities
                # Build a text string listing all attributes with score >= 0.5
                attr_texts = []
                for idx, score in enumerate(attr_probs):
                    if score >= 0.5:  # Adjust threshold if needed
                        attr_texts.append(f"{att_list[idx]}: {score:.3f}")
                text_to_draw = ", ".join(attr_texts)
                # Draw attribute text on the frame above the person box
                cv2.putText(frame, text_to_draw, (x1, y1 - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    # Resize annotated frame back to original dimensions (if needed)
    frame = cv2.resize(frame, (frame_width, frame_height))
    # Display the frame
    cv2.imshow("Inference", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```
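The script above normalizes pixels after scaling them to [0, 1], while the model-proc file lists mean and std on the 0-255 scale. The sketch below only checks the arithmetic: that the two sets of constants describe the same normalization. It does not verify how DL Streamer actually interprets those model-proc fields, and the helper names are hypothetical.

```python
# Values from the Python script (applied after scaling pixels to [0, 1]).
mean_01 = [0.485, 0.456, 0.406]
std_01 = [0.229, 0.224, 0.225]

# Values from the model-proc file (expressed on the raw 0-255 pixel scale).
mean_255 = [123.675, 116.28, 103.53]
std_255 = [58.395, 57.12, 57.375]


def normalize_scaled(pixel: int, channel: int) -> float:
    """Normalization as written in the Python script."""
    return (pixel / 255.0 - mean_01[channel]) / std_01[channel]


def normalize_raw(pixel: int, channel: int) -> float:
    """The same normalization expressed directly on 0-255 pixel values."""
    return (pixel - mean_255[channel]) / std_255[channel]


# Compare both forms for every possible 8-bit pixel value and every channel.
max_diff = max(
    abs(normalize_scaled(p, c) - normalize_raw(p, c))
    for p in range(256)
    for c in range(3)
)
print(f"largest difference: {max_diff:.2e}")
```

The difference stays at floating-point rounding level, so any mismatch between the script and the pipeline would have to come from somewhere other than these constants.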
The classification model was trained with this repo: https://github.com/dangweili/pedestrian-attribute-recognition-pytorch
First, I converted the PTH model to an ONNX model using this code snippet:
```
import os
import torch
import torch.nn.functional as F
import onnx
from torchinfo import summary
from baseline.model.DeepMAR import DeepMAR_ResNet50 # Adjust import based on your project structure
# Subclass for ONNX export with fixed pooling and sigmoid activation
class DeepMAR_ResNet50_Export(DeepMAR_ResNet50):
    def forward(self, x):
        x = self.base(x)
        # Fixed kernel size for 224x224 input (feature map is 7x7 after ResNet-50)
        x = F.avg_pool2d(x, (7, 7))
        x = x.view(x.size(0), -1)
        if self.drop_pool5:
            x = F.dropout(x, p=self.drop_pool5_rate, training=self.training)
        x = self.classifier(x)
        x = torch.sigmoid(x)  # Sigmoid for multi-label confidence scores
        return x


def export_deepmar_to_onnx(model_path, onnx_output_path, num_att):
    """
    Export DeepMAR ResNet-50 model to ONNX format.

    Args:
        model_path (str): Path to PyTorch model checkpoint.
        onnx_output_path (str): Path to save ONNX model.
        num_att (int): Number of attributes (output classes).
    """
    # Instantiate export-friendly model
    model = DeepMAR_ResNet50_Export(num_att=num_att, last_conv_stride=2)
    model = model.cuda()
    model.eval()
    # Load checkpoint
    checkpoint = torch.load(model_path, map_location='cpu')
    state_dict = checkpoint.get("state_dicts", checkpoint)
    if isinstance(state_dict, list):
        state_dict = state_dict[0]
    state_dict = {k.replace("module.", ""): v for k, v in state_dict.items()}
    model.load_state_dict(state_dict, strict=False)
    # Dummy input (batch_size=1, channels=3, height=224, width=224)
    dummy_input = torch.randn(1, 3, 224, 224).cuda()
    with torch.no_grad():
        output = model(dummy_input)
    print(f"Dummy input test output shape: {output.shape}")
    print(f"Sample output (with sigmoid): {output[0][:5]}")
    # Print model summary
    print("Model Summary:")
    summary(model, input_size=(1, 3, 224, 224))
    # Export to ONNX
    os.makedirs(os.path.dirname(onnx_output_path), exist_ok=True)
    print(f"Exporting to ONNX: {onnx_output_path}...")
    torch.onnx.export(
        model,
        dummy_input,
        onnx_output_path,
        input_names=["input"],
        output_names=["output"],
        opset_version=11
    )
    print(f"Model exported to {onnx_output_path}")
    # Verify ONNX model
    print("Verifying ONNX model...")
    onnx_model = onnx.load(onnx_output_path)
    onnx.checker.check_model(onnx_model)
    print("ONNX model verification completed!")


if __name__ == "__main__":
    model_path = "exp/deepmar_resnet50/peta/partition0/run1/model/ckpt_epoch150.pth"  # Update as needed
    onnx_output_path = "onnx_models/deepmar.onnx"
    num_att = 45  # Adjust based on your training configuration
    export_deepmar_to_onnx(model_path, onnx_output_path, num_att)
```
After that, I used the `ovc` command to convert the ONNX model to an OpenVINO model:
`ovc onnx_models/deepmar.onnx --output_model ov_exported_model/person-attr/model.xml --input "input[1,3,224,224]" --compress_to_fp16=false`
The original PTH model, the ONNX model, and the converted OpenVINO model are available at this link:
https://drive.google.com/drive/folders/16ZavCBXZV1_Q0eUMNkANzMSTRFWXPQx7?usp=sharing
Hi Hungtrieu07,
Thank you for reaching out to us.
To assist you more effectively with your issue, we would need some additional information. Could you please provide the following details?
- Version of OpenVINO™ toolkit
- Host Operating System
- Is it a C++ or Python application?
- Are you using DL Streamer for this application? If yes, which version of DL Streamer are you using?
If you have additional details that might help with our investigation, please feel free to share them here so we can replicate the issue from our end.
Best regards,
Wan
Hi @Wan_Intel, here is some information about my computer:
- I'm using the newest versions of OpenVINO and DLStreamer (openvino-2025.0.0 and intel-dlstreamer/unknown,now 2025.0.1.1).
- Host OS: Ubuntu 24.04.2 LTS, Kernel: Linux potato 6.11.0-19-generic #19~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Feb 17 11:51:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
- So far I have only tested my exported model with the `gst-launch-1.0` command. In the future, I will use it in a Python application.
The model export code is provided above; it follows this pipeline: PTH model ---> ONNX model ---> OpenVINO model.
Hi Hungtrieu07,
Thank you for sharing the information with us.
We are investigating the issue, and we will get back to you as soon as possible.
Best regards,
Wan
Hi Hungtrieu07,
Could you please share the FP16.xml file with us to further investigate the issue?
Regards,
Wan
Hi @Wan_Intel, this is my FP16.xml file.
Hi Hungtrieu07,
Could you please share the FP16.bin file with us as well to further investigate the issue?
Regards,
Wan
I have uploaded the FP16.bin file on this Google Drive link: https://drive.google.com/file/d/1kAmCEZADR6KiSz70jWenu77NXryQkU3q/view?usp=sharing
Hi hungtrieu07,
I tried to replicate the issue, but I encountered the following error:
ERROR: from element /GstPipeline:pipeline0/GstGvaDetect:gvadetect0: base_inference based element initialization has been failed.
Here are the files located in the directory ~/intel/dlstreamer_gst:
1740731094984.mp4 DLS_install_prerequisites.sh FP16.bin FP16.xml model.bin model.xml model_proc.json wget-log wget-log.1
Could you please provide the JSON file and share the steps to re-create the error you faced when running the provided model from Google Drive with DL Streamer? You may also post a screenshot of the replicated result here.
Regards,
Wan
Let me clarify the issue.
I have these two links:
This is the person and vehicle detection model, trained with YOLOv8s: https://drive.google.com/drive/folders/16PM7zDo5tlvjIJ-KMmfiZXlWHhMj2kMs?usp=sharing
This is the person attribute recognition model and a model-proc file: https://drive.google.com/drive/folders/1aybyKQzuK0BOm44lCRha259DPGJnsvwi?usp=sharing
Both of them are exported to OpenVINO format.
I'm using this DLStreamer pipeline to test them:
Hi hungtrieu07,
Thank you for sharing the information with us.
I've replicated the issue on my end and see the same behavior as you when running the model with the latest DL Streamer.
I'll escalate this case to the relevant team for further investigation, and we will get back to you as soon as we have an update. We appreciate your understanding.
Best regards,
Wan
Hi hungtrieu07,
Thank you for your patience.
We are still investigating the issue on our end. We just want to confirm the result you are expecting. The result generated on our end is shown as follows:
Does the result above meet your expectation? If not, could you please share a correct output here so that we can investigate further?
Alternatively, does the result you are expecting look like the picture below?
Regards,
Wan
Hi @Wan_Intel, my desired output is the second one in your reply.
Hi hungtrieu07,
Thank you for the confirmation.
We will further investigate the issue, and we will provide an update here as soon as possible.
Regards,
Wan
Hi hungtrieu07,
Could you please share the Python script with us for replication purposes? The one you shared previously fails with IndentationError: expected an indented block after 'if' statement on line XX
Best regards,
Wan
Hi @Wan_Intel, here is the link to my GitHub repository for this model. The repository contains scripts for processing the dataset, training, and testing.
https://github.com/hungtrieu07/pedestrian-attribute-recognition-pytorch
In the original repository, the person attribute recognition model returns an array of shape [1, 45] containing logit values:
Attribute scores: [ -7.0225 -6.5829 7.191 -7.2637 -2.5245 -0.34509 -6.1217 -9.9974 -7.0711 -6.832 -3.3858 -9.6749 -6.0621 -3.8961 -5.9358 5.8452 -6.9435 -8.6819 -2.9126 -2.2814 -6.0654 -9.8109 -5.4611 -6.4585 -8.7987 -10.281
-6.8376 -3.9942 -7.9514 -9.0474 -3.9968 -4.9537 -10.83 -8.4756 -4.8023 -4.7115 4.5504 -8.0026 -4.7523 2.9141 -7.7332 -5.696 -5.7493 1.4905 -1.5996]
In my forked repository, I added the sigmoid activation to the model exported to ONNX. I'm also experimenting with the gvainference element to see whether the model can be used with it. If any problem occurs, I will provide more information.
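As a worked example of what the added sigmoid does, the sketch below applies it to the logits quoted above and keeps the entries at or above 0.5, the threshold used in the model-proc file. The variable names are illustrative; only the logit values come from this post.

```python
import math

# Logits copied from the "Attribute scores" output above (45 values).
logits = [
    -7.0225, -6.5829, 7.191, -7.2637, -2.5245, -0.34509, -6.1217, -9.9974,
    -7.0711, -6.832, -3.3858, -9.6749, -6.0621, -3.8961, -5.9358, 5.8452,
    -6.9435, -8.6819, -2.9126, -2.2814, -6.0654, -9.8109, -5.4611, -6.4585,
    -8.7987, -10.281, -6.8376, -3.9942, -7.9514, -9.0474, -3.9968, -4.9537,
    -10.83, -8.4756, -4.8023, -4.7115, 4.5504, -8.0026, -4.7523, 2.9141,
    -7.7332, -5.696, -5.7493, 1.4905, -1.5996,
]


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


THRESHOLD = 0.5  # same threshold as in the model-proc file

probabilities = [sigmoid(v) for v in logits]
# Indices whose probability reaches the threshold (equivalently, logit >= 0).
selected = [i for i, p in enumerate(probabilities) if p >= THRESHOLD]

for i in selected:
    print(f"label_id={i} probability={probabilities[i]:.3f}")
```

Five indices pass the threshold (2, 15, 36, 39, 43). If the label order in the model-proc file matches the training order, these correspond to hairShort, upperBodyLongSleeve, carryingLuggageCase, personalLess45, and personalMale, so a multi-label result for this crop should contain several attributes, not one.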
Hi Hungtrieu07,
Thanks for your patience.
We have received feedback from the relevant team.
You may edit the model-proc.json file and add or set the option "method": "compound" in the output_postproc section.
Please also refer to the following resources for an idea of how to create the corresponding model-proc for your model.
- https://github.com/dlstreamer/dlstreamer/blob/v2025.0.1.2/samples/gstreamer/model_proc/intel/person-attributes-recognition-crossroad-0230.json
- https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/person-attributes-recognition-crossroad-0230#outputs
The image attached to this reply is closer to the expected result; however, the model-proc.json file may still need some rework.
Regards,
Wan
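The suggested change can be sketched as a small script that adds "method": "compound" to each `tensor_to_label` entry of a model-proc dictionary. This is a sketch under assumptions: the helper name `use_compound_method` is hypothetical, the dictionary is a trimmed stand-in for the model-proc posted earlier, and only the "method": "compound" option itself comes from the reply above. As that reply notes, further rework of the file may still be needed.

```python
import copy
import json


def use_compound_method(model_proc: dict) -> dict:
    """Return a copy of model_proc with method=compound on tensor_to_label entries."""
    patched = copy.deepcopy(model_proc)
    for entry in patched.get("output_postproc", []):
        if entry.get("converter") == "tensor_to_label":
            entry["method"] = "compound"
    return patched


# Trimmed stand-in for the model-proc posted earlier (labels shortened for brevity).
model_proc = {
    "json_schema_version": "2.2.0",
    "output_postproc": [
        {
            "converter": "tensor_to_label",
            "output_layers": ["output"],
            "threshold": 0.5,
            "labels": ["accessoryHat", "hairLong", "hairShort"],
        }
    ],
}

patched = use_compound_method(model_proc)
print(json.dumps(patched["output_postproc"][0], indent=2))
```

To apply this to a real file, load it with `json.load`, pass the result through the helper, and write it back with `json.dump`; the original dictionary is left unchanged so the two versions can be compared.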
Hi @Wan_Intel,
Thank you so much for the solution. After adding some pre-processing to the model-proc file, the output is now correct.
Hi hungtrieu07,
We are glad to know that your issue has been resolved.
We will proceed with closing this case. If you need additional information, please submit a new thread.
Best regards,
Wan
