Intel® Distribution of OpenVINO™ Toolkit
Community assistance about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all aspects of computer vision on Intel® platforms.

PostProcessing On GPU

Gayathri_Sankaran

Hi All,

 

I am currently using an OpenVINO remote tensor to run inference on RGB buffer data passed in via a DirectX texture in a Windows application. The model is compiled for GPU and inference also runs on the GPU. The input buffer is likewise kept on the GPU, so the pipeline executes entirely on the GPU up to inference.

The major computational task is post-processing of the model's output tensor, which is currently carried out on the CPU. The model used is a custom YOLOX-nano with output shape [1, 3549, 14].

After some research into moving the post-processing to the GPU, I found that the MulticlassNMS class in OpenVINO can be used to perform the post-processing at the model level.

 

However, the usage of this class and the parameters to pass are not obvious from the current documentation. It would be helpful if additional information could be shared for this case.

 

Is there another way to move the post-processing to the GPU?

 

Thanks in advance!

5 Replies
Aznie_Intel
Moderator

Hi Gayathri_Sankaran,

 

Thanks for reaching out. We are checking this with the relevant team and will update you once the information is available.

 

 

Regards,

Aznie


Aznie_Intel
Moderator

Hi Gayathri_Sankaran,

 

Offloading post-processing tasks such as Non-Maximum Suppression (NMS) from the CPU to the GPU in OpenVINO can be challenging, as the MulticlassNMS operation is primarily optimized for CPU execution. However, you can explore potential optimizations within the pipeline to leverage GPU capabilities for certain post-processing steps, though we cannot guarantee successful results.

 

One approach is to embed the post-processing in the model itself. OpenVINO's Pre/Post-Processing API can streamline input-side steps, but note that it does not expose an NMS step; MulticlassNMS instead has to be appended to the model graph using the opset operations. Here's a sketch (the slice indices follow the usual YOLOX layout of 4 box values, objectness, then class scores, and may need adjusting for your model):

import numpy as np
from openvino.runtime import Core, Model, opset9 as ops

core = Core()
model = core.read_model("yolox_nano.xml")
raw = model.output(0).get_node().input_value(0)  # producer of the raw [1, 3549, 14] output

i64 = lambda v: np.array(v, dtype=np.int64)
# Boxes: channels 0..3 -> [1, 3549, 4]
boxes = ops.slice(raw, i64([0]), i64([4]), i64([1]), i64([2]))
# Class scores: channels 5..13 -> [1, 3549, 9], transposed to [1, 9, 3549]
scores = ops.slice(raw, i64([5]), i64([14]), i64([1]), i64([2]))
scores = ops.transpose(scores, i64([0, 2, 1]))

nms = ops.multiclass_nms(boxes, scores, sort_result_type="score",
                         iou_threshold=0.5, score_threshold=0.5, keep_top_k=100)
model = Model(nms.outputs(), model.get_parameters(), "yolox_nano_nms")

 

For more information, please refer to the official documentation.

Class ov::op::v9::MulticlassNms — OpenVINO™ documentation
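To make the score_threshold and iou_threshold parameters concrete, here is a minimal pure-NumPy reference of what a multiclass NMS computes. This is a sketch of the algorithm's semantics, not OpenVINO's implementation: per class, predictions below score_threshold are dropped, then boxes are kept greedily in descending score order while suppressing any box whose IoU with an already-kept box exceeds iou_threshold.

```python
import numpy as np

def multiclass_nms_ref(boxes, scores, score_threshold=0.5, iou_threshold=0.5):
    """Reference multiclass NMS.
    boxes: [M, 4] as (x1, y1, x2, y2); scores: [C, M].
    Returns a list of (class_id, score, box) tuples."""
    def iou(a, b):
        # Intersection-over-union of two corner-format boxes
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    keep = []
    for cls, cls_scores in enumerate(scores):
        order = np.argsort(-cls_scores)        # descending by score
        selected = []
        for i in order:
            if cls_scores[i] < score_threshold:
                break                          # sorted: the rest are below too
            if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in selected):
                selected.append(i)
        keep += [(cls, float(cls_scores[i]), boxes[i]) for i in selected]
    return keep
```

For example, two heavily overlapping boxes of the same class collapse to the higher-scoring one, while a distant box survives.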

 

 

Regards,

Aznie

 

 

Gayathri_Sankaran

Hi Aznie

 

Thank you for the guidance.

 

Currently I am working with a decoded-output model: a customized YOLOX-nano with output shape [1, 3549, 14], trained for 9 classes. In this case the objectness score and the class scores start at indices 4 and 5 of the output tensor.

 

My question is: does this post-processing work in this scenario too?

 

Thanks in advance.

Aznie_Intel
Moderator

Hi Gayathri_Sankaran,

 

Yes, the MulticlassNMS post-processing method can be applied to your model with the output shape [1, 3549, 14], as long as you correctly reference the objectness and class scores (located at indices 4 and 5), and appropriately handle the bounding box coordinates. This approach will allow you to filter out low-confidence predictions and use NMS to eliminate redundant detections.
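For reference, here is a minimal NumPy sketch of the score handling described above. It assumes the layout you stated (4 box values, objectness at index 4, 9 class scores at indices 5-13) and the standard YOLOX convention that the final confidence is objectness multiplied by the class score; adjust if your custom model differs.

```python
import numpy as np

def decode_yolox_output(output, score_threshold=0.5):
    """Split a [1, 3549, 14] YOLOX-style tensor into boxes, class ids and
    confidences, keeping only predictions above score_threshold.
    Assumed layout per row: [x, y, w, h, objectness, class_0 ... class_8]."""
    preds = output[0]                     # [3549, 14]
    boxes = preds[:, :4]                  # [3549, 4]
    objectness = preds[:, 4]              # [3549]
    class_scores = preds[:, 5:14]         # [3549, 9]
    # Final confidence = objectness * best class score (standard YOLOX decode)
    class_ids = class_scores.argmax(axis=1)
    confidences = objectness * class_scores.max(axis=1)
    keep = confidences >= score_threshold
    return boxes[keep], class_ids[keep], confidences[keep]
```

The surviving boxes and per-class confidences can then be fed to NMS, whether on the CPU or via a MulticlassNms node in the graph.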

 

I hope this helps!


 

Regards,

Aznie


Aznie_Intel
Moderator

Hi Gayathri_Sankaran,


This thread will no longer be monitored since we have provided a solution. If you need any additional information from Intel, please submit a new question. 



Regards,

Aznie

