Hello,
I am working with OpenVINO and have a model compiled on the GPU for MulticlassNms. I need to use it to post-process the outputs of my object detection model.
// Configure the MulticlassNms attributes from the application config.
ov::op::v8::MulticlassNms::Attributes nms_attrs;
nms_attrs.iou_threshold = config.iou_threshold;
nms_attrs.score_threshold = config.score_threshold;
nms_attrs.sort_result_type = ov::op::v8::MulticlassNms::SortResultType::SCORE;

// boxes: [batch, num_boxes, 4], scores: [batch, num_classes, num_boxes]
auto box_node = std::make_shared<ov::op::v0::Parameter>(ov::element::f32, ov::Shape{ 1, NUM_GRID_CELLS, 4 });
auto score_node = std::make_shared<ov::op::v0::Parameter>(ov::element::f32, ov::Shape{ 1, NUM_CLASSES, NUM_GRID_CELLS });
auto nms = std::make_shared<ov::op::v8::MulticlassNms>(box_node, score_node, nms_attrs);
nms_model = std::make_shared<ov::Model>(ov::OutputVector{ nms }, ov::ParameterVector{ box_node, score_node }, "NMS_Model");

// compiled_model_nms = core.compile_model(nms_model, "GPU");
compiled_model_nms = core.compile_model(nms_model, *gpu_context);
Hi Gayathri_Sankaran,
Thank you for reaching out to us.
Could you please share your models in Intermediate Representation (IR) format, along with scripts to reproduce the issue, so that we can replicate it on our end?
In addition, could you please provide the following details?
- Hardware specifications
- Host Operating System
- OpenVINO™ toolkit version used
- Deep Learning Framework used
Regards,
Wan
Hi Wan,
The NMS model is a custom model. I created it with parameter nodes for the boxes and scores, and passed the MulticlassNms attributes carrying the IoU and score thresholds, as shown in the code above.
Here are the details you asked for:
- Hardware specifications:
- System SKU LENOVO_MT_21FW_BU_Think_FM_ThinkPad P1 Gen 6
- Processor 13th Gen Intel(R) Core(TM) i7-13800H, 2500 MHz, 14 Core(s), 20 Logical Processor(s)
- GPU - Intel(R) UHD Graphics and NVIDIA RTX A1000 6GB Laptop GPU
- Host Operating System
- OS Name Microsoft Windows 11 Enterprise
- OpenVINO™ toolkit version used
- 2023.3.0-13775-ceeafaf64f3-releases/2023/3
- Deep Learning Framework used - There are two models compiled and used in my case. An object detection model based on YOLOX-Nano was trained and used for inference. The output of the first inference call is restructured by extracting the box tensor and the score tensor with a transpose operation, since MulticlassNms requires its inputs in a specific layout. These tensors are then fed into the custom NMS model. My problem is that I need to copy these CPU tensor values into a GPU tensor created in the same remote context where the NMS model is compiled.
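The restructuring step described above can be sketched on plain host buffers as follows (assumptions: the YOLOX head output is flattened as [num_cells, 4 + num_classes], and the box decoding with grid/stride offsets is already done or omitted; adapt the strides to your actual model output):

```cpp
#include <cstddef>
#include <vector>

// Split a flat detection-head output laid out as [num_cells, 4 + num_classes]
// into a boxes buffer [num_cells, 4] and a transposed scores buffer
// [num_classes, num_cells] -- the layout MulticlassNms expects for its
// boxes and scores inputs (with a leading batch dimension of 1).
void split_and_transpose(const float* raw, std::size_t num_cells, std::size_t num_classes,
                         std::vector<float>& boxes, std::vector<float>& scores) {
    boxes.resize(num_cells * 4);
    scores.resize(num_classes * num_cells);
    for (std::size_t i = 0; i < num_cells; ++i) {
        const float* row = raw + i * (4 + num_classes);
        for (std::size_t k = 0; k < 4; ++k)
            boxes[i * 4 + k] = row[k];            // copy box coordinates as-is
        for (std::size_t c = 0; c < num_classes; ++c)
            scores[c * num_cells + i] = row[4 + c];  // transpose: class-major layout
    }
}
```

The resulting buffers can then be wrapped in `ov::Tensor` objects with shapes {1, num_cells, 4} and {1, num_classes, num_cells} and set as the inputs of the NMS infer request.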
Hi Gayathri_Sankaran,
Thanks for the information.
You may share your custom models and scripts to replicate the issue here, or upload them to your Google Drive so that we can download them for replication purposes.
Regards,
Wan
