Inside object_detection_sample_ssd, images are loaded from file names:
FormatReader::ReaderPtr reader(i.c_str());
where i is the file name, followed by
std::shared_ptr<unsigned char> data(reader->getData(inputInfo->getTensorDesc().getDims()[3], inputInfo->getTensorDesc().getDims()[2]));
So the sample loads from a file name. I need to load from a cv::Mat after reading a frame from a camera or video, so I did the following:
cv::Mat crop = curr_frame.clone();
// Resize to the network input dimensions (rows = dims[2], cols = dims[3])
cv::Mat crop_(inputsInfo[recimageInputName]->getTensorDesc().getDims()[2], inputsInfo[recimageInputName]->getTensorDesc().getDims()[3], crop.type());
cv::resize(crop, crop_, crop_.size(), 0, 0, cv::INTER_LINEAR);
Blob::Ptr imageInput = rec_infer_request->GetBlob(recimageInputName);
unsigned char* data = static_cast<unsigned char*>(imageInput->buffer());
int cnt = 0;
int colcnt = crop_.cols * 3; // 3 channels (BGR) per pixel
for (int y = 0; y < crop_.rows; ++y)
{
    unsigned char* row_ptr = crop_.ptr<unsigned char>(y);
    for (int x = 0; x < colcnt; ++x)
    {
        data[cnt++] = row_ptr[x];
    }
}
if (recimInfoInputName != "") {
    Blob::Ptr input2 = rec_infer_request->GetBlob(recimInfoInputName);
    auto imInfoDim = inputsInfo.find(recimInfoInputName)->second->getTensorDesc().getDims()[1];
    float* p = input2->buffer().as<PrecisionTrait<Precision::FP32>::value_type*>();
    p[0] = static_cast<float>(inputsInfo[recimageInputName]->getTensorDesc().getDims()[2]);
    p[1] = static_cast<float>(inputsInfo[recimageInputName]->getTensorDesc().getDims()[3]);
    for (size_t k = 2; k < imInfoDim; k++) {
        p[k] = 1.0f; // all scale factors are set to 1.0
    }
}
rec_infer_request->Infer();
const Blob::Ptr output_blob = rec_infer_request->GetBlob(recoutputName);
const float* recdetection = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output_blob->buffer());
Is the way I load the image for inference correct?
Do I still need to swap BGR to RGB? The problem is that the inference output has no detections and the confidences are very low.
This is how the input data is prepared:
InputsDataMap inputsInfo(recNetwork.getInputsInfo());
if (inputsInfo.size() != 1 && inputsInfo.size() != 2)
    throw std::logic_error("Sample supports topologies only with 1 or 2 inputs");
std::string recimageInputName, recimInfoInputName;
InputInfo::Ptr recinputInfo = nullptr;
SizeVector inputImageDims;
for (auto & item : inputsInfo) {
    if (item.second->getInputData()->getTensorDesc().getDims().size() == 4) {
        recimageInputName = item.first;
        recinputInfo = item.second;
        slog::info << "Batch size is " << std::to_string(recNetworkReader.getNetwork().getBatchSize()) << slog::endl;
        Precision inputPrecision = Precision::U8;
        item.second->setPrecision(inputPrecision);
        if (FLAGS_auto_resize) {
            recinputInfo->getPreProcess().setResizeAlgorithm(ResizeAlgorithm::RESIZE_BILINEAR);
            recinputInfo->getInputData()->setLayout(Layout::NHWC);
        } else {
            recinputInfo->getInputData()->setLayout(Layout::NCHW);
        }
    } else if (item.second->getInputData()->getTensorDesc().getDims().size() == 2) {
        recimInfoInputName = item.first;
        Precision inputPrecision = Precision::FP32;
        item.second->setPrecision(inputPrecision);
        if ((item.second->getTensorDesc().getDims()[1] != 3 && item.second->getTensorDesc().getDims()[1] != 6)) {
            throw std::logic_error("Invalid input info. Should be 3 or 6 values length");
        }
    }
}
if (recinputInfo == nullptr) {
    recinputInfo = inputsInfo.begin()->second;
}
Alternate Method:
I also tried
frameToBlob(crop, rec_infer_request, recimageInputName);
rec_infer_request->Infer();
But I get a segmentation fault at rec_infer_request->Infer();
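(For context, the frameToBlob helper from the OpenVINO demos is, as far as I remember, roughly the following; it relies on the input layout and precision configured earlier, so the NCHW vs. NHWC setup has to match what the request expects.)
// Rough sketch of frameToBlob from the OpenVINO demos (from memory, details may differ):
// it either wraps the cv::Mat directly (when auto-resize preprocessing is enabled) or
// copies the pixels into the already-allocated input blob.
void frameToBlob(const cv::Mat& frame,
                 InferRequest::Ptr& inferRequest,
                 const std::string& inputName) {
    if (FLAGS_auto_resize) {
        // Input was configured as U8 / NHWC with RESIZE_BILINEAR, so the plugin resizes and converts.
        inferRequest->SetBlob(inputName, wrapMat2Blob(frame));
    } else {
        // Input is NCHW: resize and rearrange the interleaved BGR data into the blob.
        Blob::Ptr frameBlob = inferRequest->GetBlob(inputName);
        matU8ToBlob<uint8_t>(frame, frameBlob);
    }
}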
I had an error in my earlier code at
for (size_t pid = 0; pid < image_size; ++pid) {
    for (size_t ch = 0; ch < num_channels; ++ch) {
        data[ch * image_size + pid] = crop.ptr<unsigned char>()[pid * num_channels + ch];
    }
}
The data for the three channels need to be arranged one after another (planar layout); the earlier code did not do that.
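I believe this rearrangement is essentially what the matU8ToBlob helper in the samples' ocv_common.hpp does; a sketch from memory (details may differ), with the resize folded in:
#include <opencv2/opencv.hpp>
#include <inference_engine.hpp>

// Sketch of a matU8ToBlob-style copy: resize the interleaved BGR cv::Mat to the blob's
// H x W and write it out channel by channel (planar / NCHW order).
template <typename T>
void matU8ToBlobSketch(const cv::Mat& orig_image, InferenceEngine::Blob::Ptr& blob, int batchIndex = 0) {
    InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
    const size_t width = blobSize[3];
    const size_t height = blobSize[2];
    const size_t channels = blobSize[1];
    T* blob_data = blob->buffer().as<T*>();

    cv::Mat resized_image(orig_image);
    if (static_cast<int>(width) != orig_image.size().width ||
        static_cast<int>(height) != orig_image.size().height) {
        cv::resize(orig_image, resized_image, cv::Size(width, height));
    }

    const size_t batchOffset = static_cast<size_t>(batchIndex) * width * height * channels;
    for (size_t c = 0; c < channels; c++) {
        for (size_t h = 0; h < height; h++) {
            for (size_t w = 0; w < width; w++) {
                // interleaved (h, w, c)  ->  planar (c, h, w)
                blob_data[batchOffset + c * width * height + h * width + w] =
                    resized_image.at<cv::Vec3b>(h, w)[c];
            }
        }
    }
}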
The input preparation in my code is the same as shown in my first post above.
Then I get the input data and run inference as follows:
cv::Mat crop1 = curr_frame.clone();
cv::Mat crop(inputsInfo[recimageInputName]->getTensorDesc().getDims()[2], inputsInfo[recimageInputName]->getTensorDesc().getDims()[3], crop1.type());
cv::resize(crop1, crop, crop.size(), 0, 0, cv::INTER_LINEAR);
// --------------------------- Prepare input for Recognition -------------------------
Blob::Ptr imageInput = rec_infer_request->GetBlob(recimageInputName);
unsigned char* data = static_cast<unsigned char*>(imageInput->buffer());
size_t num_channels = imageInput->getTensorDesc().getDims()[1];
size_t image_size = imageInput->getTensorDesc().getDims()[3] * imageInput->getTensorDesc().getDims()[2];
for (size_t pid = 0; pid < image_size; ++pid) {
    for (size_t ch = 0; ch < num_channels; ++ch) {
        data[ch * image_size + pid] = crop.ptr<unsigned char>()[pid * num_channels + ch];
    }
}
if (recimInfoInputName != "") {
    Blob::Ptr input2 = rec_infer_request->GetBlob(recimInfoInputName);
    auto imInfoDim = inputsInfo.find(recimInfoInputName)->second->getTensorDesc().getDims()[1];
    float* p = input2->buffer().as<PrecisionTrait<Precision::FP32>::value_type*>();
    p[0] = static_cast<float>(inputsInfo[recimageInputName]->getTensorDesc().getDims()[2]);
    p[1] = static_cast<float>(inputsInfo[recimageInputName]->getTensorDesc().getDims()[3]);
    for (size_t k = 2; k < imInfoDim; k++) {
        p[k] = 1.0f; // all scale factors are set to 1.0
    }
}
rec_infer_request->Infer();
const Blob::Ptr output_blob = rec_infer_request->GetBlob(recoutputName);
const float* recdetection = static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output_blob->buffer());
But the detection confidences are very low. My TensorFlow training is fine and was tested with high accuracy and recall.
What I did was correct. I had assumed the highest confidences were at the top of the output array, but the detections are not ordered from highest to lowest confidence.
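For reference, a minimal sketch of scanning the whole output instead of only the top entries (assuming the usual SSD DetectionOutput layout of [1, 1, N, 7] with {image_id, label, confidence, xmin, ymin, xmax, ymax} per row; the threshold value is just illustrative):
// Scan every proposal and keep only those above a confidence threshold.
const SizeVector outDims = output_blob->getTensorDesc().getDims();
const size_t maxProposalCount = outDims[2];
const size_t objectSize = outDims[3];        // 7 values per detection
const float confidenceThreshold = 0.5f;      // illustrative value
for (size_t i = 0; i < maxProposalCount; ++i) {
    const float image_id = recdetection[i * objectSize + 0];
    if (image_id < 0) break;                 // end-of-detections marker
    const int   label      = static_cast<int>(recdetection[i * objectSize + 1]);
    const float confidence = recdetection[i * objectSize + 2];
    if (confidence < confidenceThreshold) continue;  // entries are not guaranteed to be sorted by confidence
    const float xmin = recdetection[i * objectSize + 3];
    const float ymin = recdetection[i * objectSize + 4];
    const float xmax = recdetection[i * objectSize + 5];
    const float ymax = recdetection[i * objectSize + 6];
    // ... use (label, confidence, xmin, ymin, xmax, ymax); coordinates are typically normalized to [0, 1]
}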
By the way, do I need to convert BGR to RGB for inference?
Do I need to normalize the image to be between -1 and 1?
I ask because the accuracy and recall are much lower than with the TensorFlow model.
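(For reference, if the TensorFlow model expects inputs scaled to [-1, 1] and that scaling is not already folded into the IR, for example via the Model Optimizer --mean_values / --scale_values options, the input blob would have to be FP32 and filled with scaled values. A sketch, not my current code, reusing the variables from the loop above:)
// Hypothetical variant: only needed if the [-1, 1] scaling is NOT baked into the IR.
// The image input would have to be configured with setPrecision(Precision::FP32) instead of U8.
float* fdata = imageInput->buffer().as<PrecisionTrait<Precision::FP32>::value_type*>();
for (size_t pid = 0; pid < image_size; ++pid) {
    for (size_t ch = 0; ch < num_channels; ++ch) {
        const unsigned char pixel = crop.ptr<unsigned char>()[pid * num_channels + ch];
        fdata[ch * image_size + pid] = (static_cast<float>(pixel) - 127.5f) / 127.5f;  // maps [0, 255] -> [-1, 1]
    }
}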
Dearest naing, nyan,
OpenVINO samples usually use OpenCV to read images, and yes, OpenCV returns BGR. So if you trained your model on RGB layout, you definitely need to pass --reverse_input_channels to the Model Optimizer.
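(If the IR has already been generated without --reverse_input_channels, an alternative sketch is to swap the channels on the application side before filling the blob:)
// Alternative to re-converting the model: swap BGR -> RGB in the application,
// only if the model was trained on RGB and the IR was generated without --reverse_input_channels.
cv::Mat rgb;
cv::cvtColor(crop, rgb, cv::COLOR_BGR2RGB);
// ...then copy 'rgb' into the input blob exactly as before.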
Also, please study the OpenVINO sample hello_autoresize_classification. It does exactly what you want:
// --------------------------- 6. Prepare input --------------------------------------------------------
/* Read input image to a blob and set it to an infer request without resize and layout conversions. */
cv::Mat image = cv::imread(input_image_path);
Blob::Ptr imgBlob = wrapMat2Blob(image);  // just wrap Mat data by Blob::Ptr without allocating of new memory
infer_request.SetBlob(input_name, imgBlob);
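(For reference, wrapMat2Blob comes from the samples' common ocv_common.hpp; roughly it does the following. It only works together with the U8 / NHWC / RESIZE_BILINEAR input configuration shown in the question, since nothing is copied or converted at this point.)
// Rough sketch of wrapMat2Blob (from memory, details may differ): the cv::Mat memory is wrapped
// as a U8 NHWC blob without copying, so the plugin resizes and converts layout during preprocessing.
static InferenceEngine::Blob::Ptr wrapMat2Blob(const cv::Mat& mat) {
    size_t channels = mat.channels();
    size_t height = mat.size().height;
    size_t width = mat.size().width;

    InferenceEngine::TensorDesc tDesc(InferenceEngine::Precision::U8,
                                      {1, channels, height, width},
                                      InferenceEngine::Layout::NHWC);
    return InferenceEngine::make_shared_blob<uint8_t>(tDesc, mat.data);
}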
Thanks,
Shubha
