Solved: CropAndResize does not work for output dimensions > (30, 30)

Jana__Sandeep · ‎05-04-2020

Hi,

I am using tf.image.crop_and_resize in my model code. Not as a pre-processing step but as an intermediate operation.

In the IR, this operation appears as ROIPooling.

The cropped and resized output is fine when the output dimensions are less than or equal to (30, 30). When the output is larger than (30, 30), I get the following run-time error when running on Myriad/Neural Compute Stick. On CPU, the same model works fine.

E: [xLink] [    497131] [EventRead00Thr] eventReader:218        eventReader thread stopped (err -4)
E: [xLink] [    497131] [Scheduler00Thr] eventSchedulerRun:576  Dispatcher received NULL event!
E: [global] [    497131] [one_model] XLinkReadDataWithTimeOut:1494      Event data is invalid
E: [ncAPI] [    497131] [one_model] ncFifoReadElem:3510 Packet reading is failed.
[Thread 0x7fffef2ee700 (LWP 22032) exited]
[Thread 0x7fffefaef700 (LWP 22030) exited]
E: [ncAPI] [    497132] [one_model] ncFifoDestroy:3333  Failed to write to fifo before deleting it!

I am using 2019_R3.1 (cannot use any newer version). Attached some files. Thanks!

The part of the XML that works is below. If I change 16 to, say, 32, I get the run-time error.

        <layer id="10" name="crop_resize_1/CropAndResize" precision="FP16" type="ROIPooling">
            <data method="bilinear" pooled_h="16" pooled_w="16" spatial_scale="1"/>
            <input>
                <port id="0">
                    <dim>1</dim>
                    <dim>3</dim>
                    <dim>270</dim>
                    <dim>480</dim>
                </port>
                <port id="1">
                    <dim>1</dim>
                    <dim>5</dim>
                </port>
            </input>
            <output>
                <port id="2">
                    <dim>1</dim>
                    <dim>3</dim>
                    <dim>16</dim>
                    <dim>16</dim>
                </port>
            </output>
        </layer>
    </layers>
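
To check which output size a given IR actually requests, the ROIPooling attributes can be read straight from the .xml file. The snippet below is only a sketch: the helper name roi_pooling_sizes and the trimmed SAMPLE_IR string (including its <net> wrapper) are illustrative assumptions, not part of the attached files.

```python
import xml.etree.ElementTree as ET


def roi_pooling_sizes(ir_xml_text):
    """Return {layer name: (pooled_h, pooled_w)} for every ROIPooling layer."""
    root = ET.fromstring(ir_xml_text)
    sizes = {}
    for layer in root.iter("layer"):
        if layer.get("type") != "ROIPooling":
            continue
        data = layer.find("data")
        sizes[layer.get("name")] = (
            int(data.get("pooled_h")),
            int(data.get("pooled_w")),
        )
    return sizes


# Trimmed, hypothetical copy of the layer shown above, wrapped so it parses
# as a standalone document.
SAMPLE_IR = """<net><layers>
<layer id="10" name="crop_resize_1/CropAndResize" precision="FP16" type="ROIPooling">
<data method="bilinear" pooled_h="16" pooled_w="16" spatial_scale="1"/>
</layer>
</layers></net>"""

print(roi_pooling_sizes(SAMPLE_IR))
```

For a real IR, pass the contents of the .xml file (for example open("crop_resize.xml").read()) instead of SAMPLE_IR.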

Luis_at_Intel · ‎05-05-2020

Hi Sandeep,

Thanks for reaching out. I am able to convert the .pb file from the attachment you shared, but I am unable to run the converted IR files with an OpenVINO sample. Can you please provide more details about your model? For example, is it an object detection model, which topology is it based on, and what type of object does it detect? It would also be great if you could provide the code snippet you used to run this model so we can replicate the issue.

Another question: were the CropAndResize layer modifications made to the original TF model prior to MO conversion, or are you manually modifying the ROIPooling layers in the XML file? If you have access to the original TF model code prior to exporting to .pb, please try making these changes to the CropAndResize layer there and check whether the problem is still present.

Best Regards,

Luis

Jana__Sandeep · ‎05-07-2020

Hi Luis,

Thanks for the reply. The CropAndResize layer input arguments were modified in the TF model prior to MO conversion.

I made a minimal sample app based on main.cpp from the object detection demo. So far I have been unable to upload it here: on clicking the "Upload" button, I get an "An AJAX HTTP request terminated abruptly" browser pop-up along with some lengthy debug information. I switched browsers and PCs, but to no avail.

So I am copy-pasting the code here. Please let us know if more info is needed.

Commands:

Model:
python crop_resize.py

IR conversion:
python /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py --input_model crop_resize.pb \
    --input input_1,input_2 --batch 1 --data_type FP16 --model_name crop_resize --output lambda_1/CropAndResize


sample app: 
./crop_resize -i images_288/ -d MYRIAD -m crop_resize.xml
images_288 folder contains a BMP image of resolution 288x288

Model definition:

import tensorflow as tf
import numpy as np 
import matplotlib.pyplot as plt


from keras import layers
from keras.models import Model, load_model
from keras import backend as K

def freeze_session(session, keep_var_names=None, output_names=None, clear_devices=True):
    graph = session.graph
    with graph.as_default():
        freeze_var_names = list(set(v.op.name for v in tf.global_variables()).difference(keep_var_names or []))
        output_names = output_names or []
        output_names += [v.op.name for v in tf.global_variables()]
        input_graph_def = graph.as_graph_def()
        if clear_devices:
            for node in input_graph_def.node:
                node.device = ""
        frozen_graph = tf.graph_util.convert_variables_to_constants(
            session, input_graph_def, output_names, freeze_var_names)
        return frozen_graph
        

images = layers.Input((288, 288, 3))
boxes = layers.Input((4,))

CropResize = layers.Lambda(lambda x: tf.image.crop_and_resize(
                x[0], x[1], K.constant([0], dtype=tf.int32),
                K.constant([64, 64], tf.int32), # CROP SIZE!
                method='bilinear', extrapolation_value=0))

inputs = [images, boxes]
outputs = CropResize(inputs)

model = Model(inputs, outputs, name='crop_resize')

# [y1, x1, y2, x2] in range (0 to 1)
boxes = np.array([[0.25, 0.25], [0.75, 0.75]], np.float32)
boxes = np.reshape(boxes, (1, 4))

imshape = (288, 288, 3)
im = np.zeros(imshape)
for y in range(9):
    for x in range(9):
        for c in range(3):
            im[32*y:32*(y+1), 32*x:32*(x+1), c] = np.random.uniform(0, 1)

plt.imshow(im)
plt.savefig('input_image.png')

im = np.reshape(im, (1,) + imshape)
crop = model.predict([im, boxes], steps=1)

plt.imshow(crop[0])
plt.savefig('crop.png')

frozen_graph = freeze_session(K.get_session(),
                            output_names=[out.op.name for out in model.outputs])
tf.train.write_graph(frozen_graph, '.', "{}.pb".format(model.name), as_text=False)
print('Done!')
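
For comparing the device output with what the TF op is expected to produce, a plain NumPy version of the bilinear crop can be handy. The function below is a sketch of the sampling scheme as I understand tf.image.crop_and_resize to work for method='bilinear' (one image, one normalized [y1, x1, y2, x2] box); the name crop_and_resize_ref is made up here, and the code has not been validated against TF 1.12 or the Myriad plugin.

```python
import numpy as np


def crop_and_resize_ref(image, box, crop_size, extrapolation_value=0.0):
    """Bilinear crop of one HxWxC image by one normalized [y1, x1, y2, x2] box."""
    height, width, channels = image.shape
    crop_h, crop_w = crop_size
    y1, x1, y2, x2 = box

    def sample_coords(lo, hi, in_size, out_size):
        # Map output indices to input pixel coordinates along one axis.
        if out_size > 1:
            step = (hi - lo) * (in_size - 1) / (out_size - 1)
            return lo * (in_size - 1) + np.arange(out_size) * step
        return np.array([0.5 * (lo + hi) * (in_size - 1)])

    ys = sample_coords(y1, y2, height, crop_h)
    xs = sample_coords(x1, x2, width, crop_w)
    out = np.full((crop_h, crop_w, channels), extrapolation_value, dtype=np.float32)
    for i, y in enumerate(ys):
        if y < 0 or y > height - 1:
            continue  # outside the image: keep the extrapolation value
        top, bottom = int(np.floor(y)), int(np.ceil(y))
        y_lerp = y - top
        for j, x in enumerate(xs):
            if x < 0 or x > width - 1:
                continue
            left, right = int(np.floor(x)), int(np.ceil(x))
            x_lerp = x - left
            # Interpolate along x on the two neighbouring rows, then along y.
            top_val = image[top, left] + (image[top, right] - image[top, left]) * x_lerp
            bot_val = image[bottom, left] + (image[bottom, right] - image[bottom, left]) * x_lerp
            out[i, j] = top_val + (bot_val - top_val) * y_lerp
    return out


# Tiny example: upsample a 2x2 single-channel image to 3x3 using the full box.
tiny = np.array([[0, 2], [4, 6]], dtype=np.float32).reshape(2, 2, 1)
print(crop_and_resize_ref(tiny, [0, 0, 1, 1], (3, 3))[:, :, 0])
```

With the model above, the equivalent call would be crop_and_resize_ref(im[0], [0.25, 0.25, 0.75, 0.75], (64, 64)), evaluated before im is reshaped to a batch or on its first element afterwards. Small differences from an FP16 device result are to be expected.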

Sample app:

// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include <chrono> 
#include <gflags/gflags.h>
#include <fstream>   // std::ofstream, used when dumping the output blob
#include <iostream>
#include <string>
#include <memory>
#include <vector>
#include <algorithm>
#include <map>

#include <format_reader_ptr.h>
#include <inference_engine.hpp>
#include <ext_list.hpp>

#include <samples/common.hpp>
#include <samples/slog.hpp>
#include <samples/args_helper.hpp>

#include <vpu/vpu_tools_common.hpp>
#include <vpu/vpu_plugin_config.hpp>

#include "crop_resize.h"

using namespace InferenceEngine;

ConsoleErrorListener error_listener;

bool ParseAndCheckCommandLine(int argc, char *argv[]) {
    // ---------------------------Parsing and validation of input args--------------------------------------
    gflags::ParseCommandLineNonHelpFlags(&argc, &argv, true);
    if (FLAGS_h) {
        showUsage();
        showAvailableDevices();
        return false;
    }

    slog::info << "Parsing input parameters" << slog::endl;

    if (FLAGS_i.empty()) {
        throw std::logic_error("Parameter -i is not set");
    }

    if (FLAGS_m.empty()) {
        throw std::logic_error("Parameter -m is not set");
    }

    return true;
}

static std::map<std::string, std::string> configure(const std::string& confFileName) {
    auto config = parseConfig(confFileName);

    return config;
}


template<typename T>
void print_name_dimensions(const T& item)
{
    slog::info << item.first << "\n";
    const SizeVector tensorDims = item.second->getTensorDesc().getDims();
    for(size_t i = 0; i < tensorDims.size(); ++i)
    {
        slog::info << "     tensorDims[" << i << "] = "  << tensorDims[i] << "\n";
    }
}

/**
* \brief The entry point for the Inference Engine object_detection sample application
* \file object_detection_sample_ssd/main.cpp
* \example object_detection_sample_ssd/main.cpp
*/
int main(int argc, char *argv[]) {
    try {
        /** This sample covers certain topology and cannot be generalized for any object detection one **/
        slog::info << "InferenceEngine: " << GetInferenceEngineVersion() << "\n";

        // --------------------------- 1. Parsing and validation of input args ---------------------------------
        if (!ParseAndCheckCommandLine(argc, argv)) {
            return 0;
        }
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 2. Read input -----------------------------------------------------------
        /** This vector stores paths to the processed images **/
        std::vector<std::string> images;
        parseInputFilesArguments(images);
        if (images.empty()) throw std::logic_error("No suitable images were found");
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 3. Load inference engine -------------------------------------
        slog::info << "Loading Inference Engine" << slog::endl;
        Core ie;

        slog::info << "Device info: " << slog::endl;
        std::cout << ie.GetVersions(FLAGS_d);

        if (FLAGS_p_msg) {
            ie.SetLogCallback(error_listener);
        }

        /*If CPU device, load default library with extensions that comes with the product*/
        if (FLAGS_d.find("CPU") != std::string::npos) {
            /**
            * cpu_extensions library is compiled from "extension" folder containing
            * custom MKLDNNPlugin layer implementations. These layers are not supported
            * by mkldnn, but they can be useful for inferring custom topologies.
            **/
            ie.AddExtension(std::make_shared<Extensions::Cpu::CpuExtensions>(), "CPU");
        }

        if (!FLAGS_l.empty()) {
            // CPU(MKLDNN) extensions are loaded as a shared library and passed as a pointer to base extension
            IExtensionPtr extension_ptr = make_so_pointer<IExtension>(FLAGS_l);
            ie.AddExtension(extension_ptr, "CPU");
            slog::info << "CPU Extension loaded: " << FLAGS_l << slog::endl;
        }

        if (!FLAGS_c.empty()) {
            // clDNN Extensions are loaded from an .xml description and OpenCL kernel files
            ie.SetConfig({ { PluginConfigParams::KEY_CONFIG_FILE, FLAGS_c } }, "GPU");
            slog::info << "GPU Extension loaded: " << FLAGS_c << slog::endl;
        }
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 4. Read IR Generated by ModelOptimizer (.xml and .bin files) ------------
        std::string binFileName = fileNameNoExt(FLAGS_m) + ".bin";
        slog::info << "Loading network files:"
            "\n\t" << FLAGS_m <<
            "\n\t" << binFileName <<
            slog::endl;

        CNNNetReader networkReader;
        /** Read network model **/
        networkReader.ReadNetwork(FLAGS_m);

        /** Extract model name and load weights **/
        networkReader.ReadWeights(binFileName);
        CNNNetwork network = networkReader.getNetwork();
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 5. Prepare input blobs --------------------------------------------------
        slog::info << "Preparing input blobs" << slog::endl;

        /** Taking information about all topology inputs **/
        InputsDataMap inputsInfo(network.getInputsInfo());

        /** SSD network has one input and one output **/
        if (inputsInfo.size() != 1 && inputsInfo.size() != 2) throw std::logic_error("Sample supports topologies only with 1 or 2 inputs");

        /**
         * Some networks have SSD-like output format (ending with DetectionOutput layer), but
         * having 2 inputs as Faster-RCNN: one for image and one for "image info".
         *
         * Although object_detection_sample_ssd's main task is to support clean SSD, it could score
         * the networks with two inputs as well. For such networks imInfoInputName will contain the "second" input name.
         */
        std::string imageInputName, imInfoInputName;

        InputInfo::Ptr inputInfo = nullptr;

        SizeVector inputImageDims;
        /** Stores input image **/

        /** Iterating over all input blobs **/
        for (auto & item : inputsInfo) {
            /** Working with first input tensor that stores image **/
            if (item.second->getInputData()->getTensorDesc().getDims().size() == 4) {
                imageInputName = item.first;

                inputInfo = item.second;

                slog::info << "Batch size is " << std::to_string(networkReader.getNetwork().getBatchSize()) << slog::endl;

                /** Creating first input blob **/
                Precision inputPrecision = Precision::FP32;
                item.second->setPrecision(inputPrecision);
                print_name_dimensions(item);
            } else if (item.second->getInputData()->getTensorDesc().getDims().size() == 2) {

                imInfoInputName = item.first;

                Precision inputPrecision = Precision::FP32;
                item.second->setPrecision(inputPrecision);
                print_name_dimensions(item);
            }
        }

        if (inputInfo == nullptr) {
            inputInfo = inputsInfo.begin()->second;
        }
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 6. Prepare output blobs -------------------------------------------------
        slog::info << "Preparing output blobs" << slog::endl;

        const size_t max_num_output_nodes = 1;
        OutputsDataMap outputsInfo(network.getOutputsInfo());

        std::string outputNames[max_num_output_nodes];
        DataPtr outputInfos[max_num_output_nodes];
        SizeVector outputDimsAll[max_num_output_nodes];

        size_t num_output_nodes = 0;

        //set outputs in order
        for (const auto& out : outputsInfo) {
            
            auto outputName = out.first;
            auto outputInfo = out.second;
            /** Set the precision of output data provided by the user,
             *  should be called before load of the network to the device **/
            outputInfo->setPrecision(Precision::FP32);

            print_name_dimensions(out);

            const SizeVector outputDims = outputInfo->getTensorDesc().getDims();
            outputNames[num_output_nodes] = outputName;
            outputInfos[num_output_nodes] = outputInfo;
            outputDimsAll[num_output_nodes] = outputDims;
            
            ++num_output_nodes;
        }

        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 7. Loading model to the device ------------------------------------------
        slog::info << "Loading model to the device" << slog::endl;

        ExecutableNetwork executable_network = ie.LoadNetwork(network, FLAGS_d, configure(FLAGS_config));
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 8. Create infer request -------------------------------------------------
        slog::info << "Create infer request" << slog::endl;
        InferRequest infer_request = executable_network.CreateInferRequest();
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 9. Prepare input --------------------------------------------------------
        /** Collect images data ptrs **/
        std::vector<std::shared_ptr<unsigned char>> imagesData, originalImagesData;
        std::vector<size_t> imageWidths, imageHeights;
        for (auto & i : images) {
            FormatReader::ReaderPtr reader(i.c_str());
            if (reader.get() == nullptr) {
                slog::warn << "Image " + i + " cannot be read!" << slog::endl;
                continue;
            }
            /** Store image data **/
            std::shared_ptr<unsigned char> originalData(reader->getData());
            std::shared_ptr<unsigned char> data(reader->getData(inputInfo->getTensorDesc().getDims()[3], inputInfo->getTensorDesc().getDims()[2]));
            if (data.get() != nullptr) {
                originalImagesData.push_back(originalData);
                imagesData.push_back(data);
                imageWidths.push_back(reader->width());
                imageHeights.push_back(reader->height());
            }
        }
        if (imagesData.empty()) throw std::logic_error("Valid input images were not found!");

        size_t batchSize = network.getBatchSize();
        slog::info << "Batch size is " << std::to_string(batchSize) << slog::endl;
        if (batchSize != imagesData.size()) {
            slog::warn << "Number of images " + std::to_string(imagesData.size()) + \
                " doesn't match batch size " + std::to_string(batchSize) << slog::endl;
            batchSize = std::min(batchSize, imagesData.size());
            slog::warn << "Number of images to be processed is "<< std::to_string(batchSize) << slog::endl;
        }

        /** Creating input blob **/
        Blob::Ptr imageInput = infer_request.GetBlob(imageInputName);

        /** Filling input tensor with images. First b channel, then g and r channels **/
        size_t num_channels = imageInput->getTensorDesc().getDims()[1];
        size_t image_size = imageInput->getTensorDesc().getDims()[3] * imageInput->getTensorDesc().getDims()[2];

        auto data = static_cast<float*>(imageInput->buffer());

        /** Iterate over all input images **/
        for (size_t image_id = 0; image_id < std::min(imagesData.size(), batchSize); ++image_id) {
            /** Iterate over all pixel in image (b,g,r) **/
            for (size_t pid = 0; pid < image_size; pid++) {
                /** Iterate over all channels **/
                for (size_t ch = 0; ch < num_channels; ++ch) {
                    /**          [images stride + channels stride + pixel id ] all in bytes            **/
                    data[image_id * image_size * num_channels + ch * image_size + pid] =
                        imagesData.at(image_id).get()[pid*num_channels + ch];
                }
            }
        }

        if (imInfoInputName != "") {
            Blob::Ptr input2 = infer_request.GetBlob(imInfoInputName);

            /** Fill input tensor with values **/
            float *p = input2->buffer().as<PrecisionTrait<Precision::FP32>::value_type*>();

            /* cropping coordinates (y1, x1, y2, x2) */
            float box[][2] = {{0.25, 0.25}, {0.75, 0.75}};
            p[0] = box[0][0];
            p[1] = box[0][1];
            p[2] = box[1][0];
            p[3] = box[1][1];

        }
        // --------------------------- 10. Do inference ---------------------------------------------------------
        slog::info << "Start inference" << slog::endl;
        infer_request.Infer();
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 11. Process output -------------------------------------------------------
        slog::info << "Processing output blobs" << slog::endl;

        const float* predictions[max_num_output_nodes] = {};
        for(size_t i = 0; i < num_output_nodes; ++i) {
            const Blob::Ptr output_blob = infer_request.GetBlob(outputNames[i]);
            predictions[i] =  static_cast<PrecisionTrait<Precision::FP32>::value_type*>(output_blob->buffer());
        }
        
        for (size_t node = 0; node < num_output_nodes; ++node) {            
            slog::info << outputNames[node] << ": ";
            auto outputDims = outputDimsAll[node];            
            std::string outfile_name = std::string("/home/user/volatile/output_blob_") + 
                std::to_string(node) + ".bin";
            auto size = sizeof(float);
            for(auto s: outputDims)
                size *= s;                            
            std::ofstream outfile(outfile_name, std::ios::binary | std::ios::out);
            outfile.write((const char*) predictions[node], size);
            slog::info << "Dumped " << size << " bytes to " << outfile_name << " \n";
        }

        // -----------------------------------------------------------------------------------------------------
    }
    catch (const std::exception& error) {
        slog::err << error.what() << slog::endl;
        return 1;
    }
    catch (...) {
        slog::err << "Unknown/internal exception happened." << slog::endl;
        return 1;
    }

    slog::info << "Execution successful" << slog::endl;
    slog::info << slog::endl << "This sample is an API example, for any performance measurements "
                                "please use the dedicated benchmark_app tool" << slog::endl;
    return 0;
}
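
The sample app above writes each output blob as raw FP32 bytes. A small NumPy helper can load such a dump for comparison with crop.png from the model script. This is a sketch under two assumptions: the blob is laid out as NCHW (matching the IR output port dims), and the caller supplies the dims, since the dump itself carries no shape information. The helper name load_output_blob is hypothetical.

```python
import os
import tempfile

import numpy as np


def load_output_blob(path, dims):
    """Read a raw float32 dump and return it as NHWC (dims are given as NCHW)."""
    data = np.fromfile(path, dtype=np.float32)
    expected = int(np.prod(dims))
    if data.size != expected:
        raise ValueError(
            "expected {} floats for dims {}, got {}".format(expected, dims, data.size)
        )
    return data.reshape(dims).transpose(0, 2, 3, 1)


# Round-trip check on a synthetic dump, standing in for output_blob_0.bin.
dims = (1, 3, 4, 4)
original = np.arange(np.prod(dims), dtype=np.float32).reshape(dims)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output_blob_0.bin")
    original.tofile(path)
    crop = load_output_blob(path, dims)
print(crop.shape)
```

For the 64x64 crop in this thread, the dims would be (1, 3, 64, 64), and crop[0] can be passed to plt.imshow in the same way as in the model script.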

Luis_at_Intel · ‎05-08-2020

Hi Jana, Sandeep,

Thanks for the information. I am able to replicate the problem you are reporting with OpenVINO 2020.2 on Ubuntu 18.04. It's unclear what the issue with the Myriad plugin might be; I have reported this to the developer team and will provide a response once I have updates.

Can you please also share the details of your environment (versions of OS, Python, CMake, TF, platform/CPU)? Just so we have this information at hand in case it's needed by the dev team as well. Thanks!

Best Regards,

Luis

Jana__Sandeep · ‎05-09-2020

Hi Luis,

I am using OpenVINO 2019_R3.1, Python 3.6.8, TF 1.12, cmake 3.13.4, Ubuntu 18.04.3 on Intel(R) Core(TM) i7-4790 CPU.

Thanks for your efforts in reproducing the issue.

-Sandeep

Jana__Sandeep · ‎05-21-2020

Hi Luis, may I know if the developer team provided any updates on this issue? Thanks!

Luis_at_Intel · ‎06-05-2020

Hi Jana, Sandeep,

I apologize for the delay in my response. The team is still investigating this issue, and I will share any updates as they become available. Sorry for any inconvenience this may cause!

Regards,

Luis

Luis_at_Intel · ‎08-24-2020

Hi @Jana__Sandeep ,

Thank you for your patience, and I apologize for the delay, but I have good news to share. This issue has been fixed, and the fix will be included in the v2021.1 release. Please keep an eye out for when the release becomes available; I am sure there will be announcements made here in the community and in other places. Thank you for using OpenVINO!

Regards,

Luis