Using the Inference Engine with a mixed-precision model

svutran · 06-08-2022

Hi,

We have quantized our trained model in a way that it has both INT8 and FP16 weights.

At inference time, we use OpenVINO's Inference Engine to load the model from memory (not from a file path) with the following method:

CNNNetwork ReadNetwork(const std::string& model, const Blob::CPtr& weights) const;

This method requires us to create a tensor descriptor (InferenceEngine::TensorDesc) for the weights and pass it, together with the weight data, to InferenceEngine::make_shared_blob. The problem is that TensorDesc does not support InferenceEngine::Precision::MIXED.

Here is the error message that is thrown when executing the program:

Cannot make shared blob! The blob type cannot be used to store objects of current precision

So, how should we proceed in order to read a network where the weights are mixed-precision?

Here is a code snippet showing how we load the model. Note that poBin refers to the binary content of the weights in memory.

        InferenceEngine::TensorDesc oTensor(InferenceEngine::Precision::MIXED, oWeightsContentSize, InferenceEngine::Layout::ANY);
        auto oWeightBlob = InferenceEngine::make_shared_blob(oTensor, poBin[i], uiBinSize[i]);

        InferenceEngine::CNNNetwork* poNetwork = new InferenceEngine::CNNNetwork();
        *poNetwork = oCore.ReadNetwork(std::string(reinterpret_cast<const char *>(poXml[i]),
                reinterpret_cast<const char *>(poXml[i] + uiXmlSize[i])), oWeightBlob);
 
Thank you in advance
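
A likely reading of the error above: the weights blob passed to ReadNetwork is a raw byte container, and the precision of each constant is recorded in the IR XML rather than in the blob itself. Under that reading, the blob is usually described as a one-dimensional tensor with InferenceEngine::Precision::U8, a single dimension equal to the byte count, and InferenceEngine::Layout::C, and is created with make_shared_blob<uint8_t>. The sketch below only illustrates the idea and does not use OpenVINO: ConstantEntry, slice_constant, and make_toy_weights are hypothetical names, and the offset table stands in for the metadata an IR XML would carry.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-constant metadata kept in the model XML
// (element type, byte offset, byte size). Not an OpenVINO type.
struct ConstantEntry {
    std::string element_type;  // e.g. "i8" or "f16"
    std::size_t offset;        // byte offset into the weights buffer
    std::size_t size;          // byte length of this constant
};

// Returns a copy of one constant's raw bytes, or an empty vector if the
// entry does not fit inside the buffer.
std::vector<std::uint8_t> slice_constant(const std::vector<std::uint8_t>& weights,
                                         const ConstantEntry& entry) {
    if (entry.offset > weights.size() || entry.size > weights.size() - entry.offset) {
        return {};
    }
    const auto first = weights.begin() + static_cast<std::ptrdiff_t>(entry.offset);
    return std::vector<std::uint8_t>(first, first + static_cast<std::ptrdiff_t>(entry.size));
}

// Builds a toy mixed-precision buffer: four int8 values followed by two
// FP16 values stored as raw IEEE half bit patterns.
std::vector<std::uint8_t> make_toy_weights() {
    const std::int8_t int8_part[4] = {-1, 2, -3, 4};
    const std::uint16_t fp16_bits[2] = {0x3C00, 0xC000};  // 1.0 and -2.0
    std::vector<std::uint8_t> buffer(sizeof(int8_part) + sizeof(fp16_bits));
    std::memcpy(buffer.data(), int8_part, sizeof(int8_part));
    std::memcpy(buffer.data() + sizeof(int8_part), fp16_bits, sizeof(fp16_bits));
    return buffer;
}
```

The buffer has no single element type of its own. Only the offset table says which bytes are INT8 and which are FP16, which is why a plain byte view is enough for the container.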

Zulkifli_Intel · 06-09-2022

Hello Svutran.

Thank you for reaching out to us.

Please share your mixed-precision model, your script, and any other relevant information with us for further investigation.

Also, which OpenVINO version did you use to run the model?

Sincerely,

Zulkifli

svutran · 06-15-2022

Sorry for the late reply, but putting my code here would be a little complicated.

Actually, my question is: do we need to set the input and output precision with the following call if we are working in FP16, or in mixed precision that includes FP16?

InferenceEngine::InputsDataMap inputs_info = network.getInputsInfo();
// InputsDataMap is a std::map, so the InputInfo is reached through an element (here, the first input).
inputs_info.begin()->second->setPrecision(InferenceEngine::Precision::FP16);

Thanks

Zulkifli_Intel · 06-16-2022

Hello Svutran,

If you are working with FP16 or mixed precision, particularly with an INT8-FP16 mixed-precision network, the input precision can be set as below:

inputs_info.begin()->second->setPrecision(InferenceEngine::Precision::FP16);

Sincerely,

Zulkifli
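
One consequence worth spelling out, assuming setPrecision describes the data the application supplies (which is how the Inference Engine documentation describes it): once the input precision is FP16, the input blob is expected to hold 16-bit half-precision values rather than 32-bit floats. The sketch below shows one way to produce such values with the standard library only. float_to_half_bits and to_half_buffer are hypothetical helper names, not OpenVINO functions, and the conversion deliberately flushes values below the smallest normal half to zero.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Converts an IEEE 754 binary32 value to binary16 bits.
// Sketch assumptions: round to nearest (ties to even); results too small
// for a normal half are flushed to signed zero instead of a subnormal.
std::uint16_t float_to_half_bits(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    const std::uint32_t mantissa = bits & 0x7FFFFFu;

    if (exponent == 0xFFu) {  // infinity or NaN
        return static_cast<std::uint16_t>(sign | 0x7C00u | (mantissa ? 0x0200u : 0u));
    }
    const int half_exponent = static_cast<int>(exponent) - 127 + 15;
    if (half_exponent <= 0) {  // below the smallest normal half
        return static_cast<std::uint16_t>(sign);
    }
    if (half_exponent >= 31) {  // too large: becomes infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }
    std::uint32_t half = (static_cast<std::uint32_t>(half_exponent) << 10) | (mantissa >> 13);
    const std::uint32_t remainder = mantissa & 0x1FFFu;
    if (remainder > 0x1000u || (remainder == 0x1000u && (half & 1u))) {
        ++half;  // a carry into the exponent still gives the correct result
    }
    return static_cast<std::uint16_t>(sign | half);
}

// Converts a float buffer into half bit patterns, element by element.
std::vector<std::uint16_t> to_half_buffer(const std::vector<float>& input) {
    std::vector<std::uint16_t> output;
    output.reserve(input.size());
    for (float v : input) {
        output.push_back(float_to_half_bits(v));
    }
    return output;
}
```

If the input precision is left at FP32 instead, no such conversion is needed on the application side.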

Zulkifli_Intel · 06-26-2022

Hello Svutran,

Thank you for your question. If you need any additional information from Intel, please submit a new question as this thread is no longer being monitored.

Sincerely,

Zulkifli