Intel® Distribution of OpenVINO™ Toolkit

Understanding quantization of a model's IR



If I "downscale" my FP32 model with the Model Optimizer option

--data_type=FP16

then, according to the docs, the weights and biases of the "intermediate" layers are quantized to FP16. I presume "intermediate" here means every layer of the model except the input and output layers, which I assume stay at FP32.
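For reference, as far as I can tell, FP32 to FP16 "quantization" of weights is a plain IEEE-754 cast: there is no scale or zero point as in integer quantization, so each value just loses mantissa bits and exponent range. The sketch below is a generic round-to-nearest-even FP32 to FP16 converter written for illustration only. `float_to_half`, `half_to_float` and `fp16_round_trip` are hypothetical helpers, not OpenVINO functions, and this is not claimed to be the code Model Optimizer runs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Convert an IEEE-754 binary32 value to binary16 bits, rounding to nearest (ties to even).
std::uint16_t float_to_half(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    std::uint32_t mantissa = bits & 0x7FFFFFu;

    if (exponent == 0xFFu) {  // Inf stays Inf, NaN becomes a quiet NaN
        return static_cast<std::uint16_t>(sign | 0x7C00u | (mantissa ? 0x200u : 0u));
    }
    const int e = static_cast<int>(exponent) - 127 + 15;  // re-bias the exponent: 127 -> 15
    if (e >= 31) {  // too large for FP16: overflow to infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }
    if (e <= 0) {  // result is an FP16 subnormal or zero
        if (e < -10) {
            return static_cast<std::uint16_t>(sign);  // below half the smallest subnormal
        }
        mantissa |= 0x800000u;  // make the implicit leading 1 explicit
        const int shift = 14 - e;
        std::uint32_t half = mantissa >> shift;
        const std::uint32_t rest = mantissa & ((1u << shift) - 1u);
        const std::uint32_t halfway = 1u << (shift - 1);
        if (rest > halfway || (rest == halfway && (half & 1u))) {
            ++half;
        }
        return static_cast<std::uint16_t>(sign | half);
    }
    // Normal range: keep the top 10 mantissa bits and round on the 13 dropped bits.
    std::uint32_t half = (static_cast<std::uint32_t>(e) << 10) | (mantissa >> 13);
    const std::uint32_t rest = mantissa & 0x1FFFu;
    if (rest > 0x1000u || (rest == 0x1000u && (half & 1u))) {
        ++half;  // a carry may bump the exponent, possibly up to infinity (0x7C00)
    }
    return static_cast<std::uint16_t>(sign | half);
}

// Convert binary16 bits back to binary32; this direction is always exact.
float half_to_float(std::uint16_t h) {
    const bool negative = (h & 0x8000u) != 0;
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x3FF;
    float magnitude;
    if (exponent == 0) {
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);  // zero or subnormal
    } else if (exponent == 31) {
        magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                             : std::numeric_limits<float>::infinity();
    } else {
        magnitude = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
    }
    return negative ? -magnitude : magnitude;
}

// What an FP32 weight looks like after being stored as FP16 and read back.
float fp16_round_trip(float value) {
    return half_to_float(float_to_half(value));
}
```

For example, `1.0f` and `65504.0f` (the largest finite FP16 value) come back unchanged, `0.1f` comes back as `0.0999755859375f`, and a value such as `70000.0f` overflows to infinity.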

So are my input, and the output of each subsequent layer, automatically converted down to FP16 as they pass through the network? And converted back up to FP32 at the output layer?
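To make the question concrete, here is a toy model of the pipeline being described, under the assumption that the device computes in FP16: the FP32 input is rounded to FP16 on entry, every weight, bias and intermediate sum is held at FP16 precision, and the result is handed back as FP32. `DenseLayer`, `forward` and `round_to_fp16_precision` are hypothetical; the rounding helper only keeps 11 significant bits and ignores FP16's limited exponent range. This says nothing about what a particular OpenVINO plugin actually does internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Approximate FP16 rounding: keep 11 significant bits (1 implicit + 10 stored), ties to even
// under the default rounding mode. Assumption: ignores FP16's exponent range, so there is
// no overflow to infinity and no subnormal handling.
float round_to_fp16_precision(float x) {
    if (x == 0.0f || !std::isfinite(x)) {
        return x;
    }
    int exponent = 0;
    const float m = std::frexp(x, &exponent);           // x = m * 2^exponent, 0.5 <= |m| < 1
    const float r = std::nearbyint(std::ldexp(m, 11));  // integer with 11 significant bits
    return std::ldexp(r, exponent - 11);
}

// Hypothetical fully connected layer: out[o] = ReLU(bias[o] + sum_i weights[o][i] * in[i]).
struct DenseLayer {
    std::vector<std::vector<float>> weights;  // [outputs][inputs]
    std::vector<float> bias;                  // [outputs]
};

// Toy forward pass. With emulate_fp16 = true, the input is rounded to FP16 on entry,
// weights, biases and every partial sum are rounded to FP16 precision, and the result
// is returned as FP32 (widening FP16 to FP32 is exact).
std::vector<float> forward(const std::vector<DenseLayer>& layers,
                           std::vector<float> activations, bool emulate_fp16) {
    auto to_compute = [emulate_fp16](float v) {
        return emulate_fp16 ? round_to_fp16_precision(v) : v;
    };
    for (float& a : activations) {
        a = to_compute(a);  // input boundary: FP32 -> FP16
    }
    for (const DenseLayer& layer : layers) {
        std::vector<float> next(layer.bias.size());
        for (std::size_t o = 0; o < next.size(); ++o) {
            float acc = to_compute(layer.bias[o]);
            for (std::size_t i = 0; i < activations.size(); ++i) {
                acc = to_compute(acc + to_compute(layer.weights[o][i]) * activations[i]);
            }
            next[o] = std::max(acc, 0.0f);  // ReLU
        }
        activations = std::move(next);
    }
    return activations;
}
```

Because widening FP16 back to FP32 at the output is exact, any difference between the two paths comes from rounding inside the network, not from the final conversion.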

How does specifying the data types of the input and output layers with

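auto network = network_reader.getNetwork();
/** Taking information about all topology inputs **/
InferenceEngine::InputsDataMap input_info(network.getInputsInfo());
/** Taking information about all topology outputs **/
InferenceEngine::OutputsDataMap output_info(network.getOutputsInfo());

/** Iterating over all input info **/
for (auto &item : input_info) {
    auto input_data = item.second;
    // Precision of the input blob the application fills (e.g. U8 or FP32)
    input_data->setPrecision(InferenceEngine::Precision::FP32);
}
/** Iterating over all output info **/
for (auto &item : output_info) {
    auto output_data = item.second;
    // Precision of the output blob the application reads back
    output_data->setPrecision(InferenceEngine::Precision::FP32);
}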

fit into this pipeline? In that case, is the input layer no longer quantized automatically?
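One detail that may matter when choosing the input precision: FP16 has 11 significant bits, so every integer from 0 to 2048, and therefore every U8 pixel value, converts to FP16 exactly, while arbitrary FP32 inputs usually do not. The check below illustrates this, assuming the device converts inputs to FP16; `round_to_fp16_precision`, `count_changed_by_fp16` and `all_u8_values` are hypothetical helpers, and the rounding helper ignores FP16's exponent range.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for FP16 rounding: 11 significant bits, ties to even.
// Assumption: values stay inside FP16's normal range (no overflow, no subnormals).
float round_to_fp16_precision(float x) {
    if (x == 0.0f || !std::isfinite(x)) {
        return x;
    }
    int exponent = 0;
    const float m = std::frexp(x, &exponent);           // x = m * 2^exponent, 0.5 <= |m| < 1
    const float r = std::nearbyint(std::ldexp(m, 11));  // integer with 11 significant bits
    return std::ldexp(r, exponent - 11);
}

// Counts how many input values would change if converted to FP16 on entry.
std::size_t count_changed_by_fp16(const std::vector<float>& values) {
    std::size_t changed = 0;
    for (float v : values) {
        if (round_to_fp16_precision(v) != v) {
            ++changed;
        }
    }
    return changed;
}

// Every possible U8 pixel value, widened to float.
std::vector<float> all_u8_values() {
    std::vector<float> values;
    for (int p = 0; p <= 255; ++p) {
        values.push_back(static_cast<float>(p));
    }
    return values;
}
```

With `all_u8_values()` the count is 0, whereas `{0.1f, 0.5f, 1.0f / 3.0f}` gives 2 (only `0.5f` is exact). So under this assumption, a U8 input loses nothing in an FP16 input conversion, while normalized FP32 inputs generally do.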

