Understanding quantization of an IR model

Hi,

If I "downscale" my FP32 model with the

--data_type = FP16

option, according to the docs the model weights and biases for "intermediate" layers are quantized to FP16. I presume "intermediate" here refers to all the layers of the model other than the input and output layers which I presume will stay at FP32.

So, are my input and the output of each layer automatically downscaled to FP16 as they pass through the network, and then upscaled back to FP32 at the output layer?
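
For context, this is roughly how I have been checking what precisions the loaded IR reports (just a sketch using the same reader-style API as in my snippet below; "model_fp16.xml"/"model_fp16.bin" are placeholder paths for the IR produced with --data_type FP16):

#include <inference_engine.hpp>
#include <iostream>

int main() {
    /** Read the FP16 IR with CNNNetReader, as in the snippet below **/
    InferenceEngine::CNNNetReader network_reader;
    network_reader.ReadNetwork("model_fp16.xml");
    network_reader.ReadWeights("model_fp16.bin");
    auto network = network_reader.getNetwork();

    /** Print the precision reported for every input and output **/
    for (auto &item : network.getInputsInfo())
        std::cout << "input  " << item.first << " : "
                  << item.second->getPrecision().name() << std::endl;
    for (auto &item : network.getOutputsInfo())
        std::cout << "output " << item.first << " : "
                  << item.second->getPrecision().name() << std::endl;
    return 0;
}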

How does specifying the data type of the input and output layers using

// network_reader is an InferenceEngine::CNNNetReader that has already read the IR
auto network = network_reader.getNetwork();
/** Taking information about all topology inputs **/
InferenceEngine::InputsDataMap input_info(network.getInputsInfo());
/** Taking information about all topology outputs **/
InferenceEngine::OutputsDataMap output_info(network.getOutputsInfo());

/** Iterating over all input info and requesting FP16 input data **/
for (auto &item : input_info) {
    auto input_data = item.second;
    input_data->setPrecision(InferenceEngine::Precision::FP16);
}

interact with this pipeline? In that case, is there no automatic quantization of the input?
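
And I assume the outputs could be pinned in the same way (just a sketch, reusing output_info from the snippet above; FP32 is only my guess at what the output should stay in):

/** Iterating over all output info and explicitly keeping the outputs in FP32 **/
for (auto &item : output_info) {
    auto output_data = item.second;
    output_data->setPrecision(InferenceEngine::Precision::FP32);
}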

Thanks.
