Intel® Distribution of OpenVINO™ Toolkit
Community support and discussions about the Intel® Distribution of OpenVINO™ toolkit, OpenCV, and all things computer vision-related on Intel® platforms.

Quantized weight extraction

PovilasM
Novice
258 Views

Product version: OpenVINO 2020.4.287
OS info and version: Ubuntu 18.04
Compiler version: g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

To start with some context, here is what I have done:
1) Converted a Caffe model using Intel DL Workbench (this was the regular, non-accuracy-aware quantization). This worked as expected; the model inferred just fine.
2) Loaded the quantized model in Python and attempted to print the weights / biases / etc. using the following code:

from openvino.inference_engine import IECore
model_name = 'model name without extension here'
ie = IECore()
load_net = ie.read_network(model_name + '.xml', model_name + ".bin")

for key, value in load_net.layers.items():
    print(key)
    print(value.blobs)
    print()

 

The output, however, has lots of keys that are hard to relate to a known layer or functionality, for example:

Constant_237470
{'custom': array([-18110, -15684,  16539, -15334,  15832,  15617, -15191,  15828,
        12980, -13641, -16236, -14621,  13806, -19496,  16231, -15300,
       -14803, -15777,  15873,  15505, -15248, -18589, -16263,  14939,
        12312, -14812, -19606,  16201, -18276,  13351, -18696,  15761,
       -17694, -14884,  16102,  15524, -16905, -15015,  12849,  16884,
        15669,  16261, -16374,  15811,  15389,  13580,  16899,  14068,
        14405,  14665,  15456, -15154, -14009,  13218,  15949, -16242,
       -13277,  13415, -18691, -13111, -12710, -15460, -20598,  16105,
       -21186, -15131, -18662, -15039,  15615, -15672,  15753,  17508,
        14628, -14383, -18767,  15893, -15673, -16682, -20276,  12840,
        16044,  16974, -18769, -15297, -14036, -14299, -22485, -15453,
       -15334, -14786,  13692, -18218,  13732,  14286, -18899, -21173,
       -14185,  16173,  15287,  15456, -15114, -13836, -18019, -14831,
        16385, -13149, -17880,  13820,  13824, -14247,  13515,  16090,
       -21242, -18130, -18796,  15265, -14564,  16137, -15166,  16430,
        12891, -13786,  16413, -15643, -15293,  16150, -17264, -15903,
        16207, -14850, -15834, -15338,  14601, -15317, -18025,  16474,
       -14222, -15788, -19489,  16046, -17227,  11217,  15845,  16318,
        16712, -13040,  16162, -14585, -15620, -15087, -23471, -15010,
        16223, -15479, -18386,  16450, -15361, -14984,  16005, -16152,
        17087, -13905,  17216,  14975, -16100,  15091, -14885, -21677,
       -14751, -14436, -15343,  15403, -15267, -15320,  13851,  15949,
        17884, -18002,  16363, -16314,  15698, -15459,  12783,  16237,
        15938,  14795,  14641,  14830, -13297, -14961,   8245,  13787,
       -15079, -19880, -17914, -17905, -17762, -18914,  16709, -17764,
       -13707,  16587, -18516, -14931, -17360, -16971, -17827,  12951,
        13758, -16320, -14234, -15478,  16448,  16200, -19603,  16790,
       -14768, -15333,  16487, -18898,  16400, -15978, -15235, -16328,
        13130,  15315,  13219, -16156,  14374, -13829,  17488, -14884,
        15204,  15599, -19625, -16852,  16740,  16498, -14203,  16572,
       -14147, -18612, -14147, -15005, -18706, -14559, -14763, -17352,
        16797, -14527, -17296, -15401, -15113,  16953, -16816, -19615],
      dtype=int16)}

The values are also int16 or fp32, which is strange, since I was expecting int8. I have confirmed that the quantized model I'm reading is almost exactly 4x smaller than the original fp32 model, just to be sure that it really stores 8 bits per value.
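One guess I can think of (purely an assumption on my part): the Python API may be exposing FP16 constants as their raw int16 bit patterns. If so, a dtype view in NumPy would reinterpret the same bytes as float16 and recover the actual values:

```python
import numpy as np

# Guess: the int16 values may be raw FP16 bit patterns. A dtype view
# (not a cast) reinterprets the same bytes without converting them.
raw = np.array([-18110, -15684, 16539, -15334], dtype=np.int16)  # from the dump above
as_fp16 = raw.view(np.float16)
print(as_fp16)  # if the guess is right, these should look like plausible weights
```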

Is there any way to retrieve the int8 weights / biases / offsets / etc., or perhaps calculate them from the outputs I'm getting? Also, how do I relate the corresponding values when some of the names are rather cryptic?

I'm asking because I'm trying to extract the weights and insert them into another framework (or the other way around: insert another framework's weights into an OpenVINO model) while preserving the quantization.

Thank you kindly!

 
4 Replies
Iffa_Intel
Moderator
230 Views

Greetings,


You can use the following sample, or a part of its code, as a reference for retrieving weight/bias values from an IR model: https://github.com/ArtemSkrebkov/dldt/blob/askrebko/iterate-through-network/inference-engine/samples...

You can find more information on how it works in this Stack Overflow topic: https://stackoverflow.com/questions/59930994/get-parameters-weights-for-each-layer-of-the-model-usin...


Sincerely,

Iffa


PovilasM
Novice
221 Views

Thank you for answering, but unfortunately it does not answer any of my questions.

When using the referenced C++ code, the reported type is always FP16. Perhaps that is incorrect and should be ignored? Or perhaps the values are, indeed, FP16 values derived from the int8 values?

It also still shows those cryptic blobs ("25302534_const" and such). I verified that only the quantized version has them, not the original FP32 version.
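If the FP16 values really are dequantized from int8, one way I could imagine recovering the integer levels is a sketch like the following, assuming a standard uniform affine scheme of the kind a FakeQuantize layer uses. The function and parameter names here are mine for illustration, not an OpenVINO API:

```python
import numpy as np

# Sketch only: assumes uniform affine quantization with `levels` steps
# between `in_low` and `in_high` (the ranges a FakeQuantize-style op
# would carry). Names are illustrative, not an OpenVINO API.
def to_int_levels(x, in_low, in_high, levels=256):
    x = np.clip(x, in_low, in_high)
    step = (in_high - in_low) / (levels - 1)
    return np.round((x - in_low) / step).astype(np.int32)

print(to_int_levels(np.array([-1.0, 0.0, 1.0]), -1.0, 1.0))  # endpoints map to 0 and 255
```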

Thank you for your time!

 
Iffa_Intel
Moderator
213 Views

INT8, FP16, and FP32 each have their own representable range.


Squeezing the weights into a smaller format causes a loss of model accuracy. This is known as quantization error.


FP16 has a wider range than INT8, so if a model is kept in FP16 instead of being quantized down to INT8, the accuracy is a lot better, at the price of larger memory usage.


Refer here for more: https://www.youtube.com/watch?v=RF8ypHyiKrY&list=PLg-UKERBljNxdIQir1wrirZJ50yTp4eHv&index=10


and this: https://software.intel.com/content/www/us/en/develop/articles/should-i-choose-fp16-or-fp32-for-my-de...


Sincerely,

Iffa



Iffa_Intel
Moderator
193 Views

Greetings,


Intel will no longer monitor this thread since we have provided a solution. If you need any additional information from Intel, please submit a new question. 


Sincerely,

Iffa

