My customer is developing a product using OpenVINO R4 with his own dataset. He trained the model in Caffe without applying any scale or mean values.
He converted the trained model from FP32 to FP16 using the mo.py script.
However, he encountered many inference errors when using FP16.
He used the cross-check tool included in the OpenVINO package to compare the FP32 and FP16 results, and found overflow in some layers when running in FP16 mode.
The overflow is not surprising because FP16 range is much smaller than FP32.
In this case, could you let me know what my customer should do?
When the Model Optimizer (MO) converts a model's weights from FP32 to FP16, it checks for value overflow (internally, MO uses the numpy astype function to perform the conversion).
If overflow occurs, the following error is printed (although the IR is still generated):
[ ERROR ] 83 elements of 189 were clipped to infinity while converting a blob for node [['conv2d_transpose']] to <class 'numpy.float16'>.
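The clipping behavior can be reproduced directly with numpy, since astype is the same conversion MO performs. A minimal sketch (the sample values are illustrative, not from the customer's model):

```python
import numpy as np

# Converting FP32 weights to FP16 with astype clips out-of-range
# values to infinity, just as in MO's conversion step.
weights_fp32 = np.array([1.0, 70000.0, -1e5], dtype=np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

# float16 max is ~65504, so 70000.0 and -1e5 overflow to +/-inf.
clipped = np.count_nonzero(np.isinf(weights_fp16))
print(f"{clipped} elements of {weights_fp32.size} were clipped to infinity")
# -> 2 elements of 3 were clipped to infinity
```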
However, MO cannot guarantee that overflow will not occur during inference. For example, consider a network that sums two values: even though each value is below the float16 maximum, their sum can exceed the limit.
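This inference-time overflow is easy to demonstrate in numpy with two in-range float16 operands (the values here are arbitrary examples):

```python
import numpy as np

fp16_max = np.finfo(np.float16).max  # ~65504

# Both operands fit in float16 individually...
a = np.float16(40000.0)
b = np.float16(40000.0)
assert a < fp16_max and b < fp16_max

# ...but their sum overflows to infinity in float16 arithmetic.
total = a + b
print(total)  # -> inf
```

This is why checking the weights alone is not sufficient: intermediate activations produced during inference can still overflow.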
It is not possible to simply normalize the weight values before converting, because doing so would significantly degrade prediction accuracy (or, more likely, completely break the topology), so MO provides no such feature.
The recommendation to the customer would be to re-train the model with input values scaled to, for example, the [0, 1] or [-1, 1] range.