Intel® Distribution of OpenVINO™ Toolkit

UINT8 / FP32 / FP16 precision switch between models

Nunez-Yanez__Jose
New Contributor I

Hello,

I was wondering if anybody has a suggestion about this. I have a model in which the first few layers are mapped to the CPU and the last layers are mapped to an NCS2 device.

The pipeline performs inference with the CPU part using 8-bit precision and the NCS2 part using FP16 precision.

How could I dequantize the 8-bit values to feed them to the floating-point network at runtime?

Or, alternatively, how could I quantize the FP16 values from one of the models to feed them into an 8-bit model at runtime?

I am not sure whether a software implementation is available to perform these precision switches during inference for models that are not mapped to the same device.
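For reference, the manual conversion asked about above is commonly done with an affine mapping between UINT8 codes and real values. The following is only a minimal sketch of that idea, not something OpenVINO is confirmed to expose in this form: the function names are hypothetical, and `scale` and `zero_point` are assumed to be known from however the 8-bit model was calibrated.

```python
import struct


def dequantize(q_values, scale, zero_point):
    """Map UINT8 codes to real values: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


def quantize(values, scale, zero_point):
    """Map real values to UINT8 codes, rounding and clamping to [0, 255]."""
    return [min(255, max(0, round(x / scale) + zero_point)) for x in values]


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Assumed calibration parameters, chosen only for illustration.
SCALE, ZERO_POINT = 0.5, 128

# 8-bit output of the first model -> FP16 input for the second model.
fp16_inputs = [to_fp16(x) for x in dequantize([128, 130, 0], SCALE, ZERO_POINT)]

# FP16 output of one model -> 8-bit input for the other model.
uint8_inputs = quantize([0.0, 1.0, -64.0, 1000.0], SCALE, ZERO_POINT)
```

Values outside the representable range are clamped, so the choice of `scale` and `zero_point` determines how much precision is lost at the boundary between the two models.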

Thanks for any ideas you might have. Maybe this is not possible after all.

David_C_Intel
Employee

Hi Jose,

Thanks for reaching out.

You can check the documentation for the heterogeneous (HETERO) plugin. It lets you run inference in the precision required for each layer in your model by setting a primary device and a fallback device.

For example, if you use FP16 IR files, set the primary device to MYRIAD (for the NCS2) and the CPU as the fallback device. The plugin then automatically converts FP16 to FP32 where necessary for the CPU.
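A setup like this could look roughly as follows. This is a sketch only, assuming the Inference Engine Python API (`IECore`) from OpenVINO releases of this period; newer releases use a different API. The helper names and the IR file names are hypothetical placeholders.

```python
def hetero_device_name(primary, *fallbacks):
    """Build a HETERO device string, e.g. 'HETERO:MYRIAD,CPU'.

    Devices are listed in priority order: primary first, then fallbacks.
    """
    return "HETERO:" + ",".join((primary,) + fallbacks)


def load_hetero(model_xml, model_bin, device_name):
    """Load an IR model on a HETERO device (requires OpenVINO to be installed)."""
    # Imported lazily so the helper above can be used without OpenVINO.
    from openvino.inference_engine import IECore

    ie = IECore()
    net = ie.read_network(model=model_xml, weights=model_bin)
    return ie.load_network(network=net, device_name=device_name)


# MYRIAD (NCS2) as the primary device, CPU as the fallback.
device = hetero_device_name("MYRIAD", "CPU")

# Hypothetical FP16 IR file names; uncomment with real paths and hardware attached.
# exec_net = load_hetero("model_fp16.xml", "model_fp16.bin", device)
```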

If you have additional questions, let us know.

Best regards,

David

Nunez-Yanez__Jose
New Contributor I

OK, thanks.
