Our Answers to Most Asked Questions on Optimization Techniques on OpenVINO

Hey everyone!

Today, we hosted a live technical deep dive webinar on "Optimization Best Practices to Maximize Apps with OpenVINO™ toolkit". We had such a vibrant and engaged audience. Thanks to everyone who attended live and participated in the Q&A. We received so many great questions, so we're publishing a few of those questions we answered live and answering the questions we didn't get to here!

If you're interested in participating in future technical deep dive webinars, go here. If you missed today's webinar, you can still watch it on-demand.

Please join and continue the conversation here!

  1. How does OpenVINO optimize on CPU, given that deep learning usually involves complex calculations that need a heavy GPU? OpenVINO uses several graph optimization algorithms, such as fusing or removing layers. For example, a Convolution layer and simple layers like ReLU or ELU can be merged into a single fused layer. The CPU plugin can also remove Power layers as an optimization. OpenVINO leverages Intel MKL-DNN for high-performance scoring and Intel TBB to parallelize calculations.
  2. What happens when you go to low precision? Models are normally trained using the FP32 data type (i.e. full precision). To speed up performance and decrease inference latency, quantization is a popular approach: you quantize floating-point numbers to 8-bit integers (for example), which reduces both memory and compute requirements.
  3. Does OpenVINO generate the model graphs, or is another tool used for that? OpenVINO uses nGraph on the backend for graph implementation.
  4. What is HDDL and GNA? HDDL is High-Density Deep Learning: add-in accelerator cards from Intel that utilize Intel Movidius VPUs (Intel FPGAs are also supported as accelerators). GNA is the Gaussian & Neural Accelerator, typically used for low-power speech processing.
  5. When using the DeploymentManager and copying the package to the target device, does OpenVINO no longer need to be installed on the target device (i.e. will the package contain Media SDK, OpenCV, IPP, TensorFlow, etc.)? That's right. Only the needed part of the Inference Engine is packed inside the deployment package.
  6. ...so the question I had was: what is the tradeoff when you go to low precision? Great question! We have published benchmark results on accuracy drop vs. performance gains here: https://docs.openvinotoolkit.org/latest/_docs_performance_int8_vs_fp32.html Accuracy drops range from 0.01% to 0.20% in typical cases.
  7. What does the deployment manager do? The deployment manager allows you to generate an optimal, minimized runtime package for the selected target device. It minimizes the file size depending on your target device.
  8. My models have custom layers. Would these techniques work with them? Yes, if you have custom layers, you need to extend OpenVINO using its extension mechanisms. First, you'll use the Model Optimizer to convert the model into Intermediate Representation. Then, you'll need to implement the custom layer for the Inference Engine depending on the target hardware. After that, all of these techniques will be supported.
  9. Will OpenVINO maintain the accuracy of my model after the model is converted to .bin and .xml format? Yes, there will be no change in accuracy, but there will be performance gains, since the Model Optimizer uses algorithms like layer fusing to optimize performance.
  10. Can you explain "batch" in inferencing? Batch means the number of images processed in one inference call or iteration. For example, 10 images per iteration.
  11. What are the typical use cases for async vs. sync mode? If your application consists of several workloads (e.g. color pre-processing, scaling, etc.), you could use async mode to process the workloads in parallel. It's always recommended to use async mode.
  12. Is there a maximum limit on how many streams we could create for CPU cores? Yes, the number of streams shouldn't be larger than your number of cores. The idea is to divide CPU resources into several paths. For example, with four cores, you can create four streams.
  13. Is batch size encoded in the model? Or is it possible to adjust it after loading it into the Inference Engine? It is possible; there is a "reshape" function for that. You can set the batch size during the Model Optimizer step using the -b option, or change it at runtime in your code.
  14. For the FPGA acceleration option, does OpenVINO generate HDL (Verilog/VHDL) that can be synthesized, or does it only run on Altera devices using internal formats? No, OpenVINO has a unified API that abstracts the low-level programming for each hardware target. The FPGA plugin interacts with the FPGA board using bitstreams.
  15. What formats of datasets are supported? ImageNet, Pascal VOC, COCO, Common Semantic Segmentation, and unannotated datasets. Learn more here: https://docs.openvinotoolkit.org/latest/_docs_Workbench_DG_Dataset_Types.html
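To make the layer fusing mentioned in question 1 concrete, here is a hedged, toy-sized sketch (plain Python, not the actual CPU plugin internals): running Convolution and ReLU as two separate graph nodes requires a full pass over the intermediate tensor, while a fused node applies both in a single loop.

```python
# Toy illustration of Conv + ReLU layer fusing (not OpenVINO internals).

def conv1d(signal, kernel):
    """Naive 1-D convolution with 'valid' padding."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    """Separate activation layer: a second pass over the data."""
    return [max(0.0, x) for x in xs]

def fused_conv1d_relu(signal, kernel):
    """Fused node: ReLU applied inside the convolution loop, so there is
    one pass and no intermediate buffer."""
    k = len(kernel)
    return [max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
            for i in range(len(signal) - k + 1)]

signal = [1.0, -2.0, 3.0, -4.0, 5.0]
kernel = [0.5, -0.5]

# Both paths produce identical outputs; fusing only removes the extra pass.
assert relu(conv1d(signal, kernel)) == fused_conv1d_relu(signal, kernel)
```

The real CPU plugin does this at the graph level (rewriting node patterns before execution), but the payoff is the same: fewer memory traversals for identical numerical results.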
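The FP32-to-INT8 quantization from questions 2 and 6 can be sketched in a few lines. This is an illustrative symmetric per-tensor scheme in plain Python, not the OpenVINO calibration tooling: values are mapped onto the integer range [-127, 127] with a single scale factor, cutting storage 4x at the cost of a small, bounded rounding error.

```python
# Illustrative symmetric INT8 quantization (not the OpenVINO API).

def quantize_int8(values):
    """Map FP32 values onto [-127, 127] with a per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [round(v / scale) for v in values]  # these fit in signed 8 bits
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from the INT8 representation."""
    return [x * scale for x in q]

weights = [0.02, -1.27, 0.635, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The round-trip error is at most half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2
```

This bounded rounding error is what shows up as the small accuracy drop in the INT8 vs. FP32 benchmarks linked in question 6.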
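The sync-vs-async distinction from question 11 can be sketched with plain Python threads rather than the Inference Engine's async request API. The `preprocess` and `infer` functions below are hypothetical stand-ins: the point is that in async-style execution, one frame's preprocessing can overlap with another frame's in-flight inference, while sync mode runs every stage serially.

```python
# Hedged sketch of sync vs. async pipelining using plain Python threads.
from concurrent.futures import ThreadPoolExecutor

def preprocess(frame):
    """Stand-in for resizing / color conversion."""
    return [p / 255.0 for p in frame]

def infer(tensor):
    """Stand-in for an inference call (e.g. a network forward pass)."""
    return sum(tensor)

frames = [[10, 20], [30, 40], [250, 5]]

# Sync mode: each frame is fully processed before the next one starts.
sync_results = [infer(preprocess(f)) for f in frames]

# Async-style mode: while frame i's inference is in flight on the pool,
# the main thread is already preprocessing frame i+1.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(infer, preprocess(f)) for f in frames]
    async_results = [fut.result() for fut in futures]

# Same outputs either way; async only changes when the work happens.
assert sync_results == async_results
```

In a real OpenVINO application the same overlap is achieved with asynchronous inference requests, which is why async mode is the general recommendation when the pipeline has multiple stages.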
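For question 13, setting the batch size at conversion time looks roughly like the command below. The model filename is hypothetical; only the -b option itself is the point.

```shell
# Convert a model to Intermediate Representation with batch size 8
# via the Model Optimizer's -b option (model name is illustrative).
python mo.py --input_model model.onnx -b 8
```

Alternatively, the batch size can be adjusted at runtime via the network's reshape capability before loading it onto the device.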

Attached are the slides used in this webinar.
