Introducing: ONNX Format Support for the Intel® Distribution of OpenVINO™ toolkit

MaryT_Intel · ‎09-24-2020

Key Takeaways

Learn how to train models with flexibility of framework choice using ONNX and deploy using the Intel® Distribution of OpenVINO™ toolkit with a new streamlined and integrated path.
Get started quickly by loading ONNX models into the Inference Engine runtime within the Intel® Distribution of OpenVINO™ toolkit.

Developing deep learning models and then deploying them into production typically involves several steps: dataset preparation, model architecture selection, training, and deployment. Model training is performed within environments that are flexible and experiment-friendly (e.g., frameworks such as TensorFlow and PyTorch), while the deployment environment is typically chosen with performance and robustness in mind. Taking a model from development to production therefore often requires a change of software framework in order to achieve the best results on the most suitable target hardware. The Intel® Distribution of OpenVINO™ toolkit is an inference framework designed to deliver optimized performance on Intel-powered hardware with a reduced model footprint.

Moving models from the original framework to an inference solution can be complicated because each framework stores models in its own representation. Within the Intel® Distribution of OpenVINO™ toolkit, we use a component called the Model Optimizer to import models from a framework format into an Intermediate Representation, or IR (i.e., an internal format within the toolkit). Once the model is represented in IR, there is no remaining framework-dependent representation, and thus no need to ship the framework's software components with the deployment. The runtime component of the Intel® Distribution of OpenVINO™ toolkit, called the Inference Engine, then takes care of the execution.

While preserving latency, the Intel® Distribution of OpenVINO™ toolkit IR allows for abstraction from framework specifics and provides additional information, such as preprocessing settings, quantization parameters, hardware-specific execution hints, and low-level model representation. In addition, the IR provides capabilities for size reduction, which are critical for application distribution and deployment, especially in edge inferencing scenarios with constrained memory and power budgets. We designed the IR to support models originating from multiple frameworks, and it can represent the wide spectrum of use cases that the Intel® Distribution of OpenVINO™ toolkit supports. The toolkit remains platform-agnostic by accepting and producing an IR of the model. The IR also allows us to innovate quickly while introducing support for additional models and features.

ONNX (Open Neural Network Exchange) is an evolving industry standard for model representation that has been designed with a similar goal in mind: to provide a bridge from development to production and to represent models in a framework-agnostic way. This way of building tools empowers developers with choice, allowing them to build the way they want. As an example, the typical path from developing a model in PyTorch to running inference is to first store the model in ONNX format and then use this file as the input to your inference runtime. Hence, ONNX format support is important for any inference framework. This is why we collaborated closely with Microsoft and ONNX to integrate ONNX with the Intel® Distribution of OpenVINO™ toolkit.
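
The PyTorch-to-ONNX step described above can be sketched roughly as follows. This is a minimal illustration, not part of the toolkit: the helper name `export_to_onnx` and the default file name `model.onnx` are hypothetical, and the sketch assumes PyTorch is installed and that `example_input` has the shape the model expects.

```python
def export_to_onnx(model, example_input, onnx_path="model.onnx"):
    """Store a trained PyTorch model as an ONNX file (hypothetical helper)."""
    # Imported lazily so this module can be loaded without PyTorch installed.
    import torch

    # Switch to inference mode so layers such as dropout behave deterministically.
    model.eval()

    # torch.onnx.export traces the model with the example input and writes
    # the resulting graph to onnx_path.
    torch.onnx.export(model, example_input, onnx_path)
    return onnx_path
```

The resulting file is what an ONNX-capable inference runtime would then take as its input.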

Previously, the Intel® Distribution of OpenVINO™ toolkit supported the ONNX format through the standard Model Optimizer offline path, with conversion to IR format. Starting with the Intel® Distribution of OpenVINO™ toolkit 2020.4 release, we support a more lightweight workflow in which an ONNX file can be passed directly as an input to the Inference Engine. In an upcoming release, this support will cover not only full-precision (i.e., FP32) models but also models that were quantized to INT8 precision using ONNX tools. Internally, the toolkit converts the ONNX model into its native representation and executes it through the standard path. A snippet below shows an example of how ONNX format can be executed.
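
A minimal sketch of this flow, using the Inference Engine Python API (`IECore`) from the 2020.4-era `openvino.inference_engine` package, might look like the following. The helper name `load_onnx_model` and the path `model.onnx` are hypothetical, and the sketch assumes the toolkit's Python bindings and the ONNX reader libraries are available.

```python
def load_onnx_model(model_path, device_name="CPU"):
    """Read an ONNX file and load it onto a device (hypothetical helper)."""
    # Imported lazily so this module can be loaded without OpenVINO installed.
    from openvino.inference_engine import IECore

    ie = IECore()

    # For an ONNX model only the model path is passed; an IR model would be
    # read the same way, with the path to its .xml file (plus .bin weights).
    net = ie.read_network(model=model_path)

    # Compile the network for the chosen device, for example "CPU" or "GPU".
    return ie.load_network(network=net, device_name=device_name)


# Example usage (assumes a real ONNX file exists at this hypothetical path):
# exec_net = load_onnx_model("model.onnx")
# results = exec_net.infer(inputs={input_name: input_array})
```

Here `input_name` and `input_array` stand for the model's input blob name and a NumPy array of the matching shape.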

As you can see, for simple cases there are similarities between ONNX and the Intel® Distribution of OpenVINO™ toolkit IR from the code standpoint. The support is extended not only to CPUs and integrated GPUs, but also to AI accelerators, including Intel® Movidius™ VPU and FPGAs, to meet power, form factor and use-case specific requirements with a write-once-deploy-anywhere simplicity (i.e., using the same codebase and deploying across your choice of target hardware).

To execute this code, you will need to include the additional libraries that are needed to read native ONNX models (inference_engine_onnx_reader and onnx_importer) in your distribution, and ensure that they are stored alongside the other Intel® Distribution of OpenVINO™ toolkit libraries. It was a design decision to make those libraries optional, in order to keep the runtime small and minimize the application distribution for cases when the ONNX format is not used. Conversely, if your application reads only ONNX models, you can remove the library for reading Intel® Distribution of OpenVINO™ toolkit IR (inference_engine_ir_reader) from the distribution.
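
As a rough illustration of this packaging rule, the sketch below maps the model files an application ships to the reader libraries it needs. The helper `reader_libraries_for` is hypothetical, and the names are the base library names from the text; actual file names vary by platform (prefixes and extensions such as `lib…so` or `.dll`).

```python
from pathlib import Path

# Reader libraries named in the text, keyed by model file extension.
# IR models are identified here by their .xml file.
READER_LIBRARIES = {
    ".onnx": ["inference_engine_onnx_reader", "onnx_importer"],
    ".xml": ["inference_engine_ir_reader"],
}


def reader_libraries_for(model_paths):
    """Return the sorted reader libraries needed for the given model files."""
    needed = set()
    for path in model_paths:
        suffix = Path(path).suffix.lower()
        if suffix not in READER_LIBRARIES:
            raise ValueError(f"Unrecognized model format: {path}")
        needed.update(READER_LIBRARIES[suffix])
    return sorted(needed)


# An ONNX-only application does not need the IR reader.
print(reader_libraries_for(["model.onnx"]))
```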

From an execution perspective, the Intel® Distribution of OpenVINO™ toolkit IR remains a scalable way to achieve richer functionality, such as integrated preprocessing, a small footprint, and the best performance when tuning specifically for Intel platforms.

For users looking to take full advantage of Intel® Distribution of OpenVINO™ toolkit's performance and features, it is recommended to follow the native workflow of using the Intermediate Representation from the Model Optimizer as input to the Inference Engine.

Users looking to get up and running rapidly with a trained model that is already in ONNX format (e.g., exported from PyTorch) can now input that ONNX model directly to the Inference Engine to run it on Intel architecture.

Conclusion

Model training is performed within environments that are flexible and experiment-friendly (e.g., frameworks such as TensorFlow and PyTorch), while the deployment environment is typically chosen with performance and robustness in mind. To empower developers with choice and flexibility, the Intel® Distribution of OpenVINO™ toolkit now integrates a new ONNX Importer, which enables developers to input an ONNX model directly into the Inference Engine and run it on Intel architecture while taking advantage of the optimized performance of the Intel® Distribution of OpenVINO™ toolkit.

Get the Intel® Distribution of OpenVINO™ toolkit today and start deploying high-performance deep learning applications with write-once-deploy-anywhere efficiency. If you have ideas on ways we can improve the product, we welcome contributions to the open-source OpenVINO™ toolkit. Finally, join the conversation to discuss all things Deep Learning and OpenVINO™ toolkit in our community forum.

Notices & Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.