
Intel’s Flexible AI App Development: Models, Optimizations, and Runtimes

Javier_M_Intel

Building an AI application requires a long chain of choices, and developing with Intel provides flexibility and adaptability in choosing the right models, optimizations, and runtime frameworks. Four popular model zoos are ONNX, OpenVINO, PyTorch, and Hugging Face; each contains high-quality pre-trained models, with many options tuned specifically for Intel platforms. Models are further transformed and customized to a developer's needs with optimizations, and tools like Intel Neural Compressor (INC), Optimum Intel, Intel Extension for PyTorch (IPEX), and the Neural Network Compression Framework (NNCF) all work on Intel platforms. Finally, there is the crucial decision of which runtime to use, with popular options such as OpenVINO and ONNX Runtime (ORT).

All these options are available for AI PC development on the Intel® Core™ Ultra processor, allowing developers to choose their preferred models and software. At each critical juncture of decision-making, from model selection to optimization to execution, Intel platforms provide the flexibility developers need to make optimal choices.

Model Selection from Model Zoos

 


Targeting Intel Core Ultra is straightforward: models are sourced as needed from these four model zoos and imported into the runtime framework. Both the ONNX Runtime and OpenVINO frameworks can directly import models from these four repositories (a short sketch follows the table below). The table gives a rough idea of the number of models accessible when working with Intel:

Model Zoo       Core Ultra Optimized Models    Total Models
ONNX            167                            2,325
PyTorch         30                             62
Hugging Face    75                             584,690
OpenVINO        240                            264
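For example, the same downloaded model file can be loaded into either runtime with just a few lines of Python. The sketch below is illustrative, assuming a hypothetical ONNX image model saved as model.onnx with a 1x3x224x224 input; the file name and input shape are placeholders rather than details from any specific zoo entry.

    import numpy as np

    # Placeholder input for a hypothetical 1x3x224x224 image model.
    dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)

    # Option 1: execute the model with ONNX Runtime.
    import onnxruntime as ort
    session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    ort_result = session.run(None, {input_name: dummy_input})

    # Option 2: execute the same file with OpenVINO, which reads ONNX models directly.
    import openvino as ov
    compiled = ov.compile_model("model.onnx")  # pass device_name to pick a specific device
    ov_result = compiled(dummy_input)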

 

Model Optimization Tools

After model selection, the next choice in the chain of AI application development is model optimization. Optimization can significantly change a model; for example, it can reduce the disk footprint for faster loading or lower latency while achieving better performance. The rapid pace of model development limits the number of models in the zoos that come optimized out of the box for Intel Core Ultra. For this reason, developers can optimize existing models using any of the following Intel-developed open-source tools (a short sketch follows the list):

  • Intel Neural Compressor (INC): a stand-alone tool that performs various optimizations and quantization with a few lines of code. It also offers model pruning during training and post-training quantization of an existing model.
  • Optimum Intel: a convenient packaging of INC with Hugging Face, bringing the power of INC to the largest model repository.
  • Intel Extension for PyTorch (IPEX): a single tool that provides the most up-to-date feature optimizations for PyTorch, the most popular training framework.
  • Neural Network Compression Framework (NNCF): a component of the OpenVINO toolkit that optimizes models through weight compression, post-training quantization, and training-time optimization. It works with all four popular model types and integrates with Hugging Face.
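As a concrete example, NNCF can apply post-training INT8 quantization to an OpenVINO model in a few lines. This is a minimal sketch, assuming a hypothetical OpenVINO IR file named model.xml, with random arrays standing in for a real representative calibration dataset.

    import numpy as np
    import nncf
    import openvino as ov

    # Load a pre-trained model (hypothetical file name).
    model = ov.Core().read_model("model.xml")

    # A handful of representative inputs for calibration; random arrays
    # stand in here for real preprocessed samples.
    samples = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(10)]
    calibration_dataset = nncf.Dataset(samples)

    # Apply post-training quantization.
    quantized_model = nncf.quantize(model, calibration_dataset)

    # Save the compressed model for deployment.
    ov.save_model(quantized_model, "model_int8.xml")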

Runtime Choices 

The last choice in AI application development is runtime selection. ONNX Runtime (ORT) and OpenVINO are popular runtime frameworks compatible with the four popular model types. They can execute models on client PC devices, such as systems built on Intel Core Ultra processors, which combine a CPU, GPU, and NPU.
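Either runtime can target the individual compute engines on a Core Ultra system. The sketch below is illustrative and assumes hypothetical model files (model.xml, model.onnx); the ONNX Runtime path additionally assumes the onnxruntime-openvino package, which supplies the OpenVINO Execution Provider, and its supported device_type values can vary by release.

    import openvino as ov

    core = ov.Core()
    print(core.available_devices)  # typically ['CPU', 'GPU', 'NPU'] on a Core Ultra system

    # Compile the same model for a specific engine by changing device_name.
    compiled_on_npu = core.compile_model("model.xml", device_name="NPU")
    compiled_on_gpu = core.compile_model("model.xml", device_name="GPU")

    # ONNX Runtime reaches the same hardware through execution providers.
    import onnxruntime as ort
    session = ort.InferenceSession(
        "model.onnx",
        providers=["OpenVINOExecutionProvider"],
        provider_options=[{"device_type": "NPU"}],  # assumption: depends on installed EP version
    )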

Developing AI applications on Intel platforms makes the long chain of choices smooth and streamlined. Choose from high-quality models for accuracy and throughput, customize them with your preferred optimization tools, and select the runtime framework that best fits your needs. That flexibility affords the peace of mind to focus on developing smarter AI applications and getting them to deployment faster. To learn more, check out AI PC with the Intel Core Ultra Processor and explore the tools for development.

About the Author
Javier Martinez is a Principal Engineer and AI architect working on the NPU driver stack and heterogeneous computing optimizations for Client PCs.