To fully grasp these steps, it helps to look back at the era before AI's dominance and trace the changes that led us to the present. This exploration will reveal the elements of programming, the aspects that have changed significantly, and the forces driving those changes. We will see how Intel's open approach with OpenVINO is reshaping the AI solution landscape, offering seamless transitions and democratizing AI development.
Programming before AI and what changed
The Run step was always executed on the CPU: the runtime converted the program to machine code that the CPU executed. The GPU, a special-purpose processor for graphics rendering with a massively parallel architecture, excelled at fixed graphics-rendering functions.
GPU for General Purpose Programming
Repurposing GPUs for general-purpose programming unlocked fast execution of large parallel computations such as matrix multiplications. Data science researchers' adoption of GPUs for general-purpose programming fueled further research and tool development around GPUs.
Unlike the software toolchain for CPU programming, GPU toolchains grew to be proprietary and largely vendor-locked. Open standards emerged and are maturing, but the proprietary libraries and tools could use the GPUs better, and the performance gap was substantial.
Now the Run became more complex. There is no longer just one runtime executing machine code for one processor: an additional runtime is required to run the GPU code, and the two runtimes need to exchange data.
AI Era
During model training, an AI model is created using neural networks and large datasets. Training requires significant computational resources and specialized libraries: it involves forward passes, backward passes, and weight updates, so it is very computation-intensive and benefits extensively from parallel processing. In inference, the trained model is used to make predictions on new data; inference requires far fewer parallel matrix multiplications than training (generally less than half).
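To make the difference concrete, below is a minimal sketch of one training step versus an inference call, assuming PyTorch; the toy model, batch size, and shapes are purely illustrative.

import torch
import torch.nn as nn

model = nn.Linear(10, 2)                         # toy model, illustrative only
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(32, 10)                          # a batch of inputs
y = torch.randint(0, 2, (32,))                   # labels

# One training step: forward pass, backward pass, weight update
optimizer.zero_grad()
loss = loss_fn(model(x), y)                      # forward pass
loss.backward()                                  # backward pass (compute gradients)
optimizer.step()                                 # weight update

# Inference: a single forward pass, no gradients and no weight updates
with torch.no_grad():
    predictions = model(x).argmax(dim=1)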
Working with the labels and weights in AI modeling requires extensive parallel computation, especially matrix multiplications. Emerging AI frameworks and libraries leveraged GPUs' excellent parallel-computation abilities. The proprietary GPU vendor libraries and tools performed better than the open libraries and tools. With the exponential growth of AI, the libraries and tools for using GPUs expanded, but they remained closed in nature, unlike CPU programming.
More options promoting an open approach to AI development are being made available, which gives more power to solution builders by democratizing AI.
To Use or Not to Use the GPU: The Python interpreter itself runs on the CPU. Heavy parallel workloads (like a complex AI model) can be delegated to the GPU through a GPU library. This involves transferring data from system memory to the GPU and back, so the parallel-execution advantage must outweigh the data-transfer latency to be effective. For a small to medium AI model, end-to-end solution performance can be better without the transfer to the GPU, but large AI models and computations will clearly benefit from GPU parallel execution.
Integrated GPU: When using the integrated GPU (iGPU) in Intel Core processors, this data-transfer latency is minimized because the CPU and iGPU share the same system memory.
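As an illustration of this trade-off, the sketch below delegates a matrix multiplication to a discrete GPU when one is available, assuming PyTorch with CUDA support; the matrix size is illustrative, and the break-even point depends on your hardware.

import torch

x = torch.randn(4096, 4096)            # data starts in system (CPU) memory
if torch.cuda.is_available():
    x_gpu = x.to("cuda")               # transfer: system memory -> GPU memory
    y = (x_gpu @ x_gpu).to("cpu")      # parallel matmul on the GPU, result copied back
else:
    y = x @ x                          # on the CPU there is no transfer latency at all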
What AI solution components need modification to move to Intel
Intel takes an open approach to AI to ensure easy portability and adoption, while making the best use of the heterogeneous hardware resources in the system (CPU, GPU, NPU).
There is no need to purchase a development kit. AI will work on the CPU, integrated GPU, and NPU already present in your current Intel system.
The following is written based on these assumptions.
- An existing AI solution is available (demo, POC, or product quality); if not, an existing pre-trained AI model will be used.
- The easy way to obtain a model is to download it from an online database, such as Kaggle, Hugging Face, or Torchvision models.
- Python is the programming language used.
Mainly, it is the model-handling part that needs to change. Model loading and model running will be different when moving from a non-Intel to an Intel platform.
How to Change what needs to change
How to change the AI model handling part of the solution
A "write once, deploy anywhere" approach: execute across various Intel architectures (CPU, GPU, NPU, FPGA) and the ARM CPU architecture.
The OpenVINO toolkit has a ‘Model converter’ that converts your existing AI model to the format the OpenVINO runtime can use.
It is easy to convert using a command-line tool or the Python API, and all major frameworks are supported. Once the model is in OpenVINO format (with .bin and .xml files), the loading and inferencing steps in your solution can be modified to use OpenVINO's loading and inferencing APIs.
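As a minimal sketch of the Python conversion API, assuming a recent OpenVINO release and an ONNX source model; the file names here are placeholders.

import openvino as ov

# Convert an existing model to OpenVINO's in-memory representation
ov_model = ov.convert_model("model.onnx")        # "model.onnx" is a placeholder path

# Save it in OpenVINO IR format; this writes model.xml and model.bin
ov.save_model(ov_model, "model.xml")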
OpenVINO Runtime is a set of C++ libraries that provides a common API for deploying inference on the platform of your choice. You can run any of the supported model formats directly or convert the model and save it to the OpenVINO IR format, for maximum performance.
import openvino as ov

# Initialize OpenVINO Runtime Core
core = ov.Core()

# Read a model (OpenVINO IR .xml, ONNX, and other supported formats)
model = core.read_model(model_path)              # e.g. "model.xml"

# Compile (load) the model for a device: "CPU", "GPU", "NPU", or "AUTO"
compiled_model = core.compile_model(model, device_name)

# Create an infer request and do inference synchronously;
# input_data is your preprocessed input (e.g. a numpy array)
results = compiled_model.infer_new_request({0: input_data})
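The device_name argument selects where inference runs. The devices available on a machine can be queried at runtime, and "AUTO" lets OpenVINO choose; a short sketch follows, where the printed list depends on your hardware.

# Devices OpenVINO Runtime can see on this machine
print(core.available_devices)                    # e.g. ['CPU', 'GPU', 'NPU'], hardware-dependent

# Let OpenVINO pick a suitable device automatically
compiled_model = core.compile_model(model, "AUTO")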
OpenVINO™ Model Server may be used for scaled deployment
Model Server hosts models and makes them accessible to software components over standard network protocols: a client sends a request to the model server, which performs model inference and sends a response back to the client (a minimal client sketch follows the list below).
- Lightweight client apps that are independent of model frameworks
- Supports REST and gRPC.
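Below is a minimal client-side sketch using the TensorFlow Serving–compatible REST API that OpenVINO Model Server exposes; the host, port, model name, and input shape are assumptions for illustration and must match your deployment.

import requests

# Assumed server address and model name; adjust to your deployment
url = "http://localhost:8000/v1/models/my_model:predict"
payload = {"instances": [[0.1] * 10]}            # illustrative input shape

response = requests.post(url, json=payload)
print(response.json())                           # predictions returned by the server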
DL Streamer is an open-source streaming media analytics framework for creating complex media analytics pipelines. Intel® DL Streamer uses the OpenVINO™ Runtime inference back-end.
- Better performance, less code
- Quickly develop, optimize, benchmark, and deploy
Get Moving!
OpenVINO Documentation
Interactive Tutorials - Jupyter Notebooks
Start with interactive Python notebooks that show the basics of model inferencing, the OpenVINO API, how to convert models to OpenVINO format, and more.
OpenVINO Code Samples
View sample code for various C++ and Python applications that can be used as a starting point for your own application.
OpenVINO Success Stories - See how Intel partners have successfully used OpenVINO in production applications to solve real-world problems.
Performance Benchmarks - View results from benchmarking models with OpenVINO on Intel hardware.