To fully grasp these steps, it helps to look back at the era before AI's dominance and trace the changes that led us to the present. This exploration will reveal the elements of programming, the aspects that have changed significantly, and the forces driving those changes. We will see how Intel's open approach with OpenVINO is reshaping the AI solution landscape, offering seamless transitions and democratizing AI development.
Programming before AI and what changed
The Run step was always executed on the CPU: the runtime converted the program to machine code that the CPU executed. The GPU, a special-purpose processor for graphics rendering with a massively parallel architecture, excelled at fixed graphics-rendering functions.
GPU for General Purpose Programming
Repurposing GPUs for general-purpose programming unlocked fast execution of large parallel computations such as matrix multiplications. Data science researchers' adoption of GPUs for general-purpose programming fueled further research and tool development around GPUs.
Unlike the software toolchain for CPU programming, GPU toolchains grew to be proprietary and largely vendor-locked. Open standards emerged and are maturing, but the proprietary libraries and tools could use the GPUs better, and the performance gap was substantial.
Now the Run became more complex. There is no longer just one runtime executing machine code for one processor: an additional runtime is required to run the GPU code, and the two runtimes need to exchange data.
AI Era
During model training, an AI model is created using neural networks and large datasets. Training requires significant computational resources and specialized libraries: it involves forward passes, backward passes, and weight updates, so it is very computation-intensive and benefits extensively from parallel processing. In inference, the trained model is used to make predictions on new data; inference requires far fewer parallel matrix multiplications than training (generally less than half).
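To make the difference concrete, below is a minimal sketch of one training step versus an inference call, assuming PyTorch; the toy model, batch size, and shapes are purely illustrative.

import torch
import torch.nn as nn

model = nn.Linear(10, 2)                         # toy model, illustrative only
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(32, 10)                          # a batch of inputs
y = torch.randint(0, 2, (32,))                   # labels

# One training step: forward pass, backward pass, weight update
optimizer.zero_grad()
loss = loss_fn(model(x), y)                      # forward pass
loss.backward()                                  # backward pass (compute gradients)
optimizer.step()                                 # weight update

# Inference: a single forward pass, no gradients and no weight updates
with torch.no_grad():
    predictions = model(x).argmax(dim=1)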
Working with the labels and weights in AI modeling requires extensive parallel computation, especially matrix multiplications. Emerging AI frameworks and libraries leveraged GPUs' excellent parallel-computation abilities. The proprietary GPU vendor libraries and tools performed better than the open libraries and tools. With the exponential growth of AI, the libraries and tools for using GPUs expanded, but they remained closed in nature, unlike CPU programming.
More options promoting an open approach to AI development are being made available, which gives more power to solution builders by democratizing AI.
To Use or Not to Use the GPU: The Python interpreter itself runs on the CPU. Heavy parallel workloads (like a complex AI model) can be delegated to the GPU through a GPU library. This involves transferring data from system memory to the GPU and back, so the parallel-execution advantage must outweigh the data-transfer latency to be effective. For a small to medium AI model, end-to-end solution performance can be better without the transfer to the GPU, but large AI models and computations will clearly benefit from GPU parallel execution.
Integrated GPU: When using the integrated GPU (iGPU) in Intel Core processors, this data-transfer latency is minimized because the CPU and iGPU share the same system memory.
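As an illustration of this trade-off, the sketch below delegates a matrix multiplication to a discrete GPU when one is available, assuming PyTorch with CUDA support; the matrix size is illustrative, and the break-even point depends on your hardware.

import torch

x = torch.randn(4096, 4096)            # data starts in system (CPU) memory
if torch.cuda.is_available():
    x_gpu = x.to("cuda")               # transfer: system memory -> GPU memory
    y = (x_gpu @ x_gpu).to("cpu")      # parallel matmul on the GPU, result copied back
else:
    y = x @ x                          # on the CPU there is no transfer latency at all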
What AI solution components need modification to move to Intel
Intel takes an open approach to AI to ensure easy portability and adoption, while making the best use of the heterogeneous hardware resources in the system (CPU, GPU, NPU).
There is no need to purchase a development kit. AI will work on the CPU, integrated GPU, and NPU already present in your current Intel system.
The following is written based on these assumptions.
- An existing AI solution is available (demo, POC, or product quality); if not, an existing pre-trained AI model will be used.
- The easy way to obtain a model is to download it from an online database, such as Kaggle, Hugging Face, or Torchvision models.
- Python is the programming language used.
Mainly, it is the model-handling part that needs to change. Model loading and model running will be different when moving from a non-Intel to an Intel platform.
How to Change what needs to change
How to change the AI model handling part of the solution
A "write once, deploy anywhere" approach: execute across various Intel architectures (CPU, GPU, NPU, FPGA) and the ARM CPU architecture.
The OpenVINO toolkit has a ‘Model converter’ that converts your existing AI model to the format the OpenVINO runtime can use.
It is easy to convert using a command-line tool or the Python API, and all major frameworks are supported. Once the model is in OpenVINO format (with .bin and .xml files), the loading and inferencing steps in your solution can be modified to use OpenVINO's loading and inferencing APIs.
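As a minimal sketch of the Python conversion API, assuming a recent OpenVINO release and an ONNX source model; the file names here are placeholders.

import openvino as ov

# Convert an existing model to OpenVINO's in-memory representation
ov_model = ov.convert_model("model.onnx")        # "model.onnx" is a placeholder path

# Save it in OpenVINO IR format; this writes model.xml and model.bin
ov.save_model(ov_model, "model.xml")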
OpenVINO Runtime is a set of C++ libraries that provides a common API for deploying inference on the platform of your choice. You can run any of the supported model formats directly or convert the model and save it to the OpenVINO IR format, for maximum performance.
import openvino as ov

# Initialize OpenVINO Runtime Core
core = ov.Core()

# Read a model (OpenVINO IR .xml, ONNX, and other supported formats)
model = core.read_model(model_path)              # e.g. "model.xml"

# Compile (load) the model for a device: "CPU", "GPU", "NPU", or "AUTO"
compiled_model = core.compile_model(model, device_name)

# Create an infer request and do inference synchronously;
# input_data is your preprocessed input (e.g. a numpy array)
results = compiled_model.infer_new_request({0: input_data})
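The device_name argument selects where inference runs. The devices available on a machine can be queried at runtime, and "AUTO" lets OpenVINO choose; a short sketch follows, where the printed list depends on your hardware.

# Devices OpenVINO Runtime can see on this machine
print(core.available_devices)                    # e.g. ['CPU', 'GPU', 'NPU'], hardware-dependent

# Let OpenVINO pick a suitable device automatically
compiled_model = core.compile_model(model, "AUTO")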
OpenVINO™ Model Server may be used for scaled deployment
Model Server hosts models and makes them accessible to software components over standard network protocols: a client sends a request to the model server, which performs model inference and sends a response back to the client (a minimal client sketch follows the list below).
- Lightweight client apps that are independent of model frameworks
- Supports REST and gRPC.
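Below is a minimal client-side sketch using the TensorFlow Serving–compatible REST API that OpenVINO Model Server exposes; the host, port, model name, and input shape are assumptions for illustration and must match your deployment.

import requests

# Assumed server address and model name; adjust to your deployment
url = "http://localhost:8000/v1/models/my_model:predict"
payload = {"instances": [[0.1] * 10]}            # illustrative input shape

response = requests.post(url, json=payload)
print(response.json())                           # predictions returned by the server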
DL Streamer is an open-source streaming media analytics framework for creating complex media analytics pipelines. Intel® DL Streamer uses the OpenVINO™ Runtime inference back-end.
- Better performance, less code
- Quickly develop, optimize, benchmark, and deploy
Get Moving!
OpenVINO Documentation
Interactive Tutorials - Jupyter Notebooks
Start with interactive Python notebooks that show the basics of model inferencing, the OpenVINO API, how to convert models to OpenVINO format, and more.
OpenVINO Code Samples
View sample code for various C++ and Python applications that can be used as a starting point for your own application.
OpenVINO Success Stories - See how Intel partners have successfully used OpenVINO in production applications to solve real-world problems.
Performance Benchmarks - View results from benchmarking models with OpenVINO on Intel hardware.