
An Inference API for the Intel® Tiber™ Developer Cloud: Developer Spotlight

Nikita_Shiledarbaxi

Success Story of an Intel® Student Ambassador

An Intel® Student Ambassador, Kieran Llarena, designed a dedicated Inference API for the Intel® Tiber™ Developer Cloud that lets users run inference on the trained AI models hosted on the cloud platform.

 

About the Student Ambassador

Kieran Llarena is currently a Computer Science student at the University of Michigan-Dearborn. As an Intel Student Ambassador, he has found the Intel Tiber Developer Cloud platform easy to work with and the program manager very professional. Through the Student Ambassador program, he has been able to easily access the cloud platform and, in turn, learn about and host various AI models on it.

 

Solution Approach

With an interest in backend engineering, the Student Ambassador applied his Python* and networking knowledge to develop an API proxy that lets users run inference on their trained models on the Intel Tiber Developer Cloud using the FastAPI and Fabric libraries. The API proxy is highly modular and can easily be configured to work with any machine instance.
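That per-instance modularity could be captured with a small configuration object. The sketch below is illustrative only; the class name, environment variable names, and defaults are hypothetical and do not come from the original repository.

```python
import os
from dataclasses import dataclass


@dataclass
class InstanceConfig:
    """Connection details for one Intel Tiber Developer Cloud machine instance."""
    host: str         # public IP or hostname of the cloud instance
    user: str         # SSH login user
    key_path: str     # local path to the private SSH key
    script_path: str  # remote path of the inference script to run

    @classmethod
    def from_env(cls) -> "InstanceConfig":
        # Reading from environment variables keeps the proxy configurable
        # for any machine instance without code changes.
        return cls(
            host=os.environ.get("ITDC_HOST", "127.0.0.1"),
            user=os.environ.get("ITDC_USER", "ubuntu"),
            key_path=os.environ.get("ITDC_KEY_PATH", "~/.ssh/id_rsa"),
            script_path=os.environ.get("ITDC_SCRIPT", "~/inference.py"),
        )
```

Swapping the proxy to a different machine instance then means changing only the configuration, not the request-handling code.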

 

Tools and Technologies Used

Figure 1

 

The client first makes an HTTP POST request with a query in the request body to the API proxy. When the request hits the API, the proxy SSHs into the Intel Tiber Developer Cloud machine instance and runs the inference.py file with the query that the user passed to the proxy. Both are done with the Fabric Python library.
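That request path can be sketched roughly as follows. The function and flag names are assumptions for illustration (the original inference.py interface is not published), and the Fabric import is deferred into the function so the pure command-building helper stands on its own.

```python
import shlex


def build_inference_command(script_path: str, query: str) -> str:
    """Build the shell command the proxy runs on the cloud instance.

    shlex.quote prevents a user-supplied query from injecting shell syntax.
    """
    return f"python3 {shlex.quote(script_path)} --query {shlex.quote(query)}"


def run_remote_inference(host: str, user: str, key_path: str,
                         script_path: str, query: str) -> str:
    """SSH into the cloud instance and run inference.py with the user's query."""
    # Fabric is a third-party dependency; imported here so the module
    # still loads in environments where it is not installed.
    from fabric import Connection

    cmd = build_inference_command(script_path, query)
    with Connection(host, user=user,
                    connect_kwargs={"key_filename": key_path}) as conn:
        result = conn.run(cmd, hide=True)  # blocks until the remote job ends
        return result.stdout
```

In the actual proxy, a FastAPI POST endpoint would read the query from the request body and call a function like `run_remote_inference` with the configured instance details.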

The SSH key used to SSH into the cloud instance is loaded locally from the machine running the proxy.

 Check out the demo video on YouTube.


The cloud machine instance run in the above demo video was a Large VM on a 4th Gen Intel® Xeon® Scalable processor with 32 cores, 64 GB of memory, and a 64 GB disk.

Figure 2

 

Next, the system waits for the cloud machine instance to finish its job (an image is generated in this case).

 

Figure 3

 

When the cloud machine instance finishes its job, the API proxy copies the generated file with Secure Copy Protocol (SCP) and sends it back to the client in the response to the initial POST request.
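The retrieval step could look roughly like this sketch. The helper names and paths are hypothetical; note that Fabric's `Connection.get` transfers files over SFTP, which plays the same role as SCP here.

```python
import os
import tempfile


def local_result_path(job_id: str, suffix: str = ".png") -> str:
    """Where the proxy stores a copied result before returning it to the client."""
    return os.path.join(tempfile.gettempdir(), f"result-{job_id}{suffix}")


def fetch_result(host: str, user: str, key_path: str,
                 remote_path: str, job_id: str) -> str:
    """Copy the generated file off the cloud instance and return its local path."""
    from fabric import Connection  # third-party dependency, deferred import

    local_path = local_result_path(job_id)
    with Connection(host, user=user,
                    connect_kwargs={"key_filename": key_path}) as conn:
        conn.get(remote_path, local=local_path)  # pull the file to the proxy
    return local_path
```

The endpoint would then return the downloaded file as the body of the HTTP response, for example with FastAPI's `FileResponse`.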

Check out the Inference API implementation on GitHub.

Note: The GitHub repository does not include the source code of the inference.py file executed on the cloud machine instance.

 

Next Steps

Going forward, the Student Ambassador plans to explore ways to increase the inference speed of the API; the OpenVINO™ toolkit may be used to accelerate the image-generation process. Additionally, Kieran plans to implement a webhook feature instead of letting requests hang while the cloud machine instance performs its work.
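The planned webhook flow might look roughly like the standard-library sketch below: the endpoint would accept a client-supplied callback URL, respond immediately, and deliver the result later in the background. The payload shape and function names are invented for illustration.

```python
import json
import threading
import urllib.request


def make_callback_payload(job_id: str, status: str, result_url: str) -> bytes:
    """JSON body delivered to the client's callback URL when a job finishes."""
    return json.dumps({
        "job_id": job_id,
        "status": status,
        "result_url": result_url,
    }).encode("utf-8")


def notify_webhook(callback_url: str, payload: bytes) -> None:
    """POST the payload to the client's webhook endpoint."""
    req = urllib.request.Request(
        callback_url, data=payload,
        headers={"Content-Type": "application/json"}, method="POST")
    urllib.request.urlopen(req, timeout=10)


def notify_async(callback_url: str, payload: bytes) -> threading.Thread:
    """Fire the notification in the background so no client request hangs."""
    t = threading.Thread(target=notify_webhook,
                         args=(callback_url, payload), daemon=True)
    t.start()
    return t
```

With this pattern, the POST endpoint would return HTTP 202 Accepted right away instead of holding the connection open for the duration of the inference job.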

Sign up for the Intel Tiber Developer Cloud today and get started with accelerated development of AI, HPC, and edge-computing solutions! Explore the Intel Student Ambassador Program to utilize Intel® oneAPI tools and the Intel Tiber Developer Cloud in collaboration with developer communities.


We encourage you to check out the AI, HPC, and Rendering tools in Intel’s oneAPI-powered software portfolio.

 

Notices and Disclaimers

The demo video uses an unoptimized Stable Diffusion model, so the inference speed shown is not representative and can be tuned to be faster. Additionally, a slow internet connection could affect the usability of the Inference API.

Performance varies by use, configuration, and other factors. Learn more at www.Intel.com/PerformanceIndex. Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.
*Other names and brands may be claimed as the property of others.

 
About the Author
Technical Software Product Marketing Engineer, Intel