How to Speed up your Machine Learning Workloads with Intel® Extension for Scikit-Learn in AutoGluon

Deb_Bharadwaj · ‎03-16-2023

Co-Authors:

Rachel Oberman, Intel

Dr. Deb Bharadwaj, Intel

Wenming Wye, AWS

Introduction:

There has been an acceleration in the number of tools developed across the industry to help create and deploy machine learning models. However, if the learning curve or pricing to build and integrate these optimizations and tools into business processes is too steep, ML engineers and scientists are far less likely to adopt new tools and frameworks. This creates a paradox of faster advancement in features actually leading to more barriers for developer adoption.

Intel and AWS differentiate themselves by focusing on easy-to-implement machine learning solutions that do not require significant effort to adopt. This is demonstrated through the packaging of best practices into simple, easy-to-use software that any developer can use. These tools strive to automatically prepare datasets, try different machine learning approaches, and combine their results.

One such library is AWS's popular open-source AutoML library, AutoGluon. It promises to deliver high-quality models in just a few lines of code by automating most of the best practices in machine learning. After Amazon’s internal Machine Learning University trained nearly 25,000 engineers and technical staff over the past 3 years, AutoGluon is now the de facto AutoML system at Amazon and AWS. It powers Amazon SageMaker Autopilot as a managed AutoML service, as well as hundreds of internal projects at Amazon. AutoGluon’s philosophy is to build a higher level of abstraction on top of a number of ML and deep learning frameworks such as XGBoost*, LightGBM*, and PyTorch*. This makes it flexible enough to solve tabular and multimodal problems while transparently reaping the benefits of the innovations and optimizations in these frameworks. More importantly, developers and data scientists love AutoGluon because it cuts model building time by at least 70%, saving them time and repetitive model building work. A user can typically see AutoGluon getting on a Kaggle leaderboard within just 4 hours with default parameters.

The Need:

As the importance of data and AI in modern industries continues to rise, the challenges and costs of running these workloads continue to increase. In addition, many data scientists and AI developers face the added burden of repeatedly tweaking and changing their workloads while trying to determine the best model for their specific use case. These problems increase the cost, effort, and time it takes for data scientists to develop and run their AI workflows, when they could instead be using those resources to improve and refine their pipelines.

“We are now able to address a range of the world's most significant problems with AI, but if it is too significant of a cost or too cumbersome to make, no one will embrace AI in their solutions. This is what makes integrating accelerations such as Intel® Extension for Scikit-Learn* into automated machine learning tools such as AWS AutoGluon so significant: it provides an open-source, combined solution for developers to significantly decrease the time it takes to get high-quality, tuned models which leads to greater AI adoption.”

– Wei Li, VP, General Manager for AI and Analytics for Intel

To make it easier for data scientists and developers to take advantage of these open solutions, Intel and AWS have partnered to tackle these resource problems head-on by combining their existing tools: Intel® Extension for Scikit-Learn* is now integrated into Amazon Web Services’ AutoGluon library for Tabular Prediction.

A Deeper Technical Background on the Tools:

Here is a brief summary of each of these tools, in case you are less familiar with them:

AWS AutoGluon

AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on automated stack ensembling, deep learning, and real-world applications spanning image, text, and tabular data. Intended for both ML beginners and experts, AutoGluon enables you to:

Quickly prototype deep learning and classical ML solutions for your raw data with a few lines of code.
Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge.
Leverage automatic hyperparameter tuning, model selection/ensembling, architecture search, and data processing.
Easily improve/tune your bespoke models and data pipelines, or customize AutoGluon for your use-case.

Intel® Extension for Scikit-Learn*

Intel® Extension for Scikit-Learn* is a simple drop-in acceleration for the popular Scikit-Learn* machine learning library. It allows developers to seamlessly scale scikit-learn applications on Intel® architecture, with up to 100x+ performance gain and the possibility of improved accuracy on their existing code. Data scientists can continue to use the popular Scikit-Learn* library in their workloads while getting more out of their hardware through a few simple steps, reducing costs and time bottlenecks.
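To make "drop-in" concrete, here is a minimal sketch of how the extension is typically switched on in plain scikit-learn code, outside of AutoGluon. It assumes the extension's Python module is named `sklearnex` and exposes `patch_sklearn()`; the helper name `enable_sklearn_acceleration` is hypothetical and only used for illustration. The sketch falls back to stock scikit-learn when the extension is not installed.

```python
def enable_sklearn_acceleration() -> bool:
    """Try to patch scikit-learn with the Intel extension.

    Returns True if the patch was applied, False if the extension
    is not installed (stock scikit-learn is then used unchanged).
    """
    try:
        # Assumption: the extension is importable as `sklearnex`.
        from sklearnex import patch_sklearn
    except ImportError:
        return False
    # Patching should happen before scikit-learn estimators are imported.
    patch_sklearn()
    return True


accelerated = enable_sklearn_acceleration()
print("Intel extension active" if accelerated else "Using stock scikit-learn")
```

After a successful patch, existing imports such as `from sklearn.neighbors import KNeighborsClassifier` stay exactly as they are, which is the sense in which the acceleration is a drop-in.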

Combined, these tools give machine learning developers a powerful speedup for their workloads within an open ecosystem:

With this integration, AutoGluon users, and by extension Amazon SageMaker users, will be able to enable Intel® Extension for Scikit-Learn* optimizations for their automated machine learning pipelines on tabular data with just one extra command at installation time. No additional steps are required!
By installing Intel® Extension for Scikit-Learn* within their AutoGluon environment, developers will be able to run their AutoGluon workloads with additional speed-up for training and inference for Linear Regression and K-Nearest Neighbors Classification algorithms on Intel® architecture without any code changes, leading to reduced total cost of ownership (TCO) and more time to analyze and improve their machine learning workloads.

Here’s everything you need to know to get started:

To install AutoGluon with Intel® Extension for Scikit-Learn*, first install AutoGluon as normal by following the directions in the AutoGluon installation documentation, according to your preferences. Afterward, you can install and enable Intel® Extension for Scikit-Learn* by using the following command:

pip install autogluon.tabular[all,skex]

More information can be found in the AutoGluon installation documentation here.

Once Intel® Extension for Scikit-Learn* is installed and enabled, your AutoGluon code will be accelerated by Intel® Extension for Scikit-Learn* where possible, with no extra steps required!

Many tabular ML users still run on CPUs today, and this collaboration enhances their ability to modernize machine learning for tabular workloads such as XGBoost and LightGBM. Moreover, these users can effortlessly expand into deep learning via AutoGluon's multimodal support by adding text and image models. With the power of Intel oneAPI, users can transparently unlock their Intel hardware acceleration from PC to the cloud.

After Intel® Extension for Scikit-Learn* is installed, the analytics and machine learning optimizations from the package will be turned on by default. This can be verified when calling the “fit” command in AutoGluon for Tabular Data Prediction, which will output the warning “Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)” if the optimizations have been used to optimize a supported algorithm within AutoGluon, such as Linear Regression or K-Nearest Neighbors Classification.
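Before launching a long training run, it can also be handy to confirm that the extension is present in the environment at all. The following is a minimal sketch using only the Python standard library. It assumes the pip distribution is named `scikit-learn-intelex` (matching the GitHub repository named in the message above); the helper name `skex_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def skex_version() -> Optional[str]:
    """Return the installed extension version, or None if it is absent."""
    try:
        # Assumption: the pip distribution name is "scikit-learn-intelex".
        return metadata.version("scikit-learn-intelex")
    except metadata.PackageNotFoundError:
        return None


version = skex_version()
if version is None:
    print("Intel extension not found; AutoGluon will use stock scikit-learn.")
else:
    print(f"Intel extension {version} is installed.")
```

This only checks that the package is installed. Whether a given model was actually accelerated is still best confirmed from the message printed during "fit".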

Following this installation, we encourage you to get started with Intel® Extension for Scikit-Learn* in AutoGluon by following the Tabular Prediction quick start code sample here, and to start experiencing additional speed-up in your AutoGluon workloads!
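As a rough illustration of what such a quick start might look like, the sketch below restricts AutoGluon to the two model families mentioned in this article. It is not taken from the official quick start. It assumes that `"KNN"` and `"LR"` are AutoGluon's short keys for its K-Nearest Neighbors and linear models, and that these are the models the extension accelerates; the names `SKEX_ACCELERATED_MODELS` and `fit_accelerated_models` are hypothetical.

```python
# Assumption: "KNN" and "LR" are AutoGluon's keys for the K-Nearest
# Neighbors and linear models; empty dicts mean default settings.
SKEX_ACCELERATED_MODELS = {"KNN": {}, "LR": {}}


def fit_accelerated_models(train_data, label, time_limit=60):
    """Train only the KNN and linear models on a tabular dataset.

    `train_data` is a pandas DataFrame (or a path AutoGluon can load),
    `label` is the name of the target column, and `time_limit` is the
    training budget in seconds.
    """
    # Imported lazily so this sketch can be read without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label)
    predictor.fit(
        train_data,
        hyperparameters=SKEX_ACCELERATED_MODELS,
        time_limit=time_limit,
    )
    return predictor
```

A call such as `fit_accelerated_models(df, label="class")` would then train with no extension-specific code at all. The training code is the same whether or not the extension is installed, and only the speed should differ.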

To learn more about AWS AutoGluon and Intel® Extension for Scikit-Learn*, please visit:
Intel AI ML Portfolio
Intel AI ML Developer Ecosystem
Intel® Extension for Scikit-Learn* Website
[Support] AWS AutoGluon GitHub Issues
