Artificial Intelligence (AI)
Discuss current events in AI and technological innovations with Intel® employees

Intel Gives scikit-learn* the Performance Boost Data Scientists Need

MaryT_Intel
Employee

Key Takeaways

  • The Intel optimizations for scikit-learn, made available through the Intel® oneAPI AI Toolkit, reduce algorithm run times and give data scientists more time to focus on their problem-solving models.

Scikit-learn* is one of the most widely used Python* packages for data science and machine learning. An accelerated scikit-learn can analyze machine learning data across many industry use cases while making efficient use of hardware compute resources. The Intel optimizations for scikit-learn, made available through the Intel® oneAPI AI Toolkit, reduce algorithm run times and give data scientists more time to focus on their problem-solving models. Intel has also invested in optimizing the performance of Python* itself, with the Intel® Distribution for Python, and has optimized key data science libraries used alongside scikit-learn, such as XGBoost, NumPy, and SciPy.

In a recent benchmark, Intel engineers analyzed how Intel-optimized scikit-learn performs on 2nd Generation Intel® Xeon® Scalable processors compared with AMD datacenter processors and Nvidia GPUs. Using Intel performance as the baseline, shown as the solid blue line at 1.00, we consistently saw the Intel-optimized scikit-learn algorithms outperform the same algorithms run on the AMD EPYC 7742 processor, shown in orange. A key source of the Intel performance advantage is Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions, which the AMD EPYC 7742 does not support. Additionally, the Intel-optimized scikit-learn* library consistently outperformed the Nvidia V100 GPU implementation.

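[Figure: performance of scikit-learn algorithms relative to the Intel baseline (1.00), compared with the AMD EPYC 7742 and the Nvidia V100]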

The Intel CPU performance, compared to the Nvidia GPU performance (shown in purple), highlights the inherent performance advantages at the processor core level.
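The chart normalizes every result to the Intel baseline of 1.00. The post does not state the exact formula, so the sketch below assumes the common convention that performance is the inverse of run time: a platform that needs twice the baseline's time scores 0.50. The function name `relative_performance` and the timings in the usage line are hypothetical, not data from the benchmark.

```python
from typing import Dict, Mapping


def relative_performance(baseline_seconds: float,
                         other_seconds: Mapping[str, float]) -> Dict[str, float]:
    """Return each platform's performance relative to a baseline run time.

    Assumes performance = 1 / run time, so the baseline itself would score 1.00,
    a slower platform scores below 1.00, and a faster one scores above it.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline run time must be positive")
    scores = {}
    for platform, seconds in other_seconds.items():
        if seconds <= 0:
            raise ValueError(f"run time for {platform!r} must be positive")
        # (1 / other) / (1 / baseline) simplifies to baseline / other.
        scores[platform] = baseline_seconds / seconds
    return scores


# Hypothetical timings in seconds, not measurements from the benchmark above.
print(relative_performance(2.0, {"platform_a": 4.0, "platform_b": 1.0}))
# {'platform_a': 0.5, 'platform_b': 2.0}
```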

Install the Intel® oneAPI AI Toolkit

Get the performance speedup of Intel® Xeon® Scalable processors by downloading the Intel® oneAPI AI Toolkit, which includes the Intel® Distribution for Python with all of Intel’s scikit-learn optimizations. The toolkit is distributed through many common channels, including Intel’s website, YUM, APT, Anaconda, and others. Select and download the distribution package best suited to your environment, then follow the Get Started guide for post-installation instructions.

Alternatively, if you already use Anaconda, you can get the latest Intel scikit-learn optimizations directly through conda. First, update conda and add the Intel Anaconda channel to your conda configuration:

            conda update conda
            conda config --add channels intel

Second, install Intel-optimized scikit-learn from the Intel Anaconda channel:

            conda install -c intel scikit-learn
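After installing, it can help to confirm that Python can locate both scikit-learn and daal4py, the module used to enable the optimizations. This sketch assumes the import names `sklearn` and `daal4py`; the helper name `importable` is hypothetical. It relies on the standard library's `importlib.util.find_spec`, which searches for a top-level module without executing it.

```python
import importlib.util
from typing import Dict, Iterable


def importable(modules: Iterable[str] = ("sklearn", "daal4py")) -> Dict[str, bool]:
    """Map each top-level module name to True if Python can locate it.

    find_spec() searches sys.path without running the module's code. Pass only
    top-level names: for a dotted name, find_spec() imports the parent package.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in importable().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Run it with the same Python interpreter that your conda environment activates, so it checks the packages you just installed.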

Give Intel-optimized scikit-learn a Try

Once everything is installed, you can accelerate existing scikit-learn applications as follows:

1. Run your application through the daal4py module from the command line:

            python -m daal4py your_application.py

2. The command line is fine for testing and experimentation, but you can also patch scikit-learn inside your Python program by adding the following lines before you import any scikit-learn modules:

            import daal4py.sklearn
            daal4py.sklearn.patch_sklearn()

3. Once scikit-learn is successfully patched, the console shows the following informational message. It includes a link to the daal4py documentation, which also explains how to enable all of the scikit-learn optimizations:

            >>> import daal4py.sklearn
            >>> daal4py.sklearn.patch_sklearn()
            Intel(R) oneAPI Data Analytics Library solvers for sklearn enabled:
            https://intelpython.github.io/daal4py/sklearn.html
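If the same script also has to run on machines without the Intel packages, the patch from step 2 can be wrapped in a fallback. This is a sketch, not part of daal4py: `enable_intel_sklearn` is a hypothetical helper name, and the only daal4py call it makes is the `daal4py.sklearn.patch_sklearn()` function shown in the steps above.

```python
def enable_intel_sklearn() -> bool:
    """Patch scikit-learn with daal4py when it is available.

    Returns True if the patch was applied. Returns False if daal4py (or the
    scikit-learn it depends on) cannot be imported, in which case the stock
    scikit-learn implementations are used unchanged.
    """
    try:
        import daal4py.sklearn
    except ImportError:
        return False
    daal4py.sklearn.patch_sklearn()
    return True


# Patch first, then import scikit-learn estimators so the patched versions are used.
INTEL_PATCHED = enable_intel_sklearn()
print("daal4py patch applied" if INTEL_PATCHED else "using stock scikit-learn")
```

After this point, import estimators as usual, for example `from sklearn.cluster import KMeans`.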

Many data scientists spend hours or even days waiting for algorithms to run and process data. After an initial analysis, they may re-run it with different parameters, looking for a better model or higher accuracy. Faster processing means more time for analyzing the data, tweaking and improving models, and solving the underlying problem. Intel® oneAPI AI Toolkit optimizations for scikit-learn on 2nd Generation Intel® Xeon® Scalable processors are key to helping data scientists do just that.
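To see how much time the optimizations save on a parameter sweep like the one described above, one option is to time each configuration with and without the patch. A minimal standard-library sketch follows; `time_sweep` is a hypothetical helper, and the commented KMeans line assumes you supply your own data array `X`.

```python
import time
from typing import Any, Callable, Dict, Hashable, Iterable


def time_sweep(fit: Callable[[Any], object],
               values: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Call fit(value) for each parameter value and record wall-clock seconds."""
    timings = {}
    for value in values:
        start = time.perf_counter()
        fit(value)
        timings[value] = time.perf_counter() - start
    return timings


# Hypothetical usage with your own data X:
# from sklearn.cluster import KMeans
# print(time_sweep(lambda k: KMeans(n_clusters=k).fit(X), [4, 8, 16]))
```

Because `patch_sklearn()` changes scikit-learn for the whole process, it is simplest to run the sweep in two separate Python processes, one patched and one not, and compare the recorded times.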

 


About the Author
Mary is the Community Manager for this site. In her spare time, she likes to bike and does college and career coaching for high school students.