Hi
I signed up for the free resources on DevCloud. I want to see whether my scikit-learn-based ML script runs faster there. If I run the script in Jupyter as provided on DevCloud (24 cores and 192 GB RAM), will it use those resources more efficiently, or do I have to configure something?
I am running it as-is right now. I didn't have to install scikit-learn, pandas, or NumPy, but there is no environment (conda or venv).
I have used Intel Python before, and when code used Intel's pandas, NumPy, or scikit-learn, it would print a message saying that DAAL was being used, or something similar. The newest versions of the Intel Python distribution don't seem to do that. Am I missing something?
Does the Intel Python distribution accelerate only the following scikit-learn algorithms?
https://intelpython.github.io/daal4py/sklearn.html#daal-accelerated-scikit-learn
Thanks
Hey Georgios,
In DevCloud, the default Python is Intel® Distribution for Python*, which comes as part of the oneAPI toolkit. It ships with many specialized packages, such as scikit-learn, pandas, and NumPy, that offer accelerated workflows and advanced functionality, so using this distribution would by itself accelerate your workloads. You can refer to the link below for more details.
As for your next question, why there is no environment: DevCloud has a base environment by default, the oneAPI base environment. You can verify this with the following command:
conda env list
You will see an asterisk (*) next to the base environment, which shows that it is activated. You can also create your own conda or virtual environments on DevCloud.
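Besides `conda env list`, you can check from inside Python (for example, in a Jupyter cell) which interpreter and environment a notebook is actually using. A minimal sketch, assuming the environment was activated with `conda activate`, which sets `CONDA_DEFAULT_ENV` and `CONDA_PREFIX`; these variables may be absent if the Jupyter kernel was not started from an activated shell. The function name `describe_active_environment` is hypothetical:

```python
import os
import sys


def describe_active_environment() -> dict:
    """Summarize which Python interpreter and conda environment are active.

    CONDA_DEFAULT_ENV and CONDA_PREFIX are set by `conda activate`;
    they are None here when no conda environment was activated.
    """
    return {
        "python_executable": sys.executable,
        "sys_prefix": sys.prefix,
        "conda_env_name": os.environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        # True inside a venv/virtualenv, where sys.prefix differs from the base interpreter's prefix
        "in_venv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
    }


if __name__ == "__main__":
    for key, value in describe_active_environment().items():
        print(f"{key}: {value}")
```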
Intel Python accelerates the scikit-learn shipped with it using Intel® MKL, Intel® DAAL, and Intel® Threading Building Blocks, through direct changes to the package source code. So it is not only the daal4py-accelerated scikit-learn algorithms that are sped up: the default scikit-learn shipped with Intel Distribution for Python is also accelerated in the back end and can be used without any API changes.
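To confirm which builds of these packages a session actually picks up (for example, when comparing the Intel distribution with a stock Anaconda environment), you can list their installed versions. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); the package list is an assumption, `installed_versions` is a hypothetical helper, and a version number alone does not prove that a package is the Intel-optimized build:

```python
from importlib import metadata

# Distribution names as published; "daal4py" is only present where
# the Intel DAAL Python bindings are installed.
PACKAGES = ("numpy", "scipy", "pandas", "scikit-learn", "daal4py")


def installed_versions(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:>14}: {version or 'not installed'}")
```

Running this in each environment you benchmark makes it easier to tell whether two runs really used different package builds.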
Thanks
Arun
Hi Arun, thanks for the reply. My next question: is there a piece of scikit-learn code that you use as a benchmark to compare the default Python distribution (from Anaconda?) with the Intel one? I ran something locally (MacBook Pro, 6-core i7, 16 GB RAM, SSD) in one conda environment with intelpython3_full and in another environment that I created with the Anaconda defaults (pandas, NumPy, scikit-learn). I didn't see any difference in run times. I then also added this:
import os
# Set environment variables
os.environ['USE_DAAL4PY_SKLEARN'] = 'YES'
In the local intelpython3_full environment I got this:
Intel(R) Data Analytics Acceleration Library (Intel(R) DAAL) solvers for sklearn enabled: https://intelpython.github.io/daal4py/sklearn.html
This confused me, because I thought the Intel distribution would use the Intel optimization libraries natively.
That is why I am asking for help to evaluate the Intel distribution properly on my resources. If you have a standard benchmarking procedure, please share it with me.
The main code I used as a benchmark is below (please ignore the plotting part; it is not included in the timing):
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor
import time

start_time = time.time()

# Create the dataset
rng = np.random.RandomState(1)
X = np.linspace(0, 6, 100000)[:, np.newaxis]
y = np.sin(X).ravel() + np.sin(6 * X).ravel() + rng.normal(0, 0.1, X.shape[0])

# Fit regression model
regr_1 = DecisionTreeRegressor(max_depth=4)
regr_2 = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4), n_estimators=300, random_state=rng)
regr_1.fit(X, y)
regr_2.fit(X, y)
print("--- %s seconds ---" % (time.time() - start_time))

# Predict
y_1 = regr_1.predict(X)
y_2 = regr_2.predict(X)

# Plot the results
plt.figure()
plt.scatter(X, y, c="k", label="training samples")
plt.plot(X, y_1, c="g", label="n_estimators=1", linewidth=2)
plt.plot(X, y_2, c="r", label="n_estimators=300", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Boosted Decision Tree Regression")
plt.legend()
plt.show()
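The script above takes a single `time.time()` reading around dataset creation and both fits, so one noisy run can hide or exaggerate a difference between environments. A minimal sketch of a repeat-and-summarize timer, assuming you only want to time the call you pass in; `time_call` is a hypothetical helper, and the lambda workload is only a stand-in:

```python
import statistics
import time


def time_call(fn, repeats=5, warmup=1):
    """Run fn() several times and return (best, median) wall-clock seconds.

    time.perf_counter is a monotonic, high-resolution clock, which suits
    short intervals better than time.time.
    """
    for _ in range(warmup):
        fn()  # untimed runs absorb one-off costs such as lazy imports or cache warm-up
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings), statistics.median(timings)


if __name__ == "__main__":
    # Stand-in workload; in the benchmark script this could be lambda: regr_2.fit(X, y)
    best, median = time_call(lambda: sum(i * i for i in range(200_000)))
    print(f"best {best:.4f} s, median {median:.4f} s")
```

In the script above, you could time `lambda: regr_2.fit(X, y)` on its own. Because `regr_2` receives a `RandomState` object as `random_state`, each refit draws different random numbers, so repeated fits are comparable in cost but not identical.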
Hi Georgios,
We will check with a subject-matter expert (SME) and get back to you on this.
Arun Jose
Hi Georgios,
When you set the USE_DAAL4PY_SKLEARN environment variable, the algorithms on the Intel® DAAL accelerated scikit-learn monkey-patch list run on Intel® DAAL underneath your normal scikit-learn code. Unfortunately, DecisionTreeRegressor and AdaBoostRegressor are not on this list, so you are running the stock scikit-learn versions of these algorithms, with no acceleration beneath them. That is why you do not see a performance difference.
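One way to check this yourself is to look at where an estimator class is defined. A rough heuristic, assuming the patch swaps in whole classes defined under a `daal4py` module; algorithms patched only at the method level would not be detected. `provided_by` is a hypothetical helper:

```python
def provided_by(estimator_cls, package_prefix="daal4py"):
    """Return True if a class is defined in a module under package_prefix.

    Heuristic only: an estimator that is not on the patch list, such as
    sklearn.tree.DecisionTreeRegressor, keeps its scikit-learn module.
    """
    module = getattr(estimator_cls, "__module__", "") or ""
    return module == package_prefix or module.startswith(package_prefix + ".")


if __name__ == "__main__":
    try:
        from sklearn.tree import DecisionTreeRegressor
    except ImportError:
        print("scikit-learn is not installed")
    else:
        # Per the explanation above, this should report a scikit-learn module
        print(DecisionTreeRegressor.__module__, provided_by(DecisionTreeRegressor))
```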
If you would like to keep using the normal scikit-learn API to benchmark the Intel® DAAL accelerated algorithms, please consider using one or more of the algorithms on the monkey-patch list I linked above.
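Assuming, as with the DAAL banner message quoted earlier in the thread, that the variable is read when scikit-learn is imported, a fair A/B comparison runs each variant in a fresh interpreter rather than toggling the variable inside one Jupyter session. A minimal sketch, assuming your benchmark lives in a standalone script; `run_with_and_without_patch` is a hypothetical helper, and the timings include interpreter start-up and imports, so they are only meaningful for workloads that take well over a second:

```python
import os
import subprocess
import sys
import time


def run_with_and_without_patch(script_path, repeats=3):
    """Time a script in fresh interpreters without and with USE_DAAL4PY_SKLEARN.

    Each variant gets its own process, so scikit-learn is imported fresh
    under the intended setting. Returns the best wall-clock time per variant.
    """
    results = {}
    for label, value in (("stock", None), ("daal4py", "YES")):
        env = dict(os.environ)
        if value is None:
            env.pop("USE_DAAL4PY_SKLEARN", None)  # make sure the stock run is unpatched
        else:
            env["USE_DAAL4PY_SKLEARN"] = value
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            subprocess.run([sys.executable, script_path], env=env, check=True,
                           stdout=subprocess.DEVNULL)
            timings.append(time.perf_counter() - start)
        results[label] = min(timings)
    return results
```

For example, save a benchmark that uses an algorithm from the patch list as a script (a hypothetical `bench.py`) and call `run_with_and_without_patch("bench.py")`. Running the same script with the same interpreter in both variants keeps everything except the patch setting constant.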
Alternatively, you can use daal4py (also known as Intel® DAAL Python) directly. Its API is similar to scikit-learn's (link to daal4py documentation), and it gives you access to more Intel® DAAL accelerated algorithms than those on the monkey-patch list. Decision tree regression is available directly in daal4py, but AdaBoost regression is not yet available. Please consider this as an alternative for your benchmarking.
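If daal4py is installed, the same monkey-patching can also be switched on from code instead of through the environment variable. A minimal sketch using `daal4py.sklearn.patch_sklearn`, guarded so that it falls back to stock scikit-learn when daal4py is missing; `enable_daal4py_patch` is a hypothetical wrapper name:

```python
def enable_daal4py_patch():
    """Try to switch patch-list scikit-learn estimators to daal4py implementations.

    Returns True if the patch was applied, False if daal4py is not installed.
    """
    try:
        from daal4py.sklearn import patch_sklearn
    except ImportError:
        return False
    patch_sklearn()
    return True


if __name__ == "__main__":
    if enable_daal4py_patch():
        print("daal4py patch applied")
    else:
        print("daal4py not available; using stock scikit-learn")
```

Call it before importing the estimators you want patched: names bound by an earlier `from sklearn... import ...` keep pointing at the unpatched classes.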
Thanks,
Rachel
Thanks a lot. This has been really helpful.
Georgios,
No problem, I'm glad my response was helpful.
Has my response resolved your question, so that we can close this thread?
-Rachel
This issue is now closed. If you have a similar question, please start a new thread.
