Intel® Distribution for Python*
Engage in discussions with community peers related to Python* applications and core computational packages.

How to get benefit from Intel Distribution for Python (IDP) acceleration?

Flavio_Luis_de_Mello

Hello,

My Python script is not being accelerated at all.

I installed the Intel Distribution for Python (IDP) using l_python2_pu3_2017.3.053, obtained from Intel® Distribution for Python*. I ran "./install_gui.sh" and got no error messages. I then loaded the Intel environment by typing "source /opt/intel/intelpython3/bin/activate". When I call "python test.py", the script runs with Intel Python. My problem is that I see no improvement when using IDP compared to the standard Python from Ubuntu. There must be something wrong with my environment.
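One quick sanity check (not part of my original script, just a sketch) is to confirm which interpreter and BLAS backend are actually active after sourcing the environment; the IDP build should mention "Intel Corporation" in the version banner and list MKL libraries in the NumPy build configuration:

```python
import sys
import numpy as np

# The Intel build identifies itself in the interpreter banner
print(sys.version)

# Show which BLAS/LAPACK libraries NumPy was linked against;
# the IDP build should list MKL here
np.show_config()
```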

This is the information about my system:
Hardware:  Intel® Core™ i7 4500U@1.8 GHz processor, 64 bits, 8GB RAM
Software:  Ubuntu 16.04 LTS operating system, Intel® Distribution for Python* 2.7.13, standard Python* 2.7.12, Scikit-learn 0.19.0

My code is simple and uses the scikit-learn library. Since scikit-learn uses SciPy and NumPy, which are accelerated by Intel, I assumed its performance would improve too, but that is not what happened. Here is the code:


#########################

import numpy as np
import pandas as pd
# Note: in scikit-learn >= 0.18 this module lives in sklearn.model_selection
from sklearn.cross_validation import train_test_split
from sklearn import svm
import time
import sys

input_file = "drill39mil.csv"

# Read data
mydata = pd.read_csv(input_file, header=0, delimiter=",")

# Split data into train and test sets (80% / 20%)
train_mydata, test_mydata = train_test_split(mydata, test_size=0.2)

# Provided the CSV has a header row and the label column is named "classe"
train_data_target = train_mydata["classe"]
test_data_target = test_mydata["classe"]

# Select all but the last column as features
# (the last column holds the class label)
train_data = train_mydata.iloc[:, :-1]
test_data = test_mydata.iloc[:, :-1]

start = time.time()
#######       Classifier
clf = svm.SVC()

# Perform training
clf.fit(train_data, train_data_target)

# Make class predictions for all test observations
Z = clf.predict(test_data)

# Compare predicted class labels with actual class labels
accuracy = clf.score(test_data, test_data_target)
print("Predicted model accuracy: " + str(accuracy))

end = time.time()
print("Time (s): " + str(end - start))
print(sys.version)
#######################


The outputs when running Intel Python are:
Predicted model accuracy: 0.543351131452
Time (s):628.276842833
2.7.13 |Intel Corporation| (default, Apr 27 2017, 15:33:46)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]

And when running the standard Python from Ubuntu:
Predicted model accuracy: 0.550597508263
Time (s):589.650998831
2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609]

As you can see, Intel Python is slower than standard Python. Can anyone give me a tip about what is going wrong?

Thanks in advance,

Flávio Mello

DavidLiu
Employee

Hi Flavio,

What are the dimensions of the "drill39mil.csv" file?  How many features?

If your dataset is oddly sized or not big enough for proper vectorization, the cost of setting it up may (at small scale) make it slower than the standard variant of those packages. I just want to make sure it isn't that problem before we look at other parts of your environment and setup.
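A quick way to check this without sharing the data is to print the shape and dtypes of the loaded frame. Here is a sketch using a synthetic stand-in (the exact column layout, 21 numeric features plus a "classe" label, is my assumption based on the script above):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for drill39mil.csv: 39329 rows, 21 feature columns
# plus a "classe" label column (an assumed layout, not the real data)
rng = np.random.RandomState(0)
mydata = pd.DataFrame(rng.rand(39329, 21),
                      columns=["f%d" % i for i in range(21)])
mydata["classe"] = rng.randint(0, 2, size=len(mydata))

# Dimensions and dtypes tell us how much work vectorization has to amortize
print(mydata.shape)
print(mydata.dtypes.value_counts())
```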

Thanks!

-David

Flavio_Luis_de_Mello

Hi David,

Thank you for your attention.

The dataset is composed of 21 features (columns) and 39329 records (rows).

Thanks,

Flávio Mello

DavidLiu
Employee

Hi Flavio,

After talking with engineering: scikit-learn's svm.SVC() doesn't use NumPy (which we accelerate) or our Intel® DAAL library; instead it uses libsvm, which we haven't accelerated yet. So for now it runs at the same speed as the standard version of scikit-learn, since it doesn't leverage any of our current accelerations. In many other areas of scikit-learn we do have significant accelerations, and we are looking to add more this year (such as svm.SVC()). Thanks for bringing this to our attention, and please let us know if you have any further questions.
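For instance, one way to see the NumPy acceleration that IDP does provide is a BLAS-heavy workload such as a large matrix multiply. A rough sketch (the matrix size is an arbitrary choice, and the observed speedup will depend on your hardware):

```python
import time
import numpy as np

# A linear-algebra-heavy workload where an MKL-backed NumPy typically
# shows a speedup over a generic BLAS build
n = 2000
rng = np.random.RandomState(0)
a = rng.rand(n, n)
b = rng.rand(n, n)

start = time.time()
c = np.dot(a, b)
elapsed = time.time() - start
print("%d x %d matrix multiply: %.3f s" % (n, n, elapsed))
```

Running this under both interpreters should show a clear difference, unlike the svm.SVC() workload above.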

Thanks,

David

Flavio_Luis_de_Mello

Hi David,

Thank you for the information.

Cheers,

Flávio Mello
