I am trying the Intel DAAL example from The Parallel Universe, issue 28 (https://swdevtoolsmag.makebettercode.com/). The example starts on page 26 and uses the Kaggle Leaf Classification data set.
I set up the data using the reference provided in the essay: https://www.kaggle.com/jeffd23/10-classifier-showdown-in-scikit-learn
After reaching the DAAL line of code:
trainAlg.input.set(classifier.training.data, X_train)
the program crashed with the following error:
Traceback (most recent call last):
  File "zz_kaggle_leaf_01.py", line 42, in <module>
    trainAlg.input.set(classifier.training.data, X_train)
  File "/home/intel/intelpython3/lib/python3.5/site-packages/daal/algorithms/classifier/training.py", line 175, in set
    return _training3.Input_set(self, id, value)
TypeError: in method 'Input_set', argument 3 of type 'daal::data_management::NumericTablePtr const &'
The piece of code I am using is as follows:
# This part from
# https://www.kaggle.com/jeffd23/10-classifier-showdown-in-scikit-learn
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.preprocessing import LabelEncoder
#from sklearn.cross_validation import StratifiedShuffleSplit
from sklearn.model_selection import StratifiedShuffleSplit

train = pd.read_csv('./Kaggle_data/Leaf/train.csv')
test = pd.read_csv('./Kaggle_data/Leaf/test.csv')

#
#Data Preparation
# Swiss army knife function to organize the data
def encode(train, test):
    le = LabelEncoder().fit(train.species)
    labels = le.transform(train.species)           # encode species strings
    classes = list(le.classes_)                    # save column names for submission
    test_ids = test.id                             # save test ids for submission

    train = train.drop(['species', 'id'], axis=1)
    test = test.drop(['id'], axis=1)

    return train, labels, test, test_ids, classes

train, labels, test, test_ids, classes = encode(train, test)
print(train.head(1))

#sss = StratifiedShuffleSplit(labels, 10, test_size=0.2, random_state=23)
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=23)

#for train_index, test_index in sss:
for train_index, test_index in sss.split(train.values, labels):
    X_train, X_test = train.values[train_index], train.values[test_index]
    y_train, y_test = labels[train_index], labels[test_index]

# Here start
# The Parallel Universe, issue 28. The example starts on page 26
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
favorite_clf = LinearDiscriminantAnalysis()
favorite_clf.fit(X_train, y_train)
test_predictions = favorite_clf.predict(X_test)
#test_predictions = favorite_clf.predict_proba(X_test)
# WORK end of INTEL_leaf_knn_01.py
input("\t pg 28 passed with warnings. Hit return")

# pg 30
#And KNN in Python (scikit-learn):
from sklearn.neighbors import KNeighborsClassifier
#favorite_clf = KNeighborsClassifier(k=4)
favorite_clf = KNeighborsClassifier(4)
favorite_clf.fit(X_train, y_train)
test_predictions = favorite_clf.predict(X_test)
input("\t Pg 30, KNN scikit-learn, passed Hit return")

# pg 30
# KNN training stage in Python (Intel DAAL):
from daal.algorithms.kdtree_knn_classification import training, prediction
from daal.algorithms import classifier, kdtree_knn_classification
trainAlg = kdtree_knn_classification.training.Batch()
trainAlg.input.set(classifier.training.data, X_train)
Regards,
Sergio
Enhance your #MachineLearning and #BigData skills via #Python #SciPy
1) https://www.packtpub.com/big-data-and-business-intelligence/numerical-and-scientific-computing-scipy-video
2) https://www.packtpub.com/big-data-and-business-intelligence/learning-scipy-numerical-and-scientific-computing-second-edition
Hi Sergio,
Thanks for raising this question. PyDAAL algorithms operate on NumericTable data structures instead of directly on numpy arrays. You can do the appropriate conversions as follows.
from daal.data_management import HomogenNumericTable
from daal.algorithms.kdtree_knn_classification import training, prediction
from daal.algorithms import classifier, kdtree_knn_classification

trainAlg = kdtree_knn_classification.training.Batch()
trainAlg.input.set(classifier.training.data, HomogenNumericTable(X_train))
trainAlg.input.set(classifier.training.labels,
                   HomogenNumericTable(np.array(y_train.reshape(792, 1), dtype=np.intc), ntype=np.intc))
trainAlg.parameter.k = 4
trainingResult = trainAlg.compute()
The prediction algorithm will also require a HomogenNumericTable as input.
predictAlg = kdtree_knn_classification.prediction.Batch()
predictAlg.input.setTable(classifier.prediction.data, HomogenNumericTable(X_test))
predictAlg.input.setModel(classifier.prediction.model, trainingResult.get(classifier.training.model))
predictAlg.compute()
predictionResult = predictAlg.getResult()
test_predictions = predictionResult.get(classifier.prediction.prediction)
Chris
Thanks Christopher,
Your clarification gets the code past that stage. It should also help me continue with the rest of the essay.
Regards,
Sergio
Hi Chris,
As expected, I was able to finish this example. Now, how can I print the values returned by
trainingResult = trainAlg.compute()
and in
test_predictions = predictionResult.get(classifier.prediction.prediction)
Straightforward printing gives:
In [23]: print(trainingResult)
    ...:
<daal.algorithms.kdtree_knn_classification.training.Result; proxy of <Swig Object of type 'daal::services::SharedPtr< daal::algorithms::kdtree_knn_classification::training::interface1::Result > *' at 0x7f39f53a4090> >

In [24]: print(test_predictions)
    ...:
<daal.data_management.NumericTable; proxy of <Swig Object of type 'daal::services::SharedPtr< daal::data_management::interface1::NumericTable > *' at 0x7f39a7627f60> >
Salut,
Sergio
Hi Sergio,
To print a NumericTable, you can use the utility functions found in <daalroot>/examples/python/source/utils.
import os
import sys
from os.path import join as jp
sys.path.insert(0, jp(os.environ['DAALROOT'], 'examples', 'python', 'source'))
from utils import printNumericTable
printNumericTable(test_predictions)
There is a complete list of PyDAAL examples at https://software.intel.com/en-us/node/682181. You may also find the tutorials at https://github.com/daaltces/pydaal-tutorials useful.
You should be able to run the remaining algorithms from the essay by using the HomogenNumericTable constructor on any inputs that are numpy arrays. The only caveat is that the only integer type accepted by NumericTables is numpy.intc, so numpy.int64 arrays will need to be converted. The other thing to pay attention to is array shape. You may need to transpose one here or there. In the meantime, I will see if it's possible to publish the complete code samples somewhere. Let me know if you run into any problems.
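The dtype and shape caveats can be sketched with plain NumPy. This is only an illustration under assumptions: the helper name `as_label_column` is hypothetical (it is not part of PyDAAL), the label values are toy data, and the `HomogenNumericTable` call appears only in a comment because it needs a PyDAAL install.

```python
import numpy as np

def as_label_column(y):
    """Return labels as an (n, 1) C-contiguous array of dtype numpy.intc.

    Hypothetical helper, not part of PyDAAL: it prepares a 1-D label array
    (for example the int64 output of LabelEncoder) for HomogenNumericTable.
    """
    y = np.asarray(y)
    # reshape(-1, 1) infers the row count instead of hard-coding it
    return np.ascontiguousarray(y.reshape(-1, 1), dtype=np.intc)

y_train = np.array([3, 0, 2, 1], dtype=np.int64)   # toy stand-in for encoded labels
y_col = as_label_column(y_train)
print(y_col.dtype, y_col.shape)

# With PyDAAL installed, the column could then be wrapped as in the earlier reply:
#   from daal.data_management import HomogenNumericTable
#   labels_table = HomogenNumericTable(y_col, ntype=np.intc)
```

Using `reshape(-1, 1)` rather than a fixed row count such as 792 keeps the conversion valid if the train/test split changes.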
Thanks,
Chris
Thanks Chris,
I am getting the error:
from utils import printNumericTable
ImportError: No module named 'utils'
Before running the code I did: export DAALROOT=/home/intel/intelpython3/pkgs/pydaal-2018.0.0b20170313-py35_intel_0
I also tried (obviously NO success):
$ conda install utils
Fetching package metadata .........
Solving package specifications: .

PackageNotFoundError: Package not found: '' Package missing in current linux-64 channels:
  - utils

Close matches found; did you mean one of these?

    utils: xlutils, docutils, psutil

You can search for packages on anaconda.org with

    anaconda search -t conda utils

You may need to install the anaconda-client command line client with

    conda install anaconda-client
By the way, I was able to successfully run the examples at pydaal-2018.0.0b20170313-py35_intel_0/share/pydaal_examples/examples/python via the command python run_examples.py
Regards,
Sergio
Hi Sergio,
I was incorrectly assuming a separate DAAL installation. Sorry about that. For Intel Python, the utils folder can be found in <install_root>/share/pydaal_examples/examples/python/source. Just make sure that folder is on your sys.path at runtime, or you can point the PYTHONPATH environment variable there.
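As a rough sketch of the sys.path route: the function name `add_pydaal_examples_to_path` is hypothetical, and the install root below is an assumption taken from the DAALROOT value posted earlier in this thread, so adjust it to your own install.

```python
import os
import sys

def add_pydaal_examples_to_path(install_root):
    """Prepend the PyDAAL examples source folder to sys.path and return it.

    install_root is an assumption: the folder that contains
    share/pydaal_examples (hypothetical helper, not part of PyDAAL).
    """
    source_dir = os.path.join(install_root, 'share', 'pydaal_examples',
                              'examples', 'python', 'source')
    if source_dir not in sys.path:
        sys.path.insert(0, source_dir)
    return source_dir

# Assumed root, copied from the DAALROOT value posted above; adjust as needed.
source_dir = add_pydaal_examples_to_path(
    '/home/intel/intelpython3/pkgs/pydaal-2018.0.0b20170313-py35_intel_0')

# Only attempt the import when the folder really exists on this machine.
if os.path.isdir(source_dir):
    from utils import printNumericTable
```

Setting PYTHONPATH to the same folder before starting Python has the same effect without touching the script.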
Chris
OK, it's working. What remains now is to make sense of the output, which does not look anything like the sklearn one.
Sergio