Intel® Distribution for Python*
Support and discussions for achieving faster Python* applications and core computational packages.

DAAL issue: TypeError: in method 'Input_set', argument 3 of type 'daal::data_management:

sergio_r_
Novice
382 Views

 

I am trying the Intel DAAL example from The Parallel Universe, issue 28 (https://swdevtoolsmag.makebettercode.com/). The example starts on page 26 and uses the Kaggle Leaf Classification data set.

I set up the data using the reference provided in the essay https://www.kaggle.com/jeffd23/10-classifier-showdown-in-scikit-learn

After reaching the DAAL line of code:

trainAlg.input.set(classifier.training.data, X_train)

the program crashed with the following error:

Traceback (most recent call last):
  File "zz_kaggle_leaf_01.py", line 42, in <module>
    trainAlg.input.set(classifier.training.data, X_train)
  File "/home/intel/intelpython3/lib/python3.5/site-packages/daal/algorithms/classifier/training.py", line 175, in set
    return _training3.Input_set(self, id, value)
TypeError: in method 'Input_set', argument 3 of type 'daal::data_management::NumericTablePtr const &'

The piece of code I am using is as follows:

# This part from 
# https://www.kaggle.com/jeffd23/10-classifier-showdown-in-scikit-learn

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.preprocessing import LabelEncoder
#from sklearn.cross_validation import StratifiedShuffleSplit
from sklearn.model_selection import StratifiedShuffleSplit

train = pd.read_csv('./Kaggle_data/Leaf/train.csv')
test = pd.read_csv('./Kaggle_data/Leaf/test.csv')

# 
#Data Preparation

# Swiss army knife function to organize the data

def encode(train, test):
    le = LabelEncoder().fit(train.species) 
    labels = le.transform(train.species)           # encode species strings
    classes = list(le.classes_)                    # save column names for submission
    test_ids = test.id                             # save test ids for submission
    
    train = train.drop(['species', 'id'], axis=1)  
    test = test.drop(['id'], axis=1)
    
    return train, labels, test, test_ids, classes

train, labels, test, test_ids, classes = encode(train, test)
print(train.head(1))

#sss = StratifiedShuffleSplit(labels, 10, test_size=0.2, random_state=23)
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=23)

#for train_index, test_index in sss:
for train_index, test_index in sss.split(train.values, labels):
    X_train, X_test = train.values[train_index], train.values[test_index]
    y_train, y_test = labels[train_index], labels[test_index]

# Here starts the example from
# The Parallel Universe, issue 28. The example starts on page 26

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
favorite_clf = LinearDiscriminantAnalysis()
favorite_clf.fit(X_train, y_train)
test_predictions = favorite_clf.predict(X_test)
#test_predictions = favorite_clf.predict_proba(X_test) # WORK end of INTEL_leaf_knn_01.py

input("\t pg 28 passed with warnings. Hit return")

# pg 30
#And KNN in Python (scikit-learn):

from sklearn.neighbors import KNeighborsClassifier
#favorite_clf = KNeighborsClassifier(k=4)
favorite_clf = KNeighborsClassifier(4)
favorite_clf.fit(X_train, y_train)
test_predictions = favorite_clf.predict(X_test)

input("\t Pg 30, KNN scikit-learn, passed Hit return")


# pg 30
# KNN training stage in Python (Intel DAAL):

from daal.algorithms.kdtree_knn_classification import training, prediction
from daal.algorithms import classifier, kdtree_knn_classification
trainAlg = kdtree_knn_classification.training.Batch()
trainAlg.input.set(classifier.training.data, X_train)

 

Regards,

Sergio
Enhance your #MachineLearning and #BigData skills via #Python #SciPy
1) https://www.packtpub.com/big-data-and-business-intelligence/numerical-and-scientific-computing-scipy...
2) https://www.packtpub.com/big-data-and-business-intelligence/learning-scipy-numerical-and-scientific-...

 

 

7 Replies
Christophe_H_Intel2

Hi Sergio,

Thanks for raising this question.  PyDAAL algorithms operate on NumericTable data structures instead of directly on numpy arrays. You can do the appropriate conversions as follows.

from daal.data_management import HomogenNumericTable
from daal.algorithms.kdtree_knn_classification import training, prediction
from daal.algorithms import classifier, kdtree_knn_classification
trainAlg = kdtree_knn_classification.training.Batch()
trainAlg.input.set(classifier.training.data, HomogenNumericTable(X_train))
trainAlg.input.set(classifier.training.labels, HomogenNumericTable(np.array(y_train.reshape(792, 1), dtype=np.intc), ntype=np.intc))
trainAlg.parameter.k = 4
trainingResult = trainAlg.compute()

The prediction algorithm will also require a HomogenNumericTable as input.

predictAlg = kdtree_knn_classification.prediction.Batch()
predictAlg.input.setTable(classifier.prediction.data, HomogenNumericTable(X_test))
predictAlg.input.setModel(classifier.prediction.model, trainingResult.get(classifier.training.model))
predictAlg.compute()
predictionResult = predictAlg.getResult()
test_predictions = predictionResult.get(classifier.prediction.prediction)
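The 792 in the labels line above is the size of the training split (80% of the rows; 990 rows in train.csv is assumed here). A minimal NumPy-only sketch of the same label preparation without the hard-coded length follows. The `labels` array is a hypothetical stand-in for the encoded species, and the DAAL call itself is left as a comment because it needs pyDAAL installed.

```python
import numpy as np

# Hypothetical stand-in for the encoded labels (assumes 990 rows, 99 classes).
n_samples = 990
labels = np.arange(n_samples) % 99

# An 80/20 split leaves 792 training rows, which is where 792 comes from.
n_train = n_samples - n_samples // 5
y_train = labels[:n_train]

# Hard-coded form used in the reply above.
y_col_fixed = np.array(y_train.reshape(792, 1), dtype=np.intc)

# Same result, but the row count is inferred from the array itself.
y_col = np.array(y_train.reshape(-1, 1), dtype=np.intc)

print(y_col.shape, y_col.dtype)
# With pyDAAL available, y_col would be wrapped as in the reply above:
#   HomogenNumericTable(y_col, ntype=np.intc)
```

Using `reshape(-1, 1)` keeps the snippet valid if `test_size` or the data set changes.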

Chris

sergio_r_

 

Thanks Christopher,

   Your clarification gets the code past that stage. It should also help me continue with the rest of the essay.

Regards,

Sergio


 

sergio_r_

 

Hi Chris,

  As expected, I was able to finish this example. Now, how can I print the values returned in

trainingResult = trainAlg.compute()

and in

test_predictions = predictionResult.get(classifier.prediction.prediction)

Straightforward printing gives:

In [23]: print(trainingResult)
    ...: 
<daal.algorithms.kdtree_knn_classification.training.Result; proxy of <Swig Object of type 'daal::services::SharedPtr< daal::algorithms::kdtree_knn_classification::training::interface1::Result > *' at 0x7f39f53a4090> >

In [24]: print(test_predictions)
    ...: 
<daal.data_management.NumericTable; proxy of <Swig Object of type 'daal::services::SharedPtr< daal::data_management::interface1::NumericTable > *' at 0x7f39a7627f60> >

 

Salut,

Sergio

 

Christophe_H_Intel2

Hi Sergio,

To print a NumericTable, you can use the utility functions found in <daalroot>/examples/python/source/utils.

import os
import sys
from os.path import join as jp
sys.path.insert(0, jp(os.environ['DAALROOT'], 'examples', 'python', 'source'))
from utils import printNumericTable

printNumericTable(test_predictions)

There is a complete list of PyDAAL examples at https://software.intel.com/en-us/node/682181.  You may also find the tutorials at https://github.com/daaltces/pydaal-tutorials useful.

You should be able to run the remaining algorithms from the essay by using the HomogenNumericTable constructor on any inputs that are numpy arrays. The only caveat is that the only integer type accepted by NumericTables is numpy.intc, so numpy.int64 arrays will need to be converted. The other thing to pay attention to is array shape. You may need to transpose one here or there. In the meantime, I will see if it's possible to publish the complete code samples somewhere.  Let me know if you run into any problems.
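The dtype and shape caveats above can be collected in one place. The following is only a sketch: `prepare_for_numeric_table` is a hypothetical helper, not part of pyDAAL, and casting non-integer data to float64 is an assumption rather than something stated in this thread.

```python
import numpy as np

def prepare_for_numeric_table(arr):
    """Hypothetical helper: return a 2-D, C-contiguous array.

    Integer data is cast to np.intc, the only integer type mentioned
    above as accepted. Everything else is cast to float64 (an assumption).
    """
    arr = np.asarray(arr)
    if arr.ndim == 1:
        # Treat a flat array as a single column, one row per observation.
        arr = arr.reshape(-1, 1)
    elif arr.ndim != 2:
        raise ValueError("expected a 1-D or 2-D array, got %d dimensions" % arr.ndim)
    target = np.intc if np.issubdtype(arr.dtype, np.integer) else np.float64
    return np.ascontiguousarray(arr, dtype=target)

# Example inputs: int64 labels and float32 features.
labels_ready = prepare_for_numeric_table(np.arange(6, dtype=np.int64))
features_ready = prepare_for_numeric_table(
    np.array([[0.5, 1.5], [2.5, 3.5]], dtype=np.float32))

print(labels_ready.shape, labels_ready.dtype)
print(features_ready.shape, features_ready.dtype)
```

The helper does not transpose anything; whether a given input needs rows and columns swapped still has to be checked per algorithm.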

Thanks,

Chris

sergio_r_

 

Thanks Chris,

I am getting the error:

    from utils import printNumericTable
ImportError: No module named 'utils'

Before running the code I did: export DAALROOT=/home/intel/intelpython3/pkgs/pydaal-2018.0.0b20170313-py35_intel_0

I also tried (obviously NO success):

$ conda install utils
Fetching package metadata .........
Solving package specifications: .


PackageNotFoundError: Package not found: '' Package missing in current linux-64 channels: 
  - utils

Close matches found; did you mean one of these?

    utils: xlutils, docutils, psutil

You can search for packages on anaconda.org with

    anaconda search -t conda utils

You may need to install the anaconda-client command line client with

    conda install anaconda-client

By the way, I was able to successfully run the examples at pydaal-2018.0.0b20170313-py35_intel_0/share/pydaal_examples/examples/python via the command python run_examples.py

Regards,

Sergio

 

Christophe_H_Intel2

Hi Sergio,

I was incorrectly assuming a separate DAAL installation. Sorry about that. For Intel Python, the utils folder can be found in <install_root>/share/pydaal_examples/examples/python/source. Just make sure that folder is on your sys.path at runtime, or you can point the PYTHONPATH environment variable there.
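As a rough sketch of the sys.path approach, the snippet below builds the folder path from an install root and prepends it. It assumes `<install_root>` is the root of the active Intel Python environment (`sys.prefix`); `add_pydaal_examples_to_path` is a hypothetical helper name, and the import is left commented out because it only works where the folder exists.

```python
import os
import sys

def add_pydaal_examples_to_path(install_root=sys.prefix):
    """Hypothetical helper: put the pyDAAL examples 'source' folder on sys.path.

    Assumes install_root is the Intel Python environment root.
    """
    source_dir = os.path.join(
        install_root, 'share', 'pydaal_examples', 'examples', 'python', 'source')
    if source_dir not in sys.path:
        sys.path.insert(0, source_dir)
    return source_dir

source_dir = add_pydaal_examples_to_path()
print(source_dir, "exists:", os.path.isdir(source_dir))

# Once the folder is present, the import from the earlier reply should resolve:
# from utils import printNumericTable
```

If the printed path does not exist, the environment root is probably different from `sys.prefix` and should be passed in explicitly.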

Chris

sergio_r_

 

OK. It's working. What remains now is to make sense of the output: it does not look anything like the sklearn one.
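One possible source of the difference, offered as an assumption since the thread does not show the output, is layout: a prediction table has one row per sample and a single column, while sklearn's predict returns a flat array. The sketch below uses made-up values standing in for both results and compares them once they are in the same shape and dtype; how the values are extracted from the NumericTable is not covered here.

```python
import numpy as np

# Hypothetical values standing in for the real results.
daal_column = np.array([[3.0], [0.0], [2.0], [1.0]])  # one row per sample, one column
sklearn_pred = np.array([3, 0, 2, 2])                 # flat array from predict()
y_test = np.array([3, 0, 2, 1])                       # true labels

# Flatten the column and match sklearn's integer dtype before comparing.
daal_pred = daal_column.ravel().astype(sklearn_pred.dtype)

agreement = np.mean(daal_pred == sklearn_pred)   # fraction of identical predictions
daal_accuracy = np.mean(daal_pred == y_test)
sklearn_accuracy = np.mean(sklearn_pred == y_test)

print("agreement:", agreement)
print("DAAL accuracy:", daal_accuracy)
print("sklearn accuracy:", sklearn_accuracy)
```

Some disagreement between the two KNN implementations is plausible, so comparing accuracy on y_test is usually more informative than comparing the predictions element by element.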

Sergio

 
