Intel® oneAPI Data Analytics Library

pyDAAL: the results of PCA Online Processing

yingxing__bao
Beginner

I ran the examples provided in the documentation, but they only give me the eigenvalues and the eigenvectors. I want the transformed values after online PCA, because I only want to use the final result and compare it with IncrementalPCA in sklearn; obviously the two results are not directly comparable as they are. How can I use the eigenvalues and eigenvectors to get the transformed values in Python?
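One reason the outputs do not line up directly is that each principal component is only defined up to its sign, so DAAL and scikit-learn's `IncrementalPCA` can return the same axis pointing in opposite directions. Below is a minimal NumPy sketch for aligning signs before comparing. It assumes both component matrices store one component per row (as `IncrementalPCA.components_` does; treating the DAAL eigenvectors table the same way is an assumption). `align_signs` is a hypothetical helper, not part of either library.

```python
import numpy as np

def align_signs(reference, other):
    """Flip rows of `other` so each points the same way as the matching row of `reference`.

    Both arrays have shape (n_components, n_features), one component per row.
    """
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    # Row-wise dot products: a negative value means the component is flipped.
    signs = np.sign(np.einsum("ij,ij->i", reference, other))
    signs[signs == 0] = 1.0  # leave orthogonal or all-zero rows untouched
    return other * signs[:, np.newaxis]

# Usage with made-up numbers: the second component is flipped in `b`.
a = np.array([[0.6, 0.8], [-0.8, 0.6]])
b = np.array([[0.6, 0.8], [0.8, -0.6]])
print(np.allclose(align_signs(a, b), a))  # True
```

The same per-component sign flips apply column-wise to the transformed data, so the transformed values should be compared only after the components have been aligned.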

Preethi_V_Intel
Employee

Hi,

You need to use the "daal.algorithms.pca.transform" module to transform the data. I've updated the "pca_svd_dense_online.py" example to add a data transformation step that uses the computed PCA results (the eigenvectors and the associated transformation data). Hope that helps.

import os
import sys

from daal.algorithms import pca
import daal.algorithms.pca.transform as pca_transform
from daal.data_management import FileDataSource, DataSourceIface

utils_folder = os.path.realpath(os.path.abspath(os.path.dirname(os.path.dirname(__file__))))
if utils_folder not in sys.path:
 sys.path.insert(0, utils_folder)
from utils import printNumericTable

DAAL_PREFIX = os.path.join('..', 'data')

# Input data set parameters
nVectorsInBlock = 250
dataFileName = os.path.join(DAAL_PREFIX, 'online', 'pca_normalized.csv')

if __name__ == "__main__":

 # Initialize FileDataSource<CSVFeatureManager> to retrieve the input data from a .csv file
 dataSource = FileDataSource(
  dataFileName, DataSourceIface.doAllocateNumericTable,
  DataSourceIface.doDictionaryFromContext
 )

 # Create an algorithm for principal component analysis using the SVD method
 algorithm = pca.Online(method=pca.svdDense)

 while dataSource.loadDataBlock(nVectorsInBlock) == nVectorsInBlock:
  # Set the input data to the algorithm
  algorithm.input.setDataset(pca.data, dataSource.getNumericTable())

  # Update PCA decomposition
  algorithm.compute()

 # Finalize computations
 result = algorithm.finalizeCompute()

 # Print the results
 printNumericTable(result.get(pca.eigenvalues), "Eigenvalues:")
 printNumericTable(result.get(pca.eigenvectors), "Eigenvectors:")

 # Data transformation: project the data onto the leading principal components
 tralgorithm = pca_transform.Batch()

 # Set the number of principal components to keep
 tralgorithm.parameter.nComponents = 2
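
 # Re-open the data file: the online loop above has already consumed it
 dataSource = FileDataSource(
  dataFileName, DataSourceIface.doAllocateNumericTable,
  DataSourceIface.doDictionaryFromContext
 )

 # Load the whole file into a single numeric table
 dataSource.loadDataBlock()
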
 # Set an input object for the algorithm
 tralgorithm.input.setTable(pca_transform.data, dataSource.getNumericTable())

 # Set an input object for the eigenvectors
 tralgorithm.input.setTable(pca_transform.eigenvectors, result.get(pca.eigenvectors))

 # Set the auxiliary data used by the transformation (means, variances, eigenvalues)
 tralgorithm.input.setCollection(pca_transform.dataForTransform, result.getCollection(pca.dataForTransform))

 # Compute PCA transformation function
 trres = tralgorithm.compute()
 
 printNumericTable(dataSource.getNumericTable(), "First rows of the input data:", 4)
 printNumericTable(trres.get(pca_transform.transformedData), "First rows of the PCA transformation result:", 4)
 

 

See here for a detailed explanation of the library's usage.
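For intuition about what the transform step computes, here is a minimal NumPy sketch of a plain PCA projection: center the data, multiply by the first `nComponents` eigenvectors, and optionally divide by the square root of the eigenvalues (whitening). This is also what scikit-learn's `IncrementalPCA.transform` does with `mean_`, `components_` and, when `whiten=True`, `explained_variance_`. The sketch assumes the eigenvectors are stored one per row, sorted by decreasing eigenvalue. `pca_project` is a hypothetical helper, and whether DAAL's transform centers or whitens may depend on which entries of `dataForTransform` are provided, so treat this as an approximation rather than DAAL's exact implementation.

```python
import numpy as np

def pca_project(data, eigenvectors, n_components, mean=None, eigenvalues=None):
    """Project `data` (n_samples x n_features) onto the leading principal components.

    `eigenvectors` holds one component per row, sorted by decreasing eigenvalue.
    """
    x = np.asarray(data, dtype=float)
    if mean is not None:
        x = x - np.asarray(mean, dtype=float)  # center each feature
    components = np.asarray(eigenvectors, dtype=float)[:n_components]
    scores = x @ components.T  # coordinates along each kept component
    if eigenvalues is not None:
        # Whitening: scale each component's scores to unit variance.
        scores = scores / np.sqrt(np.asarray(eigenvalues, dtype=float)[:n_components])
    return scores

# Toy data (made up) that varies only along the x axis; identity eigenvectors.
X = np.array([[1.0, 0.0], [-1.0, 0.0], [3.0, 0.0], [-3.0, 0.0]])
V = np.eye(2)
print(pca_project(X, V, n_components=1))  # one column: the x coordinate of each row
```

To use it with the DAAL example, the eigenvector and eigenvalue numeric tables would first have to be converted to NumPy arrays (that conversion step is not shown here).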

 

yingxing__bao
Beginner

When I follow your advice, I get this error:

ImportError: No module named 'daal.algorithms.pca.transform'; 'daal.algorithms.pca' is not a package
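The message says that in the installed package `daal.algorithms.pca` is a plain module rather than a package, so there is no `transform` submodule to import; most likely the installed DAAL predates the PCA transform. A small standard-library sketch for checking this up front instead of failing partway through the script. `has_module` is a hypothetical helper name.

```python
import importlib.util

def has_module(name):
    """Return True if the dotted module `name` can be located.

    Parent packages are imported as a side effect of the lookup.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent is missing or is a plain module rather than a
        # package (the "'daal.algorithms.pca' is not a package" case).
        return False

if not has_module("daal.algorithms.pca.transform"):
    print("pca.transform is not available in this DAAL installation")
```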

 

Preethi_V_Intel
Employee

Which version of DAAL are you using? The transform is available in version 2018.0.1. You can check your version with:

import daal
daal.__version__
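To turn this check into a condition, you can compare the leading numeric fields of `daal.__version__` against 2018.0.1, the release mentioned above. A minimal sketch, assuming the version string always has the `year.major.minor[.build]` shape seen in this thread (e.g. '2017.0.3.20170414'); a non-numeric field would raise `ValueError`. `version_tuple` and `at_least` are hypothetical helpers.

```python
def version_tuple(version, parts=3):
    """Turn '2017.0.3.20170414' into (2017, 0, 3), keeping the first `parts` fields."""
    return tuple(int(field) for field in version.split(".")[:parts])

def at_least(version, minimum="2018.0.1"):
    """True if `version` is the same as or newer than `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)

# Usage (requires DAAL):
#   import daal
#   if not at_least(daal.__version__):
#       print("DAAL", daal.__version__, "has no pca.transform; please upgrade")
print(at_least("2017.0.3.20170414"))  # False
```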

 

yingxing__bao
Beginner

How can I update DAAL? I tried

pip install daal --upgrade

and 

pip uninstall daal 
pip install daal 

but the DAAL version is still '2017.0.3.20170414'.

Gennady_F_Intel
Moderator

You may also try the latest version of DAAL; you can download this free library from https://software.intel.com/en-us/performance-libraries

Preethi_V_Intel
Employee

I'm not completely sure whether the latest version of DAAL is available on PyPI. We recommend using Anaconda Cloud to install the latest versions of Intel Python packages:

conda install daal -c intel

I assume you are using the Intel Distribution for Python. If not, the instructions are here; it's pretty simple.
