Intel® oneAPI Data Analytics Library

Is this library related to Hadoop? How do I use it on Hadoop?

VipinKumar_E_Intel

Is this library related to Hadoop? How do I use it on Hadoop?

Sergey_M_Intel2
Employee

The library can be used with Hadoop and other big data infrastructures.

The beauty of Intel® Data Analytics Acceleration Library (Intel DAAL) is that its APIs are abstracted from the cross-device and cross-node communication layer. This makes it flexible enough to support a variety of usage scenarios, including a variety of approaches to distributed computing.

To simplify integration of Intel DAAL with popular distributed computing infrastructures and technologies, the library ships with code samples: C++ samples for DAAL distributed algorithms that rely on MPI*, and Java* samples for DAAL distributed computing with HDFS and Spark* RDD.

Thank you,

Sergey Maidanov

zhengda1936
Beginner

I have the same question about how to use it on Hadoop. More specifically, how do we use it with MapReduce? Even more specifically, do we invoke the library's functions in the map function or in the reduce function?

Thanks,

Da

 

Sergey_M_Intel2
Employee

Hi Da,

Yes, Intel DAAL functions are invoked from Hadoop map/reduce functions.

Let us take the distributed computation of Principal Component Analysis (PCA) using the SVD method as an example (see the DAAL Programming Guide for the workflow details).

Map

    /* InputData is a placeholder for the job's input value type */
    public void map(Object key, InputData value, Context context) throws IOException, InterruptedException {

        /* This is the local part of the input data */
        double[] data = value.getArray(nFeatures, nVectorsInBlock);
        daal.data.HomogenNumericTable ntData = new daal.data.HomogenNumericTable(data, nFeatures, nVectorsInBlock);

        /* This will contain the partial result computed on the local node */
        daal.data.HomogenNumericTable ntNodeComputations =
                new HomogenNumericTable(Double.class, nFeatures, nFeatures, NumericTable.AllocationFlag.DoAllocate);

        /* Create the algorithm object */
        PCA pcaAlgorithm = new PCA(Double.class, daal.data.PCA.Method.PCASVD, daal.data.PCA.InputDataType.normalizedDataSet);
        pcaAlgorithm.setComputeMode(daal.data.ComputeMode.Distributed);

        /* Do the local computations */
        pcaAlgorithm.compute(ntData, ntNodeComputations);

        /* Number of observations in this block; the reduce step needs it for each block */
        long[] nObservationsArray = { nVectorsInBlock };

        /* Serialization of the partial data (ntNodeComputations and nObservationsArray)
           into serializedPartialData goes here */

        context.write(new Text(), serializedPartialData);
    }
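
The serialization step is left open in the map function above. As a rough illustration only, here is a minimal sketch of one way a partial result could be packed into bytes using nothing but the Java standard library. PartialResultCodec, encode and decode are hypothetical names that are not part of DAAL or Hadoop, and the row-major layout (two ints for the dimensions followed by the values) is an assumption made for this example, not the format DAAL uses.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper, not part of DAAL or Hadoop. */
final class PartialResultCodec {

    /** Packs rows, cols, then rows * cols doubles (row-major, big-endian). */
    static byte[] encode(double[] values, int rows, int cols) {
        if (values.length != rows * cols) {
            throw new IllegalArgumentException("values.length must equal rows * cols");
        }
        ByteBuffer buf = ByteBuffer.allocate(2 * Integer.BYTES + values.length * Double.BYTES);
        buf.putInt(rows).putInt(cols);
        for (double v : values) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    /** Reads back the values written by encode; the dimensions come from the header. */
    static double[] decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        int rows = buf.getInt();
        int cols = buf.getInt();
        double[] values = new double[rows * cols];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getDouble();
        }
        return values;
    }

    public static void main(String[] args) {
        double[] partial = {1.0, 2.0, 3.0, 4.0};
        byte[] payload = encode(partial, 2, 2);
        // Round trip: the decoded array should equal the original.
        System.out.println(Arrays.equals(partial, decode(payload)));
    }
}
```

In a Hadoop job, the resulting byte array could be wrapped in an org.apache.hadoop.io.BytesWritable before it is passed to context.write, and unwrapped again on the reduce side.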

Reduce

    public void reduce(Text key, Iterable<serializedType> values, Context context) throws IOException, InterruptedException {

        /* Arrays for the partial results from the nodes */
        HomogenNumericTable[] computeResults = new HomogenNumericTable[nBlocks];
        HomogenNumericTable[] nObservations = new HomogenNumericTable[nBlocks];

        NumericTable[] mergeInputs = new NumericTable[2 * nBlocks];
        for (int i = 0; i < nBlocks; i++) {
            /* Deserialization of the i-th partial result into nObservations[i]
               and computeResults[i] goes here */
            mergeInputs[2 * i] = nObservations[i];
            mergeInputs[2 * i + 1] = computeResults[i];
        }

        /* Create numeric tables for storing the PCA results */
        daal.data.HomogenNumericTable eigenvectors = new daal.data.HomogenNumericTable(Double.class, nFeatures, nFeatures, NumericTable.AllocationFlag.DoAllocate);
        daal.data.HomogenNumericTable eigenvalues = new daal.data.HomogenNumericTable(Double.class, nFeatures, 1, NumericTable.AllocationFlag.DoAllocate);
        daal.data.NumericTable[] results = { eigenvectors, eigenvalues };

        /* Merge the partial results into the final eigenvectors and eigenvalues */
        PCA pcaAlgorithm = new PCA(Double.class, PCA.Method.PCASVD, PCA.InputDataType.normalizedDataSet);
        pcaAlgorithm.setComputeMode(ComputeMode.Distributed);
        pcaAlgorithm.merge(mergeInputs, results);

        double[] eigenvaluesArray = eigenvalues.getDoubleArray();
        double[] eigenvectorsArray = eigenvectors.getDoubleArray();
    }

The arrays eigenvaluesArray and eigenvectorsArray then hold the final eigenvalues and eigenvectors, respectively.
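
To give a feel for why one map step followed by one merge step can be enough, here is a minimal sketch in plain Java with no DAAL calls. It is not DAAL's internal SVD algorithm, which I am not describing here. It only illustrates the general pattern with a simpler, well-known identity: the Gram matrix X^T X of the full data set equals the sum of the Gram matrices of its row blocks. Each block can therefore be reduced to a small nFeatures x nFeatures partial result on its own node, and the final eigenvalue problem involves only that small merged matrix. GramMerge, partialGram and merge are hypothetical names chosen for this example.

```java
import java.util.Arrays;

/** Hypothetical illustration of the map-then-merge pattern; not DAAL code. */
final class GramMerge {

    /** Map side: Gram matrix X^T X (nFeatures x nFeatures) of one block of rows. */
    static double[][] partialGram(double[][] block, int nFeatures) {
        double[][] g = new double[nFeatures][nFeatures];
        for (double[] row : block) {
            for (int i = 0; i < nFeatures; i++) {
                for (int j = 0; j < nFeatures; j++) {
                    g[i][j] += row[i] * row[j];
                }
            }
        }
        return g;
    }

    /** Reduce side: element-wise sum of the per-block partial results. */
    static double[][] merge(double[][]... partials) {
        int n = partials[0].length;
        double[][] total = new double[n][n];
        for (double[][] p : partials) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    total[i][j] += p[i][j];
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] blockA = {{1, 2}, {3, 4}};          // rows held by node A
        double[][] blockB = {{5, 6}};                  // rows held by node B
        double[][] all = {{1, 2}, {3, 4}, {5, 6}};     // the same rows in one place

        double[][] merged = merge(partialGram(blockA, 2), partialGram(blockB, 2));
        double[][] direct = partialGram(all, 2);

        // The merged partial results match the single-node computation.
        System.out.println(Arrays.deepEquals(merged, direct));
    }
}
```

For a data set whose columns have already been centered, dividing this merged matrix by (number of rows - 1) gives the sample covariance matrix, whose eigenvectors are the principal components. No repeated pass over the distributed data is needed for that last step, because the matrix fits on a single node.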

I hope it helps,

Thank you,

Sergey Maidanov

zhengda1936
Beginner

I guess the DAAL Programming Guide isn't released yet?

The workflow isn't intuitive to me. I suppose the input is a matrix? So map() first partitions the matrix and runs PCA on its part of the matrix entirely on the local node? And then a single MapReduce pass can generate the final eigenvalues and eigenvectors?

Computing eigenvalues and eigenvectors requires a sequence of matrix-vector multiplications, and each matrix-vector multiplication requires data shuffling in MapReduce. Unless DAAL performs the computation in a distributed fashion in the background, I don't understand how a single MapReduce pass can accomplish the task.

Zhang_Z_Intel
Employee

zhengda1936 wrote:

I guess the DAAL Programming Guide isn't released yet?

The programming guide is available as part of the "User and Reference Guide of Intel Data Analytics Acceleration Library". You can find it after you download and install the library. Please visit https://software.intel.com/en-us/articles/announcing-intel-data-analytics-acceleration-library-2016-beta and follow the links to download it. 

Priyanka_K_
Beginner

Hadoop has native implementations of certain components, both for performance reasons and because Java implementations are not available. These components are provided in a single, dynamically linked native library called the native Hadoop library. On *nix platforms the library is named libhadoop.so.

Thanks,

Priyanka,

Hadoop Developer @ Catch Experts,

http://www.catchexperts.com/hadoop/online-training

 
