

VipinKumar_E_Intel

Employee


02-24-2015
09:24 PM


Is this library related to Hadoop? How do I use it on Hadoop?



6 Replies

Sergey_M_Intel2

Employee


02-26-2015
09:47 PM


This library is for Hadoop and other Big Data infrastructures.

The beauty of Intel® Data Analytics Acceleration Library (Intel DAAL) is that its APIs are abstracted from the cross-device/cross-node communication layer. This makes it flexible enough to support a variety of usage scenarios, including different approaches to distributed computing.

To simplify integration of Intel DAAL with popular distributed computing infrastructures and technologies, the library ships with code samples: C++ samples for DAAL distributed algorithms relying on MPI*, and Java* samples for DAAL distributed computing with HDFS and Spark* RDDs.
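Conceptually, DAAL's distributed algorithms follow a two-step pattern: each node reduces its local data block to a small partial result, and a master step merges the partial results into the final answer. As a rough illustration of that pattern only (plain Java, no DAAL API; all class and method names here are invented for the sketch), a distributed mean computation looks like this:

```java
// Illustrative sketch of the partial-result/merge pattern that distributed
// DAAL algorithms follow. A plain mean computation stands in for the real
// algorithm; nothing here is DAAL API.
public class PartialMergeSketch {
    // Partial result produced on one node (analogous to a DAAL partial result).
    static final class Partial {
        final double sum;
        final long count;
        Partial(double sum, long count) { this.sum = sum; this.count = count; }
    }

    // "Map" step: each node reduces its local block to a small partial result.
    static Partial computeLocal(double[] block) {
        double s = 0.0;
        for (double v : block) s += v;
        return new Partial(s, block.length);
    }

    // "Reduce"/merge step: the master combines partial results into the final value.
    static double merge(Partial[] partials) {
        double s = 0.0;
        long n = 0;
        for (Partial p : partials) { s += p.sum; n += p.count; }
        return s / n;
    }

    public static void main(String[] args) {
        double[][] blocks = { {1.0, 2.0}, {3.0}, {4.0, 5.0, 6.0} };
        Partial[] partials = new Partial[blocks.length];
        for (int i = 0; i < blocks.length; i++) partials[i] = computeLocal(blocks[i]);
        System.out.println(merge(partials)); // 3.5
    }
}
```

Because only the small partial results cross node boundaries, the same pattern maps naturally onto MPI, MapReduce, or Spark.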

Thank you,

Sergey Maidanov

zhengda1936

Beginner


02-28-2015
04:43 PM


I also have the same question about how to use it on Hadoop. To be more specific, how do we use it with MapReduce? And even more specifically, do we invoke the functions of this library inside the Map or Reduce function?

Thanks,

Da

Sergey_M_Intel2

Employee


03-01-2015
06:31 AM


Hi Da,

Yes, Intel DAAL functions are invoked from Hadoop map/reduce functions.

Let us take the distributed computation of Principal Component Analysis (PCA) using the SVD method as an example (see the DAAL Programming Guide for the workflow details).

**Map**

```java
public void map(Object key, InputData inputData, Context context)
        throws IOException, InterruptedException {
    /* This is a local part of the input data */
    double[] data = inputData.getArray(nFeatures, nVectorsInBlock);
    daal.data.HomogenNumericTable ntData =
        new daal.data.HomogenNumericTable(data, nFeatures, nVectorsInBlock);

    /* This will contain the partial result on the local node */
    daal.data.HomogenNumericTable ntNodeComputations =
        new daal.data.HomogenNumericTable(Double.class, nFeatures, nFeatures,
                                          NumericTable.AllocationFlag.DoAllocate);

    /* Create the algorithm object */
    PCA pcaAlgorithm = new PCA(Double.class, daal.data.PCA.Method.PCASVD,
                               daal.data.PCA.InputDataType.normalizedDataSet);
    pcaAlgorithm.setComputeMode(daal.data.ComputeMode.Distributed);

    /* Do computations */
    pcaAlgorithm.compute(ntData, ntNodeComputations);
    long[] nObservationsArray = { nVectorsInBlock };

    /* Here is serialization of partial data */
    context.write(new Text(), serializedPartialData);
}
```
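The "serialization of partial data" step above is left schematic. DAAL objects have their own serialization mechanisms; purely to illustrate the byte-packing idea, here is one simple way to turn a partial result (treated here as a raw double[] rather than a DAAL object) into bytes and back:

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: packs a partial result (a raw double[]) into
// bytes on the map side and restores it on the reduce side.
public class PartialSerialization {
    // Pack a partial result into a length-prefixed byte array,
    // suitable for emitting from the mapper.
    static byte[] serialize(double[] partial) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + Double.BYTES * partial.length);
        buf.putInt(partial.length);
        for (double v : partial) buf.putDouble(v);
        return buf.array();
    }

    // Counterpart used on the reduce side to restore the array.
    static double[] deserialize(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        double[] partial = new double[buf.getInt()];
        for (int i = 0; i < partial.length; i++) partial[i] = buf.getDouble();
        return partial;
    }

    public static void main(String[] args) {
        double[] roundTrip = deserialize(serialize(new double[] { 1.5, -2.0, 3.25 }));
        System.out.println(java.util.Arrays.toString(roundTrip)); // [1.5, -2.0, 3.25]
    }
}
```

In an actual Hadoop job the resulting byte[] would typically be wrapped in a BytesWritable before being passed to context.write.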

**Reduce**

```java
public void reduce(Text key, Iterable<SerializedType> values, Context context)
        throws IOException, InterruptedException {
    /* Arrays for partial results from the nodes */
    HomogenNumericTable[] computeResults = new HomogenNumericTable[nBlocks];
    HomogenNumericTable[] nObservations = new HomogenNumericTable[nBlocks];
    NumericTable[] mergeInputs = new NumericTable[2 * nBlocks];
    for (int i = 0; i < nBlocks; i++) {
        /* Here is deserialization of partial data */
        mergeInputs[2 * i] = nObservations[i];
        mergeInputs[2 * i + 1] = computeResults[i];
    }

    /* Create numeric tables for storing the PCA results */
    eigenvectors = new daal.data.HomogenNumericTable(Double.class, nFeatures, nFeatures,
                                                     NumericTable.AllocationFlag.DoAllocate);
    eigenvalues = new daal.data.HomogenNumericTable(Double.class, nFeatures, 1,
                                                    NumericTable.AllocationFlag.DoAllocate);
    daal.data.NumericTable[] results = { eigenvectors, eigenvalues };

    PCA pcaAlgorithm = new PCA(Double.class, PCA.Method.PCASVD,
                               PCA.InputDataType.normalizedDataSet);
    pcaAlgorithm.setComputeMode(ComputeMode.Distributed);
    pcaAlgorithm.merge(mergeInputs, results);

    double[] eigenvaluesArray = eigenvalues.getDoubleArray();
    double[] eigenvectorsArray = eigenvectors.getDoubleArray();
}
```

The eigenvaluesArray and eigenvectorsArray arrays will then contain the final eigenvalues and eigenvectors, respectively.
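The way the reduce step pairs observation counts with partial results in mergeInputs (counts at even slots, partials at odd slots) can be sketched in isolation, with plain strings standing in for the DAAL tables (purely illustrative):

```java
// Illustrative sketch of the interleaved mergeInputs layout from the
// reduce step: [nObs_0, partial_0, nObs_1, partial_1, ...].
public class MergeInputLayout {
    // Interleave two equal-length arrays into [a0, b0, a1, b1, ...].
    static String[] interleave(String[] counts, String[] partials) {
        String[] merged = new String[2 * counts.length];
        for (int i = 0; i < counts.length; i++) {
            merged[2 * i] = counts[i];       // even slots: observation counts
            merged[2 * i + 1] = partials[i]; // odd slots: partial results
        }
        return merged;
    }

    public static void main(String[] args) {
        String[] merged = interleave(new String[] { "nObs0", "nObs1" },
                                     new String[] { "partial0", "partial1" });
        System.out.println(String.join(",", merged)); // nObs0,partial0,nObs1,partial1
    }
}
```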

I hope it helps,

Thank you,

Sergey Maidanov

zhengda1936

Beginner


03-07-2015
07:52 PM


I guess the DAAL Programming Guide isn't released yet?

The workflow isn't intuitive to me. I suppose the input is a matrix? So map() first partitions the matrix and runs PCA on its part of the matrix entirely on the local node? Then a single MapReduce pass can generate the final eigenvalues and eigenvectors?

Computing eigenvalues/eigenvectors requires a sequence of matrix-vector multiplications, and each matrix-vector multiplication requires data shuffling in MapReduce. Unless DAAL performs the computation in a distributed fashion in the background, I don't understand how a single MapReduce pass can accomplish the task.

Zhang_Z_Intel

Employee


03-11-2015
11:15 AM


zhengda1936 wrote:

> I guess the DAAL Programming Guide isn't released yet?

The programming guide is available as part of the "User and Reference Guide of Intel Data Analytics Acceleration Library". You can find it after you download and install the library. Please visit https://software.intel.com/en-us/articles/announcing-intel-data-analytics-acceleration-library-2016-... and follow the links to download it.

Priyanka_K_

Beginner


11-02-2016
03:19 AM


Hadoop has native implementations of certain components, both for performance reasons and because Java implementations are unavailable. These components are available in a single, dynamically linked native library called the native hadoop library. On *nix platforms the library is named libhadoop.so.

Thanks,

Priyanka,

Hadoop Developer @ Catch Experts,

http://www.catchexperts.com/hadoop/online-training
