Intel® oneAPI Data Analytics Library
Community support for building compute-intensive applications that run fast on Intel® architecture.

DAAL4PY has different results with sklearn PCA

Yang__Jonas
Beginner

Hi guys:

I am using daal4py on a large data set. It is super fast, but the results look wrong.

I have already searched the web, and I know that daal4py normalizes the data before computing PCA. Still, even when I normalize the data myself and pass it to sklearn, the eigenvalues and eigenvectors differ a lot!

I have attached the code and data. I would like to see consistent results between the two libraries; otherwise, we cannot be convinced that we can use DAAL in production. Note that preprocessing the data before passing it to sklearn is acceptable, but I tried both min-max and z-score normalization, and the results are still quite different from the Intel PCA results.

test.zip contains the test data, which is a 10000 * 512 tensor.

Running the script on my Windows machine, I got the following result:

Intel engvals = [136.30274983  85.45575273  51.9877961 ]
Sklearn engvals = [213.9291753  102.09328516  76.81426116]

Yang__Jonas

I figured out the issue. Intel PCA always normalizes the data before computation, so you have to set doScale = True when normalizing the data and then pass the scaled data to sklearn. Also, sklearn's explained_variance_ holds the eigenvalues, not the singular values.

Compared that way, the results are pretty similar.

No issues on DAAL side.
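
For anyone who lands here with the same mismatch, below is a minimal NumPy-only sketch of the relationship described above. It is not the attached script and uses neither daal4py nor sklearn. The data is a synthetic stand-in for test.zip, and z-scoring with the sample standard deviation (ddof=1) is an assumption made for the illustration. It shows that PCA on z-scored data yields the eigenvalues of the correlation matrix, that these equal the squared singular values divided by (n - 1), which is how sklearn defines explained_variance_ relative to singular_values_, and that skipping the scaling step gives different eigenvalues.

```python
import numpy as np

# Synthetic stand-in for the attached data (hypothetical, NOT test.zip):
# columns with very different scales plus some correlation between them.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8)) * rng.uniform(0.5, 5.0, size=8)
X[:, 1] += 0.8 * X[:, 0]
n = X.shape[0]

# z-score each column: subtract the mean, divide by the sample std (ddof=1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigenvalues of the correlation matrix, largest first.
corr = np.corrcoef(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Singular values of the z-scored data, and the eigenvalues derived from them.
s = np.linalg.svd(Z, compute_uv=False)
eig_from_svd = s**2 / (n - 1)

# Same computation on data that is only centered, not scaled.
Xc = X - X.mean(axis=0)
eig_raw = np.linalg.svd(Xc, compute_uv=False) ** 2 / (n - 1)

print("correlation eigenvalues :", eigvals[:3])
print("from SVD of z-scored X  :", eig_from_svd[:3])
print("singular values (z-scored):", s[:3])
print("eigenvalues, no scaling :", eig_raw[:3])
```

The first two printed rows should agree. The last two differ from them, which mirrors the two mistakes discussed in this thread: comparing against singular values, or comparing against unscaled data.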
