Different results on different computers when using SciPy stack compiled with MKL

Dmitry_K_ · ‎03-25-2017

Hi, I've noticed that my code gives different results on two different computers on the same inputs.

It is Python code that, in a nutshell, performs an SVD on a large input matrix, truncates the matrices produced by the SVD, constructs a new small matrix, and finally finds the eigenvalues of this small matrix and dumps only one of them, chosen in a deterministic way. Hence, the code depends heavily on functions from the scipy.linalg package. I also add some noise to the input matrix at the beginning of the computation, but I set the seed before doing that, so the noise should be the same for each run.
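For readers following along, here is a minimal sketch of the kind of pipeline described above. It is not the original code, which was never posted publicly. The function name, the split of the input into two shifted blocks, the way the small matrix is built, the "largest magnitude" selection rule, and the use of numpy.linalg instead of scipy.linalg are all assumptions made for illustration.

```python
import numpy as np


def selected_eigenvalue(data, rank, noise_level=1e-6, seed=0):
    """Hypothetical pipeline: seeded noise -> SVD -> truncation -> small matrix -> one eigenvalue."""
    rng = np.random.RandomState(seed)           # fixed seed, so the noise is identical on every run
    noisy = data + noise_level * rng.standard_normal(data.shape)

    # Assumed construction: two column-shifted views of the noisy input.
    x, y = noisy[:, :-1], noisy[:, 1:]

    # SVD of the large matrix, then truncation to the leading `rank` components.
    u, s, vh = np.linalg.svd(x, full_matrices=False)
    u_r, s_r, v_r = u[:, :rank], s[:rank], vh[:rank].conj().T

    # Small (rank x rank) matrix; dividing by s_r scales each column by 1/sigma.
    small = u_r.conj().T @ y @ v_r / s_r

    # Deterministic choice of a single eigenvalue (assumed rule: largest magnitude).
    eigenvalues = np.linalg.eigvals(small)
    return eigenvalues[np.argmax(np.abs(eigenvalues))]


t = np.linspace(0.0, 1.0, 50)
data = np.vstack([np.sin(2.0 * np.pi * k * t) for k in range(1, 6)])
value = selected_eigenvalue(data, rank=3)
print(value)
```

The division by the truncated singular values is the step to watch: if any retained singular value is tiny, small differences upstream are magnified in the small matrix and therefore in its eigenvalues.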

Two different computers are:
1) Lenovo D30 with Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz;
2) Dell Precision T7500 with Intel(R) Xeon(R) CPU X5650 @ 2.67GHz

I run the code on both machines with the environment variable OMP_NUM_THREADS set to 1 to make sure that the discrepancy is not due to parallelism.
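One way to pin the thread count from inside a script is sketched below. It assumes the variables are set before NumPy (and therefore MKL and the OpenMP runtime) is first imported, which is the usual safe ordering. MKL_NUM_THREADS is an extra MKL-specific variable that is not mentioned in the post and is included only as an optional addition.

```python
import os

# Set these before the first import of NumPy/SciPy so the threading
# runtime sees them when it initializes.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"  # optional, MKL-specific (assumption: not used in the original post)

import numpy as np  # noqa: E402  (deliberately imported after the environment is set)

print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
```

Exporting the variable in the shell before launching Python, as described in the post, has the same intent and avoids any dependence on import order.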

So, I run my code on these two computers and with two different Python distributions: Continuum Anaconda 4.2 and Intel Python Distribution 2017u2.

The results are attached (they are in table form; I couldn't find a way to write tables in the post itself). You can see from the results that with Anaconda I get strange behavior: for Case 2, the real part of the eigenvalue is negative on one computer and positive on the other. With Intel Python, the biggest difference is in Case 1, where the real parts differ significantly (by 50 %).

Could you please explain to me, why the results are different? Can it be a bug in MKL?

gaston-hillar · ‎03-27-2017

Dmitry,

I know you probably cannot. However, the question doesn't hurt. Are you able to share the code?

Jamie_H_ · ‎03-27-2017

Dmitry -

Your problem seems to be quite similar to mine ... and your knowledge of SVD manipulation is possibly of use to me and my company. I hope it's not inappropriate for me to ask if we could be put in touch?

Jamie Howarth - Plangent Processes

Sergey_M_Intel2 · ‎03-27-2017

Hello Dmitry,

Thank you for reporting the issue. We're trying to reproduce. Will come back soon with initial analysis.

Sergey

Oleksandr_P_Intel · ‎03-28-2017

Dmitry,

Without having a reproducer there is little that can be said about the root cause of the observed discrepancy.

Here are the steps I would take towards debugging.

1. Please try using numpy.save to save the smaller truncated matrix to a file in both editions, and then try comparing them with np.allclose(mat1, mat2).

2. If both matrices are the same, please attach the file to this ticket, since this is the reproducer we need.

3. If matrices turn out to be different, we need to proceed further upstream to pin-point which operation caused the discrepancy.
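The save-and-compare steps above could be sketched as follows. The helper names and file names are hypothetical, and the tolerances shown are simply the np.allclose defaults. In practice each machine would save its own matrix and the two files would then be compared in one place; here both files are written locally just to keep the example self-contained.

```python
import os
import tempfile

import numpy as np


def save_matrix(path, matrix):
    """Step 1: dump the small truncated matrix to a .npy file."""
    np.save(path, matrix)


def compare_saved(path_a, path_b, rtol=1e-5, atol=1e-8):
    """Load two saved matrices; return (allclose result, largest absolute difference)."""
    mat1, mat2 = np.load(path_a), np.load(path_b)
    if mat1.shape != mat2.shape:
        return False, float("inf")
    max_diff = float(np.max(np.abs(mat1 - mat2)))
    return bool(np.allclose(mat1, mat2, rtol=rtol, atol=atol)), max_diff


with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "small_machine_a.npy")  # hypothetical file names
    path_b = os.path.join(tmp, "small_machine_b.npy")
    reference = np.arange(9.0).reshape(3, 3)
    save_matrix(path_a, reference)
    save_matrix(path_b, reference + 1e-3)  # stand-in for the other machine's result
    same, max_diff = compare_saved(path_a, path_b)
    print("matrices match:", same, "| max abs difference:", max_diff)
```

Reporting the largest absolute difference alongside the True/False answer helps decide whether a mismatch is at round-off level or something larger.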

Oleksandr_P_Intel · ‎03-30-2017

Dear Dmitry,

Please check if the discrepancy goes away if the python script is executed in an environment with MKL_CBWR=AUTO, which enables conditional numerical reproducibility mode.

See articles https://software.intel.com/en-us/node/528408 and https://software.intel.com/en-us/node/528409 for more details.

Thank you,
Oleksandr

Dmitry_K_ · ‎03-30-2017

I've got permission to share the code with Intel employees, but only privately, because it's research code. Currently, the code is in a private GitHub repo (along with inputs and results). What is the best way to share it with you, Sergey or Oleksandr? If you know my email address, feel free to contact me through email.

Oleksandr, using MKL_CBWR=AUTO I get exactly the same results as I get without setting this environment variable.

Gaston, if you are affiliated with Intel, I think I can share the code with you as well.

gaston-hillar · ‎04-01-2017

Dmitry K.,

This is an open forum. If somebody searches on Google, they will find everything that is posted in this forum.

Most of the people that ask for answers in the public forums share their code so that they can receive help from Intel® Engineers and from others that are not from Intel® but are working with Intel® products.

I understand that you cannot share research code. However, I just wanted to let you know you are in a public forum, just in case you share things here by mistake.

As you can't share the code, I won't be able to offer my experience with Intel Distribution for Python. I'm sure Intel® Engineers will be able to help you.

However, in case you cannot share things about your solution, you might want to consider buying a license and using the private forums. When you buy licenses, you get premium support options, which let you share code privately with Intel® Engineers. Just in case you didn't know.

I don't earn a referral fee on any license sold. However, I thought it was probably helpful for you to know about additional support options. Good luck with the solution to your issue.

Dmitry_K_ · ‎04-01-2017

Thank you, Gastón.

Unfortunately, I didn't get permission to make the code public for everybody. Hopefully, the Intel engineers will fix the problem soon.

Sergey_M_Intel2 · ‎04-01-2017

Hello Gastón,

We're still root causing the issue.

Although Dmitry cannot share the code publicly, I can assure you that Intel engineers will publicly report back what was wrong with Intel Python.

The current hypothesis (confirmed by Oleksandr's experiments) is that the wrong code path in MKL is chosen when Dmitry calls one of the NumPy/SciPy functions. We do not know yet which function misbehaves, which is the bad news. The good news is that we reproduced the misbehavior. Interestingly, the behavior is only reproduced with certain versions of MKL.

Will keep you all updated,

Thank you,

Sergey

gaston-hillar · ‎04-02-2017

Sergey,

It's great to know that Intel engineers will publicly report back if something was wrong. I'm working hard with the same stack and it will also be helpful for me to know the results of this thread.

Thanks!

gaston-hillar · ‎04-02-2017

Dmitry K. wrote:

Thank you, Gastón.

Unfortunately, I didn't get permission to make the code public for everybody. Hopefully, the Intel engineers will fix the problem soon.

Dmitry, I completely understand. I also work with code pieces that are under dozens of NDAs. :)

Oleksandr_P_Intel · ‎04-14-2017

After receiving the code from Dmitry, I was able to reproduce exactly the same differing results by using the conditional numerical reproducibility flag MKL_CBWR.

The Intel(R) Xeon(R) CPU E5-2680 v2 processor in the Lenovo computer supports the AVX extended instruction set (see http://ark.intel.com/products/75277/Intel-Xeon-Processor-E5-2680-v2-25M-Cache-2_80-GHz), while the older processor in the Dell computer only supports the SSE4.2 set (http://ark.intel.com/products/47922/Intel-Xeon-Processor-X5650-12M-Cache-2_66-GHz-6_40-GTs-Intel-QPI).

Setting the environment variables MKL_CBWR=AVX and OMP_NUM_THREADS=1 and running Dmitry’s scripts, I obtained exactly the outputs reported for the Lenovo, down to the last decimal digit.

Setting MKL_CBWR=SSE4_2 and OMP_NUM_THREADS=1, I obtained the outputs reported for the Dell.
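A rough sketch of how such a comparison might be scripted is shown below. The helper name run_with_cbwr is hypothetical, and the child command here only echoes the variable so that the example runs without MKL installed. In a real comparison the child arguments would be the path to the actual script, and the two captured outputs would be compared.

```python
import os
import subprocess
import sys


def run_with_cbwr(code_path, child_args):
    """Run a Python child process with MKL_CBWR and OMP_NUM_THREADS set; return its stdout."""
    env = dict(os.environ, MKL_CBWR=code_path, OMP_NUM_THREADS="1")
    result = subprocess.run(
        [sys.executable] + list(child_args),
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Placeholder child: replace with e.g. ["my_script.py"] (hypothetical name) in practice.
child = ["-c", "import os; print(os.environ['MKL_CBWR'])"]

outputs = {branch: run_with_cbwr(branch, child) for branch in ("AVX", "SSE4_2")}
for branch, text in outputs.items():
    print(branch, "->", text.strip())
```

Launching a fresh process per setting keeps each run independent, since the variable is passed in the child's environment before any library is loaded.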

As alluded to by Dmitry, the code performs a singular value decomposition of a rectangular array that has most of its singular values close to machine epsilon, while a few others are on the scale of 1, making the input very ill-conditioned.

Exercising different implementations of the same decomposition algorithm introduces different round-off errors, leading to different values of very small singular values, and most importantly to different associated orthogonal vectors.

The orthogonal vectors are used to perform dimensionality reduction. The relatively small error in the orthogonal vectors is further amplified by dividing the results of their dot products by the associated small singular values, leading to the discrepancies.

In short, the computational problem arising in Dmitry’s code is ill-conditioned, and the expected discrepancies are naturally amplified during the run of the algorithm.
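The amplification can be illustrated with a small synthetic experiment. This is only a sketch under stated assumptions: the matrices, sizes, and singular values are made up, and a tiny relative perturbation of the input (about 1e-15) is used as a stand-in for the different round-off errors of two code paths. It is not a reproduction of Dmitry's computation.

```python
import numpy as np


def reduced_solve(a, b):
    """Project b onto the left singular vectors, divide by the singular values, map back."""
    u, s, vh = np.linalg.svd(a, full_matrices=False)
    return vh.T @ ((u.T @ b) / s)


def roundoff_sensitivity(singular_values, rel_noise=1e-15, seed=0):
    """Relative change in the result when the input is perturbed at round-off level."""
    rng = np.random.RandomState(seed)
    n = len(singular_values)
    # Build a (n+2) x n matrix with prescribed singular values.
    q1, _ = np.linalg.qr(rng.standard_normal((n + 2, n)))
    q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
    a = (q1 * singular_values) @ q2.T
    b = rng.standard_normal(n + 2)
    # Stand-in for "a different implementation": same matrix, entries nudged by ~1e-15.
    a_pert = a * (1.0 + rel_noise * rng.uniform(-1.0, 1.0, a.shape))
    x, x_pert = reduced_solve(a, b), reduced_solve(a_pert, b)
    return np.linalg.norm(x - x_pert) / np.linalg.norm(x)


well = roundoff_sensitivity(np.array([1.0, 0.8, 0.6, 0.4]))
ill = roundoff_sensitivity(np.array([1.0, 0.5, 1e-12, 1e-13]))
print(f"well-conditioned: {well:.1e}   ill-conditioned: {ill:.1e}")
```

With singular values of comparable size the perturbation stays at round-off level in the output, whereas with singular values many orders of magnitude below 1 the same perturbation produces a far larger relative change. That is the mechanism described above.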

In other words, my conclusion is that the difference is not due to any bug in the Intel Distribution for Python.