cumsum error on large float arrays

Michael_R_1 · 04-10-2017

Hi,

We've run into inconsistencies when calculating the cumulative sum of large float arrays with numpy.cumsum in the Intel distribution; they don't occur with Anaconda's default (non-Intel) distribution. When the array is longer than about 110,000 points, the 3rd item of the result is already wrong. This doesn't happen with a vector that's only 100,000 points long.

At first I thought the problem could be resolved by pre-allocating the output and using the "out" argument, because the 3rd item is then calculated correctly, but the end of the array is still wrong.

We've also had some trouble with numpy.unwrap in the Intel distribution, again on large arrays, but I'm still trying to get a consistent example.

Are there specific options in the Intel distribution for setting precision that could influence this? Below is some code with results in comments, showing the differences between the Intel distribution and the Anaconda distribution.

Any help or suggestions would be appreciated!

Regards,
Michael

import numpy as np
xx=np.arange(1000000)*.1
yyy=np.cumsum(xx)
yyy
''' without intel distribution
array([  0.00000000e+00,   1.00000000e-01,   3.00000000e-01, ...,
         4.99997500e+10,   4.99998500e+10,   4.99999500e+10])
'''
''' with intel distribution
array([  0.00000000e+00,   1.00000000e-01,   2.00000000e-01, ...,
         1.99999300e+05,   9.99998000e+04,   9.99999000e+04])
'''


xx=np.arange(1000000)*.1
yy=xx
np.cumsum(xx,out=yy)
yy
''' without intel distribution
array([  0.00000000e+00,   1.00000000e-01,   3.00000000e-01, ...,
         4.99997500e+10,   4.99998500e+10,   4.99999500e+10])
'''
''' with intel distribution
array([  0.00000000e+00,   1.00000000e-01,   3.00000000e-01, ...,
         2.99998800e+05,   1.99999500e+05,   1.99999700e+05])
'''


xx=np.arange(100000)*.1
yy=xx
np.cumsum(xx,out=yy)
yy
''' with intel distribution
array([  0.00000000e+00,   1.00000000e-01,   3.00000000e-01, ...,
         4.99975000e+08,   4.99985000e+08,   4.99995000e+08])
'''
''' without intel distribution
array([  0.00000000e+00,   1.00000000e-01,   3.00000000e-01, ...,
         4.99975000e+08,   4.99985000e+08,   4.99995000e+08])
'''
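
A possible way to check a given installation without comparing printed output by eye is sketched below. It relies on the fact that the cumulative sum of 0.1*k has the closed form 0.1*k*(k+1)/2. The helper name `cumsum_matches_closed_form` and the tolerances are assumptions made for illustration, not part of the original report. The sketch also shows that `yy = xx` only binds a second name to the same array, so the `out=yy` examples above write into the input rather than into a pre-allocated buffer; `np.empty_like` gives a separate one.

```python
import numpy as np


def cumsum_matches_closed_form(n, step=0.1):
    """Compare np.cumsum of step*[0, 1, ..., n-1] with step*k*(k+1)/2."""
    xx = np.arange(n) * step
    got = np.cumsum(xx)
    k = np.arange(n, dtype=np.float64)
    expected = step * k * (k + 1.0) / 2.0
    # Tolerances are an assumption: loose enough for ordinary rounding,
    # far too tight for the wrong values shown in the report above.
    return bool(np.allclose(got, expected, rtol=1e-9, atol=1e-9))


# `yy = xx` does not allocate anything; both names refer to one array.
xx = np.arange(1000000) * 0.1
yy = xx
print(yy is xx)

# A genuinely separate, pre-allocated output buffer.
out = np.empty_like(xx)
np.cumsum(xx, out=out)

# Both lengths from the report; an unaffected build should pass both.
print(cumsum_matches_closed_form(100000))
print(cumsum_matches_closed_form(1000000))
```

On an affected build the second check would be expected to fail, since the reported tail values are off by several orders of magnitude.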

DavidLiu · 04-10-2017

Hi Michael,

I've attempted this on two different systems and have been unable to reproduce your issue. Could you give us some more details on the conda, numpy, and python versions, in addition to the hardware configuration(s) you are on?

Thanks,

David
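
For anyone else reporting a similar problem, the details requested here can be gathered in one go with something like the following. This is only a sketch: the function name `environment_report` and the choice of fields are assumptions, and it does not capture the conda version, which would have to be checked separately (for example with `conda --version` in a shell).

```python
import platform
import sys

import numpy as np


def environment_report():
    """Collect the interpreter, OS, CPU, and NumPy details as a dict."""
    return {
        "python": sys.version.replace("\n", " "),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "numpy": np.__version__,
    }


for key, value in sorted(environment_report().items()):
    print("{}: {}".format(key, value))

# Prints NumPy's build configuration (e.g. which BLAS/LAPACK it links to).
np.show_config()
```

The interpreter banner matters here because, as later posts show, the Intel build identifies itself in `sys.version`.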

Michael_R_1 · 04-10-2017

Hi David,

Interesting. We're seeing the same error on three different computers: one Surface Pro 4, one desktop (i7-7700), and one Gigabyte laptop (i7-4710HQ), each with Windows 10 Home x64 and the following packages: python 3.5.2, numpy 1.11.2, conda 4.3.14, intelpython 2017.0.2.

The computers have different amounts of RAM (16 GB to 64 GB), and on each of them we followed the straightforward Intel Python distribution installation procedure described here (https://software.intel.com/en-us/articles/using-intel-distribution-for-python-with-anaconda), after installing Anaconda.

This is running in the IPython shell via Spyder; I get the same error if I run it in the Python console (which starts up saying "Python 3.5.2 |Intel Corporation| (default, Feb 5 2017, 02:57:01) [MSC v.1900 64 bit (AMD64)] on win32...").

Michael

gaston-hillar · 04-10-2017

Hi David and Michael,

I was curious about this issue and I could reproduce the problem Michael reports on macOS El Capitan, on a MacBook Pro powered by the following Intel CPU: Intel® Core™ i5-4278U Processor. The results differ, as Michael reports in his explanation.

gaston-hillar · 04-10-2017

Hi David and Michael,

I used the default Python version that comes installed with macOS El Capitan, which is Python 2.7.10.

gaston-hillar · 04-10-2017

Hi David and Michael,

Not sure whether it helps or not. However, I also executed the code on the following console provided by PythonAnywhere (https://www.pythonanywhere.com/try-ipython/), and the output is the same as the one Michael reports for the non-Intel distribution, and different from the results generated by the Intel distribution.

gaston-hillar · 04-10-2017

Hi David and Michael,

I executed the first example Michael reported on a Windows 10 laptop powered by an Intel Core i7-6700HQ CPU. The results do not show the differences that Michael reported. So the Intel Distribution for Python produces different results depending on the OS (macOS vs. Windows) or on the CPU; I'm not sure which is the cause. In this case, the Intel Distribution for Python produces the same results as Python 3.5.2 (non-Intel).

Python 3.5.2 (64-bit), the non-Intel distribution, produces the results that Michael reported for the non-Intel case.

Sample output:

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> xx=np.arange(1000000)*.1
>>> yyy=np.cumsum(xx)
>>> yyy
array([  0.00000000e+00,   1.00000000e-01,   3.00000000e-01, ...,
         4.99997500e+10,   4.99998500e+10,   4.99999500e+10])
>>>

The same code executed on the Intel Distribution for Python produces the following output. It is no different from the previous output, but it is different from the results reported by Michael.

Python 3.5.2 |Intel Corporation| (default, Feb  5 2017, 02:57:01) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution
>>> import numpy as np
>>> xx=np.arange(1000000)*.1
>>> yyy=np.cumsum(xx)
>>> yyy
array([  0.00000000e+00,   1.00000000e-01,   3.00000000e-01, ...,
         4.99997500e+10,   4.99998500e+10,   4.99999500e+10])

So, Michael, it would be great if you could share your OS and hardware info.

gaston-hillar · 04-10-2017

Michael,

I hadn't seen your message in which you described the hardware... So, forget about my last lines, in which I was asking you to provide hardware info. :)

Michael_R_1 · 04-10-2017

Hi Gaston,

Thanks for that. Very strange. I'm not seeing any pattern yet. How did you install the intel distribution for your last post? Simply Anaconda, then intel distribution followed possibly by updates? We followed that procedure for each of the three computers we're using here, all three with the same problem when using the intel distribution.

I've updated in the meantime to conda 4.3.16, same problem still.

Cheers,

Michael

Sergey_M_Intel2 · 04-11-2017

Hello everybody,

Intel engineers reproduced the error, which appears to be related to how numpy computes the cumulative sum. Our recent numpy optimizations did not take that into account. Interestingly, internal tests did not reveal this issue during validation.

Engineers will report soon whether they see a workaround.

Sorry about that,

Sergey

gaston-hillar · 04-11-2017

Michael Roelens wrote:

Hi Gaston,

Thanks for that. Very strange. I'm not seeing any pattern yet. How did you install the intel distribution for your last post? Simply Anaconda, then intel distribution followed possibly by updates? We followed that procedure for each of the three computers we're using here, all three with the same problem when using the intel distribution.

I've updated in the meantime to conda 4.3.16, same problem still.

Cheers,

Michael

Michael, I used the installer provided by Intel for Windows to install the Intel Distribution for Python on Windows 10.

Oleksandr_P_Intel · 04-11-2017

Dear Michael,

Thank you very much for taking the time to bring this to our attention. We reproduced the problem, and it affects universal functions applied to large arrays of doubles, floats, or corresponding complexes.

`np.cumsum` is the chief mainstream operation affected, although non-standard uses of any universal functions can be affected.

Regrettably, there is no setting within numpy to work around the issue. Chunking the array using slices and applying the function to these chunks comes to mind, but this is too much to ask.
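
For readers who still want to see what the chunking idea could look like for a 1-D cumulative sum, here is a minimal sketch. The function name `chunked_cumsum` and the chunk size of 50,000 (chosen only because it is below the roughly 110,000-point length Michael mentions) are assumptions, and nothing in this thread confirms that chunking actually avoids the problem on an affected build.

```python
import numpy as np


def chunked_cumsum(x, chunk=50000):
    """Cumulative sum of a 1-D array, computed slice by slice.

    Each slice is summed on its own, then shifted by the running total
    carried over from the previous slice.
    """
    x = np.asarray(x, dtype=np.float64)
    if x.ndim != 1:
        raise ValueError("this sketch only handles 1-D input")
    out = np.empty_like(x)
    carry = 0.0
    for start in range(0, x.size, chunk):
        block = out[start:start + chunk]      # view into the output buffer
        np.cumsum(x[start:start + chunk], out=block)
        block += carry                        # add the total so far
        carry = block[-1]
    return out


xx = np.arange(1000000) * 0.1
result = chunked_cumsum(xx)
print(result[:3], result[-1])
```

Because the additions happen in a slightly different order, the result can differ from a single `np.cumsum` call in the last few bits, which is ordinary floating-point rounding rather than an error.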

We are working to provide a hotfix.

`np.unwrap` is affected because it uses `np.cumsum` underneath.
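
To illustrate that dependency, the sketch below follows the general approach of a 1-D phase unwrap: take differences, wrap them into [-pi, pi], and accumulate the resulting corrections with `np.cumsum`. The name `unwrap_via_cumsum` is hypothetical, and this is a simplified illustration rather than NumPy's actual source.

```python
import numpy as np


def unwrap_via_cumsum(p, discont=np.pi):
    """Simplified 1-D phase unwrap built on np.cumsum."""
    p = np.asarray(p, dtype=np.float64)
    dd = np.diff(p)
    # Wrap each difference into the interval [-pi, pi).
    ddmod = np.mod(dd + np.pi, 2 * np.pi) - np.pi
    # Keep a jump of exactly +pi as +pi rather than -pi.
    ddmod[(ddmod == -np.pi) & (dd > 0)] = np.pi
    correction = ddmod - dd
    # Differences smaller than the threshold need no correction.
    correction[np.abs(dd) < discont] = 0.0
    out = p.copy()
    # A wrong cumulative sum here corrupts every later sample.
    out[1:] = p[1:] + np.cumsum(correction)
    return out


phase = np.linspace(0.0, 20.0, 500)
wrapped = np.angle(np.exp(1j * phase))
print(np.allclose(unwrap_via_cumsum(wrapped), np.unwrap(wrapped)))
```

Since every output sample depends on the running total of corrections, an error in the cumulative sum of a long array would show up in the unwrapped phase as well.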

Our release process relies on community tests for validation, but evidently no test exercised the culprit optimization code we added.

I will announce the hotfix on this thread as soon as it becomes available.

Thank you for your understanding,
Oleksandr

Michael_R_1 · 04-11-2017

Hi Oleksandr and Sergey, I'm very impressed by how quickly you responded and figured out what's going wrong. Thanks for confirming, and looking forward to the hotfix! Kind regards, Michael

Oleksandr_P_Intel · 04-19-2017

Hi Michael,

I am happy to report that the fix for this issue has been posted. Please try updating NumPy in the distribution by running:

 conda update -c intel numpy

This should fix the issue underlying the observed erroneous behavior.

Michael_R_1 · 04-19-2017

Hi Oleksandr,

Thanks a lot for that. I had to uninstall intelpython3_core/full (2017.0.2) to be able to install the 1.11.3 version of numpy here, though, because conda said the two weren't compatible at the moment. But the updated package does indeed fix the cumsum problem.

Kinda makes me wonder: what is the intelpython3_core or full package needed for? Is it some kind of wrapper that contains a bunch of packages?

Thanks again for the quick fix!

Michael

Todd_T_Intel · 04-20-2017

Michael,

You are essentially correct. The intelpython3_core package is a "metapackage": it contains no files of its own, but rather collects a set of other packages into a named unit for ease of installation. When you install a particular version of "intelpython3_core" or "intelpython3_full", you get all the packages we released with that version.

There should be no need for you to manually uninstall it. I will check the update logic to be certain it is working as expected.

Todd

gaston-hillar · 04-20-2017

@Oleksandr,

Should I run this update on Intel Distribution for Python 3.5.2, too, or is it only necessary to run it for Intel Distribution for Python 2.7? I'm working with both versions on Windows, macOS and Linux.

Oleksandr_P_Intel · 04-20-2017

The changes in the update are not specific to any particular version of Python or to any platform. Updates were posted for all platforms, and for both Python 2.7 and Python 3.5.

Todd_T_Intel · 04-20-2017

The problem with the intelpython3_core package reporting a conflict when attempting to update to the repaired numpy should now be fixed.

Sorry for the trouble.

Todd

gaston-hillar · 04-20-2017

@Oleksandr,

Thanks for the clarification. I've successfully updated all my versions.

Michael_R_1 · 04-20-2017

Hi Todd, Thanks for that. Indeed, after updating my package index, I was able to install the two together now. Thanks everyone for the quick fix and support! Michael