Hello,
I am having problems with an MPI code using Intel MKL and ifort (Composer version 13.1.0.146). Each processor has exactly the same matrix and performs some sequential operations on it. Each processor is expected to obtain exactly the same values, since they all use the same binaries and the same libraries, and the nodes are in fact identical (two Sandy Bridge EP E5-2670 processors per node). However, routines such as CGEMM and CGESVD produce slightly different values on each processor, with variations on the order of 1e-6 to 1e-8. This does not always happen, and it seems to depend on the number of processors being used.
Is this behaviour expected at all? The difference is below the machine precision (considering single precision), but aren't the individual cores supposed to perform the roundoffs in the same manner? If this behaviour is not expected, I could provide some example matrices.
Thanks in advance
Identical results would require ensuring the same data alignment (mod 32 bytes), or using the slower consistency option.
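To illustrate why alignment matters, here is a minimal C sketch of how a vectorized single-precision sum might be organized. It is hypothetical and is not MKL's actual kernel. Elements before the first 32-byte boundary are added in a scalar "peel" loop, and the rest are spread over 8 partial sums (one 256-bit register of floats). The function name `simulated_vector_sum` and its `misalign` parameter (the array's start address modulo 32 bytes, counted in floats) are assumptions of the sketch. Changing `misalign` changes which elements share a partial sum, and therefore the rounding, even though every individual addition is rounded identically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector width: 8 floats fill one 32-byte (256-bit) register. */
#define LANES 8

/* Models one way a vectorized float sum could be organized (not MKL's code).
 * 'misalign' is the array's start address modulo 32 bytes, counted in floats
 * (0..7). Elements before the next 32-byte boundary are summed in a scalar
 * "peel", the aligned middle goes into LANES partial sums, and any leftovers
 * form a scalar tail. */
float simulated_vector_sum(const float *x, size_t n, size_t misalign)
{
    assert(misalign < LANES);

    size_t peel = (LANES - misalign) % LANES; /* floats until the boundary */
    if (peel > n)
        peel = n;

    float head = 0.0f;
    for (size_t j = 0; j < peel; ++j)
        head += x[j];

    float lane[LANES] = {0.0f};
    size_t i = peel;
    for (; i + LANES <= n; i += LANES)
        for (size_t k = 0; k < LANES; ++k)
            lane[k] += x[i + k];

    float tail = 0.0f;
    for (; i < n; ++i)
        tail += x[i];

    /* Combine: peel first, then each lane in order, then the tail. */
    float total = head;
    for (size_t k = 0; k < LANES; ++k)
        total += lane[k];
    return total + tail;
}
```

With `x[0] = 16777216.0f` (2^24) followed by fifteen `1.0f` values, starting on a 32-byte boundary gives 16777230, while starting 28 bytes past one gives 16777224; the exact sum is 16777231. This assumes strict IEEE single-precision evaluation (no `-ffast-math`, no x87 extended precision).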
holysword wrote:
Is this behaviour expected at all? The difference is below the machine precision (considering single precision), but aren't the individual cores supposed to perform the roundoffs in the same manner?
Thank you for asking this question! MKL does have a way to guarantee identical results as long as some preconditions are met. We call this feature "Conditional Numerical Reproducibility". See here for a complete discussion on how to use this feature: http://software.intel.com/en-us/articles/conditional-numerical-reproducibility-cnr-in-intel-mkl-110
Thank you very much TimP, Zhang Z and Sergey Kostrov.
Setting KMP_DETERMINISTIC_REDUCTION=yes and MKL_BWR=SSE4_2 solves the issue with no noticeable slowdown. I still compile with the same optimization flags (including -xAVX). I tried MKL_BWR=AVX, but that didn't work, and I wonder why: all the processors are the same (all E5-2670 EP), and all the DLLs and libraries are the same as well.
You didn't say whether you took care to set all local data passed to MKL on 32-byte boundaries (16-byte may be sufficient if you avoid AVX, but 32 may improve performance, even with SSE).
The variations you quote are consistent with a single-precision vector sum reduction on arrays of differing alignment. You could check each address passed to MKL modulo 16 for consistency. If you succeed in using the non-deterministic AVX path, its result may not be identical to the "deterministic" one.
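A minimal C sketch of the address check suggested above, plus one way to obtain a 32-byte-aligned buffer. `misalignment` and `alloc_floats_aligned32` are hypothetical helper names; `aligned_alloc` is the C11 standard function, so the sketch assumes a C11 library (MSVC lacks it and provides `_aligned_malloc` instead). The same modulo test applies to the address of each array the Fortran code passes to MKL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Byte offset of p past the previous 'boundary'-byte boundary; 0 means p is
 * aligned to it. 'boundary' must be a power of two (e.g. 16 or 32). */
size_t misalignment(const void *p, size_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (size_t)((uintptr_t)p % boundary);
}

/* Allocates room for n floats starting on a 32-byte boundary. C11
 * aligned_alloc wants a size that is a multiple of the alignment, so the
 * byte count is rounded up. Release the buffer with free(). */
float *alloc_floats_aligned32(size_t n)
{
    size_t bytes = n * sizeof(float);
    bytes = (bytes + 31) / 32 * 32;
    if (bytes == 0)
        bytes = 32;
    return aligned_alloc(32, bytes);
}
```

A buffer from `alloc_floats_aligned32` reports 0 for both 16 and 32. The same buffer offset by four floats (16 bytes) is still 16-byte aligned but reports 16 against a 32-byte boundary, which is the kind of difference that could change how an AVX loop splits the data.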
DETERMINISTIC_REDUCTION may not permit the use of AVX-256, as that could require different blocking, which is incompatible with consistent results.
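The blocking point can be sketched in the same spirit. The hypothetical `blocked_sum` below adds the data into `lanes` partial sums, with 4 standing in for 128-bit SSE and 8 for 256-bit AVX. It only models the idea and is not MKL's actual blocking.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LANES 16

/* Sums x[0..n) with 'lanes' independent partial sums, as a SIMD loop with
 * that many float lanes might (4 ~ 128-bit SSE, 8 ~ 256-bit AVX). Partials
 * are combined in lane order, then leftover elements are added one by one.
 * Illustrative only; this is not MKL's actual blocking. */
float blocked_sum(const float *x, size_t n, size_t lanes)
{
    assert(lanes >= 1 && lanes <= MAX_LANES);

    float part[MAX_LANES] = {0.0f};
    size_t i = 0;
    for (; i + lanes <= n; i += lanes)
        for (size_t k = 0; k < lanes; ++k)
            part[k] += x[i + k];

    float total = 0.0f;
    for (size_t k = 0; k < lanes; ++k)
        total += part[k];
    for (; i < n; ++i)
        total += x[i];
    return total;
}
```

For the input 2^24 followed by fifteen ones, the 4-lane version returns 16777228 and the 8-lane version 16777230, so an SSE-width path and an AVX-256 path need not agree bit for bit, even when each is deterministic on its own.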
In my tests, 32-byte alignment was of more benefit on early Core i7, so I agree the benefit may not appear on the latest CPUs.
TimP (Intel) wrote:
You didn't say whether you took care to set all local data passed to MKL on 32-byte boundaries (16-byte may be sufficient if you avoid AVX, but 32 may improve performance, even with SSE).
I am sorry, what do you mean by 32-byte boundaries? All variables are declared with the default kind (that is, just REAL, COMPLEX and INTEGER; no DOUBLE PRECISION, KIND declarations or anything of that sort).