Azua_Garcia__Giovann

Beginner

06-29-2012
08:38 AM

different CPU leads to different results

I have been developing an iterative algorithm in which most of the computation consists of matrix-matrix multiplication (MMM), matrix-vector multiplication (MVM), forward and backward solves, and several other BLAS and LAPACK functions available in MKL.

For big problem sizes I get diverging results on two different CPUs. The software is exactly the same on both:

- OS Linux Ubuntu 11.10 kernel version 3.0.0-22-generic
- Intel parallel_studio_xe_2011_sp1_update2_intel64.tgz (MKL 10.2)
- Intel l_mkl_10.3.10.319_intel64.tgz update
- icc (ICC) 12.1.3 20120212

The two systems I have:

- Intel Core 2 Duo T9900 in a MacBook Pro 17'' (Mid 2009), dual booting Ubuntu 11.10 with kernel 3.0.0-22-generic
- Intel i7 3930K C2 stepping Desktop on an ASUS Rampage Extreme IV (Ubuntu 11.10 kernel 3.0.0-22-generic)

Basically, the Intel Core 2 Duo MBP produces correct results, whereas on the Intel i7 3930K the results differ greatly (final result, number of iterations, etc.). To rule out possibilities I started scaling back the icc settings, e.g. I removed -no-prec-div, which improved the situation for the n=2000 problem size, but for larger problem sizes it still fails to converge correctly. I switched to g++ instead of icpc and the non-reproducibility problem persists. Hence, all signs point to different MKL behavior depending on the processor.

I came across the article below. Is this a solution to my problem, or is there a way to ensure reproducibility using the current MKL release? http://software.intel.com/en-us/articles/intro-to-CBWR-in-intel-mkl/

Many TIA,

Best regards,

Giovanni
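
One common reason the same program gives different answers on different CPUs is that floating-point addition is not associative, so a kernel that accumulates in a different order (for example with wider SIMD registers) can round differently. A minimal sketch in C, assuming IEEE 754 doubles on x86-64 and no value-changing compiler flags such as -ffast-math; `lane_sum` is a hypothetical helper for illustration and not an MKL routine:

```c
#include <stddef.h>

/* Sum x[0..n-1] using `lanes` interleaved partial sums, roughly the way a
   vectorized kernel with `lanes` SIMD lanes would accumulate.
   Hypothetical helper; lanes must be between 1 and 8. */
double lane_sum(const double *x, size_t n, size_t lanes)
{
    double partial[8] = {0.0};
    for (size_t i = 0; i < n; ++i)
        partial[i % lanes] += x[i];      /* element i goes to lane i mod lanes */

    double total = 0.0;
    for (size_t l = 0; l < lanes; ++l)   /* final horizontal reduction */
        total += partial[l];
    return total;
}
```

For the input {1e16, 1.0, -1e16, 1.0}, `lane_sum(x, 4, 1)` returns 1.0 while `lane_sum(x, 4, 2)` returns 2.0: the single-lane sum loses the first 1.0 when it is added to 1e16, and the two-lane sum does not. In an iterative method with a convergence threshold, differences like this can change the number of iterations.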

8 Replies

TimP

Black Belt

06-29-2012
09:05 AM

I suppose the MKL has been updated in the more recent releases.

Gennady_F_Intel

Moderator

06-30-2012
12:04 AM

Yes, this new functionality (CBWR) will help you get identical results when you use these routines.

These functions are available in the MKL 11.0 beta.
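
As a rough sketch of how the CBWR feature might be requested from C: MKL 11.0 and later provide `mkl_cbwr_set`, which should be called before any other MKL function, and the `MKL_CBWR` environment variable as an alternative. The `HAVE_MKL` macro and the wrapper name `request_reproducible_mkl` are assumptions made for this example so that it also compiles on a machine without MKL:

```c
#include <stdio.h>

#ifdef HAVE_MKL   /* hypothetical macro: define it when building against MKL 11.0+ */
#include <mkl.h>
#endif

/* Ask MKL for a code path that does not depend on the CPU it runs on.
   Call this before any other MKL function.
   Returns 1 if the request was accepted, 0 otherwise. */
int request_reproducible_mkl(void)
{
#ifdef HAVE_MKL
    return mkl_cbwr_set(MKL_CBWR_COMPATIBLE) == MKL_CBWR_SUCCESS;
#else
    fprintf(stderr, "built without MKL; CBWR was not requested\n");
    return 0;
#endif
}
```

Setting `MKL_CBWR=COMPATIBLE` in the environment before starting the program is the equivalent route that needs no code change. The compatible branch is generally the slowest one, so it is worth measuring the cost on the target machine.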

Azua_Garcia__Giovann

Beginner

06-30-2012
08:29 AM

Thank you TimP. I wasn't using the settings you suggest; in fact I was using -xHost, which AFAIK exploits all the optimizations and features natively available on the CPU. I changed the compiler settings to what you suggested and it helped a lot. I now get divergent results for only one problem size, and it does look like a bug in my code. I am testing it now using valgrind. Thank you.

Gennady, thanks. I am using MKL 11.0 now; the only bit that worries me is memory alignment. I use a central buffer pool that preallocates all the memory needed for my algorithm once, at startup. My matrices are all page-size aligned and the vectors are all 16-byte aligned (SSE). However, sometimes I need to pass MKL addresses that were not themselves allocated with an explicit alignment, e.g. a column vector within one of the matrices, and in cases like this I am wondering what the outcome would be. "To ensure MKL calls return the same results on

Page size alignment:

double* buffer = NULL;

posix_memalign((void**) &buffer, sysconf(_SC_PAGESIZE), size*sizeof(double));

SSE alignment:

double* buffer = NULL;

posix_memalign((void**) &buffer, 16, size*sizeof(double));

Best regards,

Giovanni
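
The question about a column inside an aligned matrix can be checked directly. The sketch below assumes column-major storage of doubles (the usual BLAS/LAPACK convention, not stated in the post) and an alignment that is a power of two and a multiple of sizeof(double); `is_aligned` and `padded_ld` are hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if p lies on an `alignment`-byte boundary.
   alignment must be a power of two. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Smallest leading dimension >= rows for which every column of a
   column-major matrix of doubles starts on an `alignment`-byte boundary,
   provided the matrix base pointer itself is aligned that way. */
size_t padded_ld(size_t rows, size_t alignment)
{
    size_t per = alignment / sizeof(double);   /* doubles per boundary */
    return (rows + per - 1) / per * per;       /* round rows up to a multiple */
}
```

With a page-aligned base, column j starts at `base + j * ld`, so it is aligned only when `ld * sizeof(double)` is a multiple of the alignment. For 2001 rows and 32-byte alignment, `padded_ld` returns 2004; the padded leading dimension is then what would be passed as the `lda` argument of the BLAS/LAPACK call.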

Azua_Garcia__Giovann

Beginner

07-01-2012
03:17 AM

- -align
- -finline-functions
- -malign-double
- -no-prec-div
- -openmp
- -opt-multi-version-aggressive
- -scalar-rep
- -unroll-aggressive

To my surprise, the problem was memory misalignment of some of the matrices/vectors passed as input to MKL. This affected the reproducibility of the results only on the i7 3930K and not on the older Core 2 Duo processor. So my problem was due to the alignment. While using MKL 11.0 beta and tweaking the environment variable

The gcc/g++/gfortran compiler produces better performance results than icc with the options:

- -march=native
- -fopenmp
- -fomit-frame-pointer
- -funroll-loops
- -ffast-math
- -funsafe-math-optimizations

In conclusion, a combination of

Many TIA,

Best regards,

Giovanni

TimP

Black Belt

07-01-2012
04:38 AM

AVX code can take advantage of alignment up to 32 bytes.

When you request multi-threaded, MKL may still choose single threaded if the problem isn't large enough to benefit from multiple threads.

Your gcc/gfortran options include the equivalent of icc -complex-limited-range which could make a big difference if you have complex arithmetic.

It will make a difference which versions of icc and gcc you use, particularly for AVX at -O3.
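
The remark about limited-range complex arithmetic can be illustrated in plain C. The sketch below assumes IEEE 754 doubles; `cplx`, `div_naive`, and `div_smith` are hypothetical names. `div_naive` is the textbook formula that a limited-range mode is permitted to use, and `div_smith` uses Smith's scaling method as one example of a range-safe alternative (it is not claimed to be what any particular compiler emits):

```c
typedef struct { double re, im; } cplx;

static double abs_d(double x) { return x < 0 ? -x : x; }

/* Textbook division: the intermediate products can overflow even when
   the true quotient is perfectly representable. */
cplx div_naive(cplx a, cplx b)
{
    double den = b.re * b.re + b.im * b.im;
    cplx q = { (a.re * b.re + a.im * b.im) / den,
               (a.im * b.re - a.re * b.im) / den };
    return q;
}

/* Smith's method: scale by the larger component of the divisor so the
   intermediate values stay in range. */
cplx div_smith(cplx a, cplx b)
{
    cplx q;
    if (abs_d(b.re) >= abs_d(b.im)) {
        double r = b.im / b.re;
        double den = b.re + b.im * r;
        q.re = (a.re + a.im * r) / den;
        q.im = (a.im - a.re * r) / den;
    } else {
        double r = b.re / b.im;
        double den = b.re * r + b.im;
        q.re = (a.re * r + a.im) / den;
        q.im = (a.im * r - a.re) / den;
    }
    return q;
}
```

Dividing 1e200 + 1e200i by itself gives NaN components with `div_naive`, because the squared terms overflow to infinity, while `div_smith` returns exactly 1 + 0i.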

Azua_Garcia__Giovann

Beginner

07-02-2012
02:17 AM

Thank you. I aligned all MKL input vectors on 32-byte boundaries and it now produces perfectly accurate results. Indeed, a quick check reveals a substantial speedup moving from MKL 10.x to MKL 11.0 beta. Great work!

Best regards,

Giovanni
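
A 32-byte aligned allocation along these lines could look like the following sketch, which wraps `posix_memalign` with the error checks the earlier snippets omit. `alloc_doubles` is a hypothetical helper name, and a POSIX system is assumed:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* make posix_memalign visible */
#endif
#include <stdint.h>
#include <stdlib.h>

/* Allocate `count` doubles on an `alignment`-byte boundary.
   alignment must be a power of two and a multiple of sizeof(void *).
   Returns NULL on failure; release the memory with free(). */
double *alloc_doubles(size_t count, size_t alignment)
{
    void *p = NULL;
    if (count > SIZE_MAX / sizeof(double))      /* guard the size multiplication */
        return NULL;
    if (posix_memalign(&p, alignment, count * sizeof(double)) != 0)
        return NULL;                            /* EINVAL or ENOMEM */
    return (double *)p;
}
```

Calling `alloc_doubles(n, 32)` gives a buffer suitable for AVX-width loads, and `alloc_doubles(n, 64)` would match a cache line. Note that `posix_memalign` reports errors through its return value rather than `errno`.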

Victor_K_Intel1

Employee

07-02-2012
04:31 AM

Actually, I am a little bit concerned by your statement

Although the CBWR feature can draw a veil over a numerical stability issue, it does not resolve it. Actually, the CBWR feature is intended for situations where you know that your calculations are intentionally unstable and this is some kind of regularization method (as in ill-posed problems).

So you should probably investigate the stability of your method (if possible).

Thanks

Victor

Azua_Garcia__Giovann

Beginner

07-03-2012
05:51 AM

Thank you for your support. Actually, I'm working on an optimization algorithm which is iterative and converges depending on an epsilon threshold. The results on the Intel Core 2 Duo were consistent for all versions of this algorithm (with and without MKL). However, when I moved to the i7 architecture I noticed differences in the number of iterations for the big problem sizes. Note that the algorithm would still converge, but not with the exact same number of iterations for the big problem sizes. At the time I posted I also had an issue stemming from a broken Ubuntu kernel update.

After researching the issue,

Best regards,

Giovanni
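
The effect described here, the same convergence but a different iteration count, can be reproduced with a toy loop. This is only an illustration under IEEE 754 double arithmetic, not the algorithm from the thread; `iterations_to_converge` is a hypothetical name:

```c
/* Halve x until it drops below eps and return the number of halvings.
   Stands in for any iteration that stops on an epsilon threshold. */
int iterations_to_converge(double x, double eps)
{
    int k = 0;
    while (x >= eps) {
        x *= 0.5;   /* one "iteration" of the toy method */
        ++k;
    }
    return k;
}
```

With eps = 0.0009765625 (2^-10), a start value of 1.0 needs 11 iterations, while a start value one rounding step below 1.0 (written `1.0 - 0x1p-53` in C99) needs only 10. A difference in the last bit of an intermediate result is enough to move the stopping point by one iteration, even though both runs converge.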
