Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Fix arithmetic error

Georgios_S_
New Contributor II

I am performing a distributed Cholesky factorization and inversion of a matrix. I assume that the serial versions of these routines give the correct result, so I am trying to make my distributed version match the serial version's result, following this thread.

I called mkl_cbwr_set(MKL_CBWR_AVX), but I got the very same result with and without it. Here are the last 5 cells of the result matrix:

distributed:

-549902542.265857 -1477292588.419721 2940319034.042153 -21964970840.793980 35339699997.924484
serial:
-549902542.266553 -1477292588.419761 2940319034.042350 -21964970840.794662 35339699997.925011

What should I do to receive the same results?

I also used mkl_malloc() with 16, 32 and 64 as the 2nd argument, but that didn't help! I now realize that the thread I linked to discusses reproducible results, which are already fine in my case.

So, can I get the same results in the distributed version as in the serial version, or is that not possible?

Kind regards,

George

 

3 Replies
Alexander_K_Intel3

Hello George,

MKL LAPACK and ScaLAPACK implement somewhat different algorithms (one is optimal for shared memory, the other for distributed memory), so the results will most likely differ because round-offs happen in a different order inside the algorithms.

The best recommendation is to check that the matrices are numerically close to each other, e.g. norm(A1-A2)/max(norm(A1),norm(A2))*size(A1) < C, where C is a constant such as 1.0E-14 (very accurate) or 1.0E-10 (good enough) for double precision. That way you do not need to rely on bitwise reproducibility, but instead check an actual mathematical property. This is especially useful since the variety of architectures and their properties will only increase in the future.

For validation purposes you may also try a matrix that doesn't require any round-offs during the computation, such as one formed of integer numbers. With high probability such a matrix will produce bitwise-identical output, but there is still a chance that an internal division produces a floating-point number that later causes a round-off.

Best regards,
Alexander

Georgios_S_
New Contributor II

Hi Alexander,

  thanks for the fine answer. What is the mathematical property you mentioned (i.e. norm(A1-A2)/max(norm(A1),norm(A2))*size(A1) < C) called?

Best regards,

George

Alexander_K_Intel3

The metric is called the normalized residual.
It essentially normalizes the difference between the two matrices and compares it against machine precision. If the difference is within the last few bits, the matrices can be treated as equal.

Best regards,
Alexander.
