
Steven_V_ · 10-02-2012

Hi,

I ran into a problem using the dcopy subroutine of MKL. When I compile and run the included small test with gfortran and Intel MKL version 11, it works fine on my Xeon machines but fails on the Opteron machines. Both the integer*4 (LP64) and integer*8 (ILP64) versions of the 64-bit MKL libraries seem to have this problem. ACML has the same issue, but only in its integer*4 64-bit library. The test program works fine with the older MKL version 10.1.3.027.

greetings, Steven

Gennady_F_Intel · 10-02-2012

Hi Steven, which version of MKL are you using? I guess this is Linux? How did you link the example? I noticed the task size is very big:

    PROGRAM TEST_DCOPY
    IMPLICIT REAL*8 (A-H,O-Z)
    ALLOCATABLE :: X(:), Y(:)
    DIMENSION N(2)
    N(1)=1220480320
    .....
    ALLOCATE(X(N(1)))
    ...
    ALLOCATE(Y(N(1)))
    .....

Do you have enough RAM on the systems where you see the problem?

--Gennady

Steven_V_ · 10-02-2012

Hi Gennady, this is on GNU/Linux 3.2.0-23 x86_64 SMP, GNU C Library (Ubuntu EGLIBC 2.15-0ubuntu10), with 512 GB of RAM. The test fails with MKL from composer_xe_2011_sp1.9.293 and composer_xe_2013.0.079, and only on Opteron processors. I know the test is quite large; the actual dcopy call comes from a quantum chemistry code. However, even the 32-bit integer dcopy should handle 1.2 billion elements: the count itself fits in a 32-bit integer, and since it's a 64-bit library, size_t can hold the number of bytes without a problem, as it did in the older version. Since it only fails on Opteron, I guess the bug is somewhere in an architecture-specific routine.
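The size argument can be checked with a few lines of Python (an illustrative sketch, not code from the thread; the constant names are made up). Using N(1) from the test program, the element count fits in a signed 32-bit integer, while the byte count of one REAL*8 array does not fit in 32 bits, so byte offsets have to be computed in 64-bit arithmetic such as size_t.

```python
# Size from the test program: N(1) = 1220480320 REAL*8 (8-byte) elements.
N_ELEMENTS = 1_220_480_320
BYTES_PER_REAL8 = 8

INT32_MAX = 2**31 - 1   # largest count an integer*4 (LP64) argument can hold
UINT32_MAX = 2**32 - 1  # largest value any 32-bit unsigned quantity can hold

n_bytes = N_ELEMENTS * BYTES_PER_REAL8

# The element count is representable as integer*4 ...
fits_in_int32 = N_ELEMENTS <= INT32_MAX
# ... but the byte count of one array is not representable in 32 bits.
bytes_fit_in_32_bits = n_bytes <= UINT32_MAX

print(f"elements: {N_ELEMENTS}, fits in int32: {fits_in_int32}")
print(f"bytes:    {n_bytes}, fits in 32 bits: {bytes_fit_in_32_bits}")
```

The count check passes and the byte check fails, which is why a 64-bit library must keep byte sizes in 64-bit variables even when the integer arguments are 32-bit.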

Gennady_F_Intel · 10-02-2012

Thanks, Steven. I'm asking just so we know what to check on our side. Yes, 1.2 × 10^9 should be handled by a 32-bit integer. One more question: did you link with libmkl_sequential.a or libmkl_gnu_thread.a?

Steven_V_ · 10-02-2012

I linked with libmkl_sequential.a. I just tried libmkl_gnu_thread.a as well, and it fails too.

Gennady_F_Intel · 10-02-2012

Just for info: I still couldn't find an Opteron with that much memory. The test passed on a Xeon with 32 GB of RAM:

    ./test_dcopy
    1097.52614198074
    1097.52614198074

Steven_V_ · 10-03-2012

Thanks for the info. This is exactly what I get on a Xeon E5630:

    ./test_dcopy
    1097.52614198074
    1097.52614198074

and this is what I get on an Opteron 6276:

    ./test_dcopy
    1097.52614198074
    wrong, I = 146738497 X(I) = 3.141589835286140E-002 Y(I) = 1.12300002574921

Could you run the Opteron-specific code path on a Xeon, or is that impossible? FYI, here is the discussion of the same issue with ACML: http://devgurus.amd.com/thread/159788
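One observation that may help narrow this down (our own arithmetic, not something confirmed in this thread): the failing index is exactly one past the number of REAL*8 elements you get if the total byte count is reduced modulo 2^32. That is consistent with, though not proof of, a byte count being truncated to 32 bits somewhere in the Opteron code path. The sketch below just does that arithmetic; the constant names are illustrative.

```python
N_ELEMENTS = 1_220_480_320
BYTES_PER_REAL8 = 8
FIRST_WRONG_INDEX = 146_738_497  # 1-based index reported by the Opteron run

# Keep only the low 32 bits of the full byte count.
truncated_bytes = (N_ELEMENTS * BYTES_PER_REAL8) % 2**32
truncated_elements = truncated_bytes // BYTES_PER_REAL8

# With 1-based Fortran indexing, elements 1..truncated_elements would be copied,
# so the first element left untouched would be truncated_elements + 1.
predicted_first_wrong = truncated_elements + 1

print(truncated_bytes, truncated_elements, predicted_first_wrong)
print("matches reported index:", predicted_first_wrong == FIRST_WRONG_INDEX)
```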

Steven_V_ · 10-03-2012

If necessary, I could try to find out how to give you access to one of our machines.

Gennady_F_Intel · 10-03-2012

Thanks for the suggestion, but it's not necessary: I have already reproduced the same results on an AMD Opteron(tm) Processor 6282 SE with 32 GB of RAM. We will check what's wrong.

barragan_villanueva_ · 10-03-2012

Steven, you wrote that this problem occurs on AMD machines with both MKL and ACML. Could you check this test with the Netlib BLAS?

Steven_V_ · 10-03-2012

Hi Victor, with my system's BLAS (Ubuntu 12.04) it works fine, and I think that one is based on the Netlib implementation. Our own program's BLAS is also based on Netlib, and that works fine too (as do the older MKL 10.1 and ACML 4.2.0). But if necessary, I can compile the Netlib dcopy and test it.

Gennady_F_Intel · 10-03-2012

Steven, the problem reproduces even when only DCOPY is used. I commented out the SQRT and DDOT calls and the problem still exists:

    wrong, I = 146738497 X(I) = 3.141589835286140E-002 Y(I) = 1.12300002574921

The problem has been escalated. We will let you know as soon as there is any update.

Gennady_F_Intel · 02-05-2013

Hello Steven,

Would you please check the latest 11.0 Update 2? The problem has been fixed there.

--Gennady

Steven_V_ · 03-26-2013

Hi,

I've installed version 11.0 Update 2 of MKL and compiled my test program, linking with either -lmkl_sequential or -lmkl_gnu_thread, but in both cases the program segfaults. The gdb output is attached.

Vamsi_S_Intel · 03-26-2013

Hi Steve,

Looking at the gdb backtrace you attached, I notice that you are using the MKL ILP64 interface library (libmkl_gf_ilp64.so). When using the ILP64 interface library, the integers declared in the source program must be 64-bit. With gfortran, the compiler flag that makes default integers 64-bit is -fdefault-integer-8; for the Intel Fortran compiler, the corresponding option is -i8.
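A rough illustration of why a missing -fdefault-integer-8 can crash instead of just giving wrong answers (a sketch of one plausible mechanism, not MKL's actual code): Fortran passes integers by reference, so an ILP64 routine reads 8 bytes at the address where a default gfortran build stored a 4-byte INTEGER. On little-endian x86_64, the upper 4 bytes then come from whatever sits next to it in memory. The Python below mimics this with struct; the neighboring value is hypothetical.

```python
import struct

n = 1_220_480_320  # INTEGER*4 value the caller intended to pass as N
neighbor = 1       # hypothetical 4 bytes that happen to follow it in memory

# Memory as laid out by a default (4-byte INTEGER) build, little-endian.
memory = struct.pack("<ii", n, neighbor)

# An ILP64 routine reads 8 bytes at the same address as a single integer.
(n_as_ilp64,) = struct.unpack("<q", memory)

# The routine now sees n + neighbor * 2**32, much larger than the real array,
# so it can walk past the end of the allocation and segfault.
print(n_as_ilp64)
```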

Can you confirm that you are using -fdefault-integer-8 when compiling your program with gfortran and linking against the MKL gfortran ilp64 interface library?

--Vamsi.

Steven_V_ · 03-26-2013

Hi Vamsi,

Thanks for catching that. I was too quick to test and indeed forgot that flag. Everything seems to work fine now, and in production the results also match those from the Xeon machines. Thanks to everyone for your help.

greetings,

Steven

integer overflow in dcopy