SEGV in CLADIV ZLADIV CDOTC when building Scalapack with MKL on OS-X

davydden1 · ‎03-14-2016

Dear all,

I am trying the recent free version of MKL on OS-X and get a segmentation fault when running the unit tests of Scalapack:

   65 - xcbrd (Failed)
   66 - xzbrd (Failed)
   69 - xchrd (Failed)
   70 - xzhrd (Failed)
   73 - xctrd (Failed)
   74 - xztrd (Failed)
   79 - xcsep (Failed)
   80 - xzsep (Failed)
   83 - xcgsep (Failed)
   84 - xzgsep (Failed)
   89 - xcevc (Failed)
   93 - xcheevr (Failed)
   94 - xzheevr (Failed)

The configuration of Scalapack is quite straightforward:

   cmake .. \
     -DCMAKE_INSTALL_PREFIX=/Users/davydden/.homebrew/Cellar/scalapack/2.0.2_3 \
     -DBUILD_SHARED_LIBS=ON \
     -DBUILD_TESTING=ON \
     -DCMAKE_Fortran_FLAGS=-g \
     -DBLAS_LIBRARIES:STRING="-L/opt/intel/mkl/lib -Wl,-rpath,/opt/intel/mkl/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm" \
     -DLAPACK_LIBRARIES:STRING="-L/opt/intel/mkl/lib -Wl,-rpath,/opt/intel/mkl/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm"

Here is a breakdown of the failing tests:

======= 65 - xcbrd (Failed) =========

Here is the error and the bt from lldb:

   Process 33412 stopped
   * thread #1: tid = 0x14ce59f, 0x0000000102a3932e libmkl_core.dylib`mkl_lapack_cladiv + 94, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x10085f548)
       frame #0: 0x0000000102a3932e libmkl_core.dylib`mkl_lapack_cladiv + 94
   libmkl_core.dylib`mkl_lapack_cladiv:
   ->  0x102a3932e <+94>:  movsd  %xmm1, (%r13)
       0x102a39334 <+100>: addq   $0x20, %rsp
       0x102a39338 <+104>: popq   %r13
       0x102a3933a <+106>: retq
   (lldb) bt
   * thread #1: tid = 0x14ce59f, 0x0000000102a3932e libmkl_core.dylib`mkl_lapack_cladiv + 94, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x10085f548)
     * frame #0: 0x0000000102a3932e libmkl_core.dylib`mkl_lapack_cladiv + 94
       frame #1: 0x0000000100c6444d libmkl_intel_lp64.dylib`cladiv_ + 125
       frame #2: 0x0000000100629069 libscalapack.dylib`pclarfg_(n=4, alpha=0.227123 + -0.427759i, iax=1, jax=1, x=_Complex float [] @ 0x0000000100011220, ix=2, jx=1, descx=int [] @ 0x00007fff5fbfecd0, incx=1, tau=_Complex float [] @ 0x00000001000113e0) + 2128 at pclarfg.f:280
       frame #3: 0x00000001005f2398 libscalapack.dylib`pclabrd_(m=4, n=4, nb=2, a=_Complex float [] @ 0x0000000100011220, ia=1, ja=1, desca=int [] @ 0x00007fff5fbfecd0, d=float [] @ 0x0000000100011320, e=float [] @ 0x0000000100011380, tauq=_Complex float [] @ 0x00000001000113e0, taup=_Complex float [] @ 0x0000000100011440, x=_Complex float [] @ 0x00000001000114a0, ix=1, jx=1, descx=int [] @ 0x00007fff5fbfe740, y=_Complex float [] @ 0x00000001000114e0, iy=1, jy=1, descy=int [] @ 0x00007fff5fbfe710, work=_Complex float [] @ 0x0000000100011520) + 1856 at pclabrd.f:329
       frame #4: 0x00000001005c7dfd libscalapack.dylib`pcgebrd_(m=4, n=4, a=_Complex float [] @ 0x0000000100011220, ia=1, ja=1, desca=int [] @ 0x00007fff5fbfecd0, d=float [] @ 0x0000000100011320, e=float [] @ 0x0000000100011380, tauq=_Complex float [] @ 0x00000001000113e0, taup=_Complex float [] @ 0x0000000100011440, work=_Complex float [] @ 0x00000001000114a0, lwork=22, info=0) + 2119 at pcgebrd.f:360
       frame #5: 0x0000000100005e0d xcbrd`pcbrddriver + 6793 at pcbrddriver.f:362
       frame #6: 0x0000000100007150 xcbrd`main(argc=1, argv="/private/tmp/scalapack20160314-14473-cdngpk/scalapack-2.0.2/build/Testing/xcbrd") + 54 at pcbrddriver.f:536
       frame #7: 0x00007fff92f615ad libdyld.dylib`start + 1

As you see, the segmentation error is in the function CLADIV:

   (lldb) f 2
   frame #2: 0x0000000100629069 libscalapack.dylib`pclarfg_(n=4, alpha=0.227123 + -0.427759i, iax=1, jax=1, x=_Complex float [] @ 0x0000000100011220, ix=2, jx=1, descx=int [] @ 0x00007fff5fbfecd0, incx=1, tau=_Complex float [] @ 0x00000001000113e0) + 2128 at pclarfg.f:280
      277          ELSE
      278             TAU( INDXTAU ) = CMPLX( ( BETA-ALPHR ) / BETA,
      279      $                       -ALPHI / BETA )
   -> 280             ALPHA = CLADIV( CMPLX( ONE ), ALPHA-BETA )
      281             CALL PCSCAL( N-1, ALPHA, X, IX, JX, DESCX, INCX )
      282             ALPHA = BETA
      283          END IF

FYI,

   (lldb) f 1
   frame #1: 0x0000000100c6444d libmkl_intel_lp64.dylib`cladiv_ + 125
   libmkl_intel_lp64.dylib`cladiv_:
       0x100c6444d <+125>: movsd  0xc8(%rsp), %xmm2
       0x100c64456 <+134>: testl  %r15d, %r15d
       0x100c64459 <+137>: je     0x100c64411 ; <+65>
       0x100c6445b <+139>: xorps  %xmm1, %xmm1

As for the other unit tests, the SEGV appears in:

   (lldb) f 2
   frame #2: 0x00000001007eaa2c libscalapack.dylib`pzlarfg_(n=4, alpha=0.227123 + -0.427759i, iax=1, jax=1, x=_Complex double [] @ 0x0000000100011240, ix=2, jx=1, descx=int [] @ 0x00007fff5fbfecd0, incx=1, tau=_Complex double [] @ 0x0000000100011580) + 2356 at pzlarfg.f:281
      278          ELSE
      279             TAU( INDXTAU ) = DCMPLX( ( BETA-ALPHR ) / BETA,
      280      $                       -ALPHI / BETA )
   -> 281             ALPHA = ZLADIV( DCMPLX( ONE ), ALPHA-BETA )
      282             CALL PZSCAL( N-1, ALPHA, X, IX, JX, DESCX, INCX )
      283             ALPHA = BETA
      284          END IF

and

   (lldb) f 2
   frame #2: 0x000000010c04d5ba libscalapack.dylib`cvvdotc_(n=2, dot=0 + 0i, x=_Complex float [] @ 0x0000000100016c48, incx=1, y=_Complex float [] @ 0x0000000100387948, incy=1) + 95 at cvvdotc.f:63
      60   *     ..
      61   *     .. Executable Statements ..
      62   *
   -> 63         DOT = DOT + CDOTC( N, X, INCX, Y, INCY )
      64   *
      65         RETURN
      66   *

and a number of tests fail with an MPI abort:

   --------------------------------------------------------------------------
   MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
   with errorcode 1.

   NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
   You may or may not see output from other processes, depending on
   exactly when Open MPI kills them.
   --------------------------------------------------------------------------

Can someone please try reproducing this? And if this is indeed a bug, it would be nice to have a fix ;-)

p.s. I am on El Capitan.

p.p.s. It must be related to MKL, as the same Scalapack compiles fine with Veclibfort.

davydden1 · ‎03-14-2016

Forgot to mention:

   $ gfortran --version
   GNU Fortran (Homebrew gcc 5.3.0 --without-multilib) 5.3.0
   $ clang --version
   Apple LLVM version 7.0.2 (clang-700.1.81)

and open-mpi/1.10.2.

Ying_H_Intel · ‎03-14-2016

Hi

I just wonder whether the source code uses the Fortran interface?

If so, the link command line may need

-Wl,-rpath,/opt/intel/mkl/lib -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lpthread -lm

Or is it possible to use the Intel Fortran compiler to build ScaLAPACK?

Best Regards,

Ying

davydden1 · ‎03-14-2016

Hi Ying,

Thanks for your prompt reply. I checked the Link Line Advisor for Linux with GNU Fortran, and indeed -lmkl_gf_lp64 should be used instead of -lmkl_intel_lp64.

The problem is that this library is not included in the OS-X MKL distribution! :-( Thus, I cannot really check whether this is indeed the reason for the segmentation fault.

Could you please pass on the request for this library to be included and for an updated OS-X bundle to be created?

Otherwise it seems one can only use MKL for C, which is very unfortunate.

Sincerely, Denis.

Dmitry_B_Intel · ‎03-15-2016

Hi Denis,

MKL does not support gfortran on OS X. The Linux interface library libmkl_gf_lp64 differs from libmkl_intel_lp64 in the convention for how functions return complex numbers; Intel Fortran and gfortran use different conventions. Since you build ScaLAPACK yourself, you could try tweaking the build so that functions returning complex values follow the Intel convention. To do that, check whether the gfortran you use supports the -ff2c flag; if it does, add this flag when building ScaLAPACK and your application. You should then link libmkl_intel_lp64.

Thanks
Dima
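
To make the difference concrete, here is a minimal C sketch of the two ways a complex-valued function can hand back its result. The function names are hypothetical and are not MKL or BLAS symbols. According to the gfortran documentation, gfortran by default returns a COMPLEX function result as an ordinary return value, while -ff2c switches to the f2c-style convention of writing it through an extra leading argument. That the second form is what libmkl_intel_lp64 expects is an assumption based on the explanation above.

```c
#include <assert.h>
#include <complex.h>

/* Convention A (assumed: gfortran default): the complex result is the
   ordinary return value of the function. */
float complex dotc_return_by_value(int n, const float complex *x,
                                   const float complex *y)
{
    assert(n >= 0);
    float complex acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += conjf(x[i]) * y[i];   /* conjugated dot product, as in CDOTC */
    return acc;
}

/* Convention B (assumed: f2c style, selected by -ff2c): the result is
   written through a hidden pointer passed as the first argument, and the
   function itself returns nothing. */
void dotc_hidden_result(float complex *result, int n,
                        const float complex *x, const float complex *y)
{
    *result = dotc_return_by_value(n, x, y);
}
```

If a caller compiled for convention A calls a routine built for convention B, the callee would treat the first real argument as the result pointer and store through it. That would be consistent with the EXC_BAD_ACCESS on a store instruction in the backtrace at the top of the thread.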

davydden1 · ‎03-15-2016

Hi Dima,

Thanks a lot for your suggestion. After a quick search I found out that one should use -ff2c -fno-second-underscore.

Unfortunately, I still have some unit test failures:

The following tests FAILED:
   77 - xssep (Timeout)
   79 - xcsep (Timeout)
   81 - xsgsep (Timeout)
   83 - xcgsep (Failed)
   91 - xssyevr (Timeout)
   93 - xcheevr (Timeout)
   95 - xshseqr (Timeout)

All unit tests are run with 4 MPI processes. At least xcgsep is related to the eigendecomposition routines:

   SCALAPACK Hermitian Eigendecomposition routines.
   ' '
   Running tests of the parallel generalized Hermitian eigenvalue routine: PCHEGVX.
   A scaled residual check, will be computed
   An explanation of the input/output parameters follows:
   RESULT : passed; or an indication of which eigen request test failed
   N      : The number of rows and columns of the matrix A.
   P      : The number of process rows.
   Q      : The number of process columns.
   NB     : The size of the square blocks the matrix A is split into.
   THRESH : If a residual value is less than THRESH, RESULT is flagged as PASSED.
          : the QTQ norm is allowed to exceed THRESH for those eigenvectors
          : which could not be reorthogonalized for lack of workspace.
   TYP    : matrix type (see pCGSEPtst.f).
   IBTYPE : Generalized eigenproblem type (see pCHEGVx.f)
   SUB    : Subtests (see pCGSEPtst).f
   CHK    : The scaled residual

       N  NB   P   Q TYP IBTYPE SUB     WALL      CPU       CHK CHECK
   ----- --- --- --- --- ------ --- -------- -------- --------- -----
   'TEST 1 - test tiny matrices - different process configurations'
   C
    ISEED( 1 ) = 139
    ISEED( 2 ) = 1139
    ISEED( 3 ) = 2139
    ISEED( 4 ) = 3139
    UPLO= 'L'
    SUBTESTS= 'N'
    N= 0
    NPROW= 1
    NPCOL= 2
    NB= 1
    MATTYPE= 8
    IBTYPE= 1
    ABSTOL= 0.000000D+00
    THRESH= 0.400000D+03
   C
       0   1   1   2   8      1   N     0.00    -1.00       NaN FAILED

   [---the-rest-is-cut---]

At this point I don't know whether these NaNs in the scaled residual are related to MKL. At least there are no segmentation faults... I will keep searching, but if you have any advice, I would be grateful.

Regards,

Denis.

davydden1 · ‎03-15-2016

p.s. During compilation I see a lot of type mismatch warnings in the test driver programs, such as:

   /tmp/scalapack20160315-68539-ggklj5/scalapack-2.0.2/TESTING/LIN/psqrdriver.f:691:38:
   $ MEM( IPPIV ) )
   1
   Warning: Type mismatch in argument 'ipiv' at (1); passed REAL(4) to INTEGER(4)

   /tmp/scalapack20160315-68539-ggklj5/scalapack-2.0.2/TESTING/LIN/pdqrdriver.f:690:38:
   $ MEM( IPPIV ) )
   1
   Warning: Type mismatch in argument 'ipiv' at (1); passed REAL(8) to INTEGER(4)

   /tmp/scalapack20160315-68539-ggklj5/scalapack-2.0.2/TESTING/LIN/pcqrdriver.f:709:38:
   $ MEM( IPPIV ) )
   1
   Warning: Type mismatch in argument 'ipiv' at (1); passed COMPLEX(4) to INTEGER(4)

   /tmp/scalapack20160315-68539-ggklj5/scalapack-2.0.2/TESTING/LIN/pzqrdriver.f:709:38:
   $ MEM( IPPIV ) )
   1
   Warning: Type mismatch in argument 'ipiv' at (1); passed COMPLEX(8) to INTEGER(4)

   /tmp/scalapack20160315-68539-ggklj5/scalapack-2.0.2/PBLAS/TESTING/pzblas1tst.f:227:41:
   $ NPROCS, ALPHA, MEM )
   1
   Warning: Type mismatch in argument 'work' at (1); passed COMPLEX(8) to INTEGER(4)

   /tmp/scalapack20160315-68539-ggklj5/scalapack-2.0.2/BLACS/TESTING/blacstest.f:2818:30:
   $ MEM(ERRIPTR), MEM(ERRDPTR) )
   1
   Warning: Type mismatch in argument 'erribuf' at (1); passed REAL(4) to INTEGER(4)

davydden1 · ‎03-15-2016

On Ubuntu with the system-provided BLAS/LAPACK everything is fine and all tests pass, provided I do not specify the -ff2c -fno-second-underscore flags. With those flags it is broken.

Ying_H_Intel · ‎03-15-2016

Hi Denis,

Regarding your last comment, "On Ubuntu with system provided Blas/Lapack": so that is a separate issue, not related to Intel MKL, right?

And "if I do not specify -ff2c -fno-second-underscore flags": is that with gfortran?

Regarding the original issue on OS X, besides -ff2c -fno-second-underscore, here are two other options for your reference.

1. Try the Intel Fortran compiler, https://software.intel.com/en-us/intel-compilers/, which should offer a 30-day free trial.

2. A possible workaround is to compile CLADIV, CDOTU, CDOTC, ZLADIV, ZDOTU, ZDOTC, xxx from NETLIB using GFortran and link these functions before Intel MKL, so that calls expecting a complex return value do not go to MKL but use code compiled with GFortran.

Best Regards

Ying
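
The second workaround can be sketched for a single routine. Below is a hypothetical C stand-in for CLADIV that returns its result by value, which is assumed to match what gfortran-compiled callers expect when -ff2c is not used. It relies on C's built-in complex division rather than the SLADIV-based algorithm of the NETLIB reference, so it only illustrates the idea of overriding the symbol and is not a replacement for the reference code. The trailing underscore in `cladiv_` matches the symbol name seen in the backtrace at the top of the thread.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical stand-in for the Fortran function CLADIV(X, Y) = X / Y.
   Fortran passes arguments by reference, hence the pointer parameters.
   The result is returned by value (assumed gfortran default convention). */
float complex cladiv_(const float complex *x, const float complex *y)
{
    assert(x != 0 && y != 0);
    /* Simplification: plain C complex division instead of the
       SLADIV-based algorithm used by reference LAPACK. */
    return *x / *y;
}
```

Compiled into an object file and placed on the link line ahead of the MKL libraries, such a definition would be expected to take precedence over MKL's `cladiv_`. Whether that actually holds depends on the linker and on how the shared libraries resolve the symbol, so it needs to be checked on the target system.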

davydden1 · ‎03-15-2016

Hi Ying,

I mentioned Ubuntu only to make a point that the problem is not in Scalapack itself as it does compile fine and all tests pass.

Thanks for your suggestions. I will think about it.

Regards, Denis.

davydden1 · ‎03-16-2016

I just realised that Scalapack and Blacs are also provided within the MKL OS-X bundle. Given the discussion above, could there be any issues when using those with GNU Fortran + Clang?

Ying_H_Intel · ‎03-17-2016

Hi Denis,

What function in scalapack do you need to use?

We have not supported gfortran on OS X for years, so I can only speculate:

In most cases, GFortran + Intel MKL (with libmkl_scalapack_lp64 and libmkl_intel_lp64 linked) should work unless you call an MKL function which returns a complex value. Such a call may cause a crash or incorrect behavior.

Most BLAS, LAPACK, and ScaLAPACK routines are subroutines which return no value, so they are not affected by this problem. For the functions which do return a complex value, the problem can be worked around by calling their C counterparts, like cblas_cdotu_sub instead of cdotu, or LAPACKE_zlange instead of zlange.

Best Regards,

Ying
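
The C-interface route can be sketched as follows. In the standard CBLAS interface the complex dot products are named `cblas_cdotu_sub` and `cblas_cdotc_sub`, and they write the result through their last argument, so no complex value is returned across the library boundary at all. So that the sketch compiles without a BLAS installed, it uses a local stand-in with the same signature unless `USE_REAL_CBLAS` is defined. That macro and the helper name `dotu_via_cblas` are hypothetical.

```c
#include <assert.h>
#include <complex.h>

#ifdef USE_REAL_CBLAS
#include <cblas.h>   /* or mkl.h when building against MKL */
#else
/* Stand-in with the standard CBLAS signature, so the sketch compiles
   without a BLAS library. It handles positive strides only. */
static void cblas_cdotu_sub(const int n, const void *x, const int incx,
                            const void *y, const int incy, void *dotu)
{
    const float complex *cx = x;
    const float complex *cy = y;
    float complex acc = 0.0f;
    assert(incx > 0 && incy > 0);
    for (int i = 0; i < n; ++i)
        acc += cx[i * incx] * cy[i * incy];   /* unconjugated product */
    *(float complex *)dotu = acc;
}
#endif

/* Hypothetical helper: unconjugated dot product of two contiguous
   vectors, obtained through the output-argument CBLAS call. */
float complex dotu_via_cblas(int n, const float complex *x,
                             const float complex *y)
{
    float complex result = 0.0f;
    cblas_cdotu_sub(n, x, 1, y, 1, &result);
    return result;
}
```

Because the result travels through a pointer that the caller passes explicitly, this call looks the same to every compiler, which is why it sidesteps the return-convention mismatch. It only helps for code you can edit, though, not for calls already inside ScaLAPACK.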

davydden1 · ‎03-17-2016

Hi Ying,

I don't use ScaLAPACK directly; I compile other libraries which are used in other libraries which are used in other libraries which I use. So it's like ScaLAPACK -> MUMPS -> PETSc & Trilinos -> deal.II -> my code.

I guess in this usage scenario I can only hope that I never deal with complex numbers... :-(

Regards, Denis.

Dmitry_B_Intel · ‎03-17-2016

Hi Denis,

The -ff2c option is indeed not a solution, because it makes Fortran functions declared as real be interpreted as returning double. That explains the failures of the single-precision tests. I am sorry for having misled you.

It looks like gfortran on OS X can be used with MKL, but only in a very limited way. I therefore second Ying's suggestion to try the Intel Fortran compiler, https://software.intel.com/en-us/intel-compilers/.

Thanks
Dima