Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

MKL ScaLAPACK fails on example

Matt_Thompson
Novice
953 Views
Folks, I'm having a bear of a time trying to figure out how to get MKL ScaLAPACK to work. I'm at such a loss I don't know if it's a bug or just a linking problem.

To wit, in order to track down this bug in my code, I tried to find a minimal example. The problem in the code I'm trying to use seems to be with PDSYEVX, so I downloaded sample_pdsyevx_call.f from the ScaLAPACK examples site. I then proceeded to compile it (all system and user names munged for security):

> which mpif77
/opt/mpich2/ch3_ssm-intel/bin/mpif77
> mpif77 sample_pdsyevx_call.f -L/opt/intel/mkl/10.1.0.015/lib/em64t -I/opt/intel/mkl/10.1.0.015/include -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
> mpirun -np 4 ./a.out
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libmkl_scalapack_ 00002B79F0934B0F Unknown Unknown Unknown
rank 0 in job 10 master.MUNGED_33215 caused collective abort of all ranks
exit status of rank 0: return code 174

A failure. What I should get out is Matlab code. So, I then proceeded to grab both BLACS and ScaLAPACK from netlib and compile them using Intel MKL BLAS and LAPACK. The builds went well and all tests seem to pass, so I compiled the example again:

> mpif77 sample_pdsyevx_call.f -L/home/USER/lib -L/opt/intel/mkl/10.1.0.015/lib/em64t -I/opt/intel/mkl/10.1.0.015/include -lscalapack -lf77blacs -lcblacs -lblacs -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
> mpirun -np 4 ./a.out
Z( 1, 1)= -0.485377931880384972D-01
Z( 2, 1)= -0.122222718667358496D+00
Z( 3, 1)= -0.282485135303397694D+00
Z( 4, 1)= 0.950214627337748530D+00
Z( 1, 2)= 0.912072227031436311D-01
Z( 2, 2)= 0.426620092092789616D+00
Z( 3, 2)= -0.877038303257524188D+00
Z( 4, 2)= -0.201197301593937478D+00
Z( 1, 3)= 0.495193043045525805D+00
Z( 2, 3)= -0.798642041261893643D+00
Z( 3, 3)= -0.298844131980118943D+00
Z( 4, 3)= -0.166273644422072014D+00
Z( 1, 4)= -0.862617629821305520D+00
Z( 2, 4)= -0.406482218545086649D+00
Z( 3, 4)= -0.248391118466086414D+00
Z( 4, 4)= -0.170190725350426952D+00
N = 4
A = hilb(N) + diag([1:-1/N:1/N])
W( 1 )= 0.304814036060551 ;
W( 2 )= 0.581961202142018 ;
W( 3 )= 0.908498110005126 ;
W( 4 )= 2.38091712798278 ;
backerror = A - Z * diag(W) * Z'
resid = A * Z - Z * diag(W)
ortho = Z' * Z - eye(N)
norm(backerror)
norm(resid)
norm(ortho)

Note that I've also tried everything I can think of, including the non-sequential (threaded) library with iomp5. Also note that even though my home-built ScaLAPACK works with this example, it still fails with my "real" code. But I figured a good first step would be to figure out why MKL ScaLAPACK seems to fail on just this example.

Also, yes, I am using mpif77 and not mpiifort because our cluster was set up to use MPICH2, not Intel MPI. We have many MPI codes working well on this cluster; it's just MKL ScaLAPACK that seems to fail us. (Also, my home-built ScaLAPACK was built with mpif77.)
10 Replies
Matt_Thompson
Novice
I also just saw the Link Line Advisor, and it suggested the following for static linking:

> mpif77 sample_pdsyevx_call.f $MKLPATH/libmkl_scalapack_lp64.a $MKLPATH/libmkl_solver_lp64_sequential.a -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a $MKLPATH/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -lpthread
> mpirun -np 4 ./a.out
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
a.out 0000000000414C03 Unknown Unknown Unknown
a.out 0000000000416A63 Unknown Unknown Unknown
a.out 000000000040871F Unknown Unknown Unknown
a.out 000000000040835C Unknown Unknown Unknown
libc.so.6 00000038C961D974 Unknown Unknown Unknown
a.out 0000000000408269 Unknown Unknown Unknown
rank 0 in job 8 master.MUNGED_36086 caused collective abort of all ranks
exit status of rank 0: return code 174

and for dynamic linking:
> mpif77 sample_pdsyevx_call.f -L$MKLPATH -lmkl_scalapack_lp64 $MKLPATH/libmkl_solver_lp64_sequential.a -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -Wl,--end-group -lpthread
> mpirun -np 4 ./a.out
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libmkl_scalapack_ 00002BAA693E6B0F Unknown Unknown Unknown
rank 0 in job 9 master.MUNGED_36086 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

Gennady_F_Intel
Moderator

Hmm, interesting.
Can you successfully run the simplest example program (example1.f) from these examples?
Which OS are you working on?
--Gennady
Andrei_Moskalev__Int

Which version of MKL do you use? The PDSYEVX example calls pdlaprnt. There was a problem with the pdlaprnt function (before MKL 10.1 U2) that was successfully resolved in MKL 10.1 U2 and MKL 10.2 Gold.
Matt_Thompson
Novice

Hmm, interesting.
Can you successfully run the simplest example program (example1.f) from these examples?
Which OS are you working on?
--Gennady

The OS is:
> cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)

As for example1.f, the answer is...yes!

> mpif77 example1.f -L/opt/intel/mkl/10.1.0.015/lib/em64t -I/opt/intel/mkl/10.1.0.015/include -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread
example1.f(65): (col. 12) remark: BLOCK WAS VECTORIZED.
example1.f(65): (col. 12) remark: BLOCK WAS VECTORIZED.
example1.f(65): (col. 12) remark: BLOCK WAS VECTORIZED.

> mpirun -np 6 ./a.out

ScaLAPACK Example Program #1 -- May 1, 1997

Solving Ax=b where A is a 9 by 9 matrix with a block size of 2
Running on 6 processes, where the process grid is 2 by 3

INFO code returned by PDGESV = 0

According to the normalized residual the solution is correct.

||A*x - b|| / ( ||x||*||A||*eps*N ) = 0.00000000E+00
Matt_Thompson
Novice
Quoting - amoskale

Which version of MKL do you use? The PDSYEVX example calls pdlaprnt. There was a problem with the pdlaprnt function (before MKL 10.1 U2) that was successfully resolved in MKL 10.1 U2 and MKL 10.2 Gold.

Well, the library is in /opt/intel/mkl/10.1.0.015, so it's not 10.2. Is there a way to figure out if this is MKL 10.1 U2? (Given the low sub-sub number in the directory, I'm doubting it.)
Gennady_F_Intel
Moderator
Quoting - thematt

Well, the library is in /opt/intel/mkl/10.1.0.015, so it's not 10.2. Is there a way to figure out if this is MKL 10.1 U2? (Given the low sub-sub number in the directory, I'm doubting it.)

1) Please see the announcement: Intel MKL 10.1 Update 2 is now available

Please note: "Users with current licenses may login at the Intel Registration Center to download."

2) >>> Is there a way to figure out if this is MKL 10.1 U2?

Yes. Open the doc/mklsupport.txt file; the Package ID is listed there.
The Package ID for MKL 10.1 Update 2 is 10.1.2.024.
--Gennady

Matt_Thompson
Novice

1) Please see the announcement: Intel MKL 10.1 Update 2 is now available

Please note: "Users with current licenses may login at the Intel Registration Center to download."

2) >>> Is there a way to figure out if this is MKL 10.1 U2?

Yes. Open the doc/mklsupport.txt file; the Package ID is listed there.
The Package ID for MKL 10.1 Update 2 is 10.1.2.024.
--Gennady


mklsupport.txt says (unsurprisingly): Package ID: l_mkl_p_10.1.0.015

So, I've asked our cluster's administrator to see if he can get us 10.1 Update 2. I'll reply here after I am able to test on that update.
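For anyone else who needs to run the same version check, here is a rough sketch of it as a script. The MKL_DIR default is the install path from this thread, the doc/ location of mklsupport.txt is an assumption, and package_version is a hypothetical helper name, not anything shipped with MKL.

```shell
#!/bin/sh
# MKL_DIR is an assumption: point it at your own MKL install.
MKL_DIR="${MKL_DIR:-/opt/intel/mkl/10.1.0.015}"
# Assumed location of the support file (doc/ under the MKL root).
SUPPORT_FILE="$MKL_DIR/doc/mklsupport.txt"

# package_version (hypothetical helper): reads text on stdin and prints the
# numeric tail of the "Package ID" line.
package_version() {
    sed -n 's/.*Package ID:.*_\([0-9][0-9.]*\)[[:space:]]*$/\1/p'
}

if [ -f "$SUPPORT_FILE" ]; then
    installed=$(package_version < "$SUPPORT_FILE")
    echo "Installed MKL package: $installed"
    # 10.1.2.024 is the Package ID quoted above for MKL 10.1 Update 2.
    if [ "$installed" = "10.1.2.024" ]; then
        echo "This is MKL 10.1 Update 2."
    fi
else
    echo "No mklsupport.txt found under $MKL_DIR/doc"
fi
```

If the file lives somewhere else on your system, set SUPPORT_FILE by hand; the parsing only relies on the "Package ID: l_mkl_p_..." line format shown in this post.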
Matt_Thompson
Novice
Quoting - amoskale

Which version of MKL do you use? The PDSYEVX example calls pdlaprnt. There was a problem with the pdlaprnt function (before MKL 10.1 U2) that was successfully resolved in MKL 10.1 U2 and MKL 10.2 Gold.

Our cluster admin was able to put 10.1 Update 2 on our cluster. To wit:

> mpif77 sample_pdsyevx_call.f -L/opt/intel/mkl/10.1.2.024/lib/em64t -I/opt/intel/mkl/10.1.2.024/include -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread

> mpirun -np 4 ./a.out
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libmkl_scalapack_ 00002B4AD5CE4B0F Unknown Unknown Unknown
rank 0 in job 4 master.MUNGED_51540 caused collective abort of all ranks
exit status of rank 0: return code 174

So, either this isn't the same bug, or the problem *wasn't* corrected in 10.1 Update 2.

Matt_Thompson
Novice
Quoting - amoskale

Which version of MKL do you use? The PDSYEVX example calls pdlaprnt. There was a problem with the pdlaprnt function (before MKL 10.1 U2) that was successfully resolved in MKL 10.1 U2 and MKL 10.2 Gold.

The answer seems to be that the PDLAPRNT problem was not resolved in MKL 10.1 Update 2! If one comments out the two PDLAPRNT calls in the example, it at least runs to completion. It is, of course, broken in the sense that the resultant Matlab code is unusable since it needs the printed-out matrices.

Is there any way to get a functional MKL ScaLAPACK at the moment then, if 10.1 Update 2 doesn't contain the fix?

ETA: Oh, and we suspect that many of the other problems we are having are due to a bad implementation of MPICH2. We are trying a recompiled MPICH2 and, perhaps, Intel MPI to see if this fixes other bugs.
Andrei_Moskalev__Int
There is a workaround: compile pdlaprnt.f from netlib and put the resulting object file on the link line before the MKL libraries. This lets you use the MKL ScaLAPACK functionality while avoiding the problem with pdlaprnt.
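As a rough sketch of that workaround, the script below reuses the compiler wrapper, paths, and library list from the earlier posts in this thread. It assumes pdlaprnt.f has already been copied from the netlib ScaLAPACK sources into the current directory, and that the paths contain no spaces. It prints the two commands and only runs them if mpif77 and both source files are actually present.

```shell
#!/bin/sh
# Paths and file names are taken from this thread; adjust them for your system.
MKL_DIR="${MKL_DIR:-/opt/intel/mkl/10.1.2.024}"
MKL_LIBS="-lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread"

# Step 1: compile the netlib pdlaprnt.f to an object file (pdlaprnt.o).
COMPILE_CMD="mpif77 -c pdlaprnt.f"
# Step 2: link with pdlaprnt.o listed BEFORE the MKL libraries.
LINK_CMD="mpif77 sample_pdsyevx_call.f pdlaprnt.o -L$MKL_DIR/lib/em64t -I$MKL_DIR/include $MKL_LIBS"

# Show the commands so they can be checked before anything is built.
echo "$COMPILE_CMD"
echo "$LINK_CMD"

# Run the build only when the compiler wrapper and both sources are present.
if command -v mpif77 >/dev/null 2>&1 && [ -f pdlaprnt.f ] && [ -f sample_pdsyevx_call.f ]; then
    $COMPILE_CMD && $LINK_CMD
fi
```

The ordering matters because the linker resolves the example's call to pdlaprnt against the first definition it sees, so the netlib object takes the place of the MKL copy of that one routine while everything else still comes from MKL.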
