Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

MKL scalapack and example1.f

semyon
Beginner
Hello!

I'm trying to get started with MKL ScaLAPACK and have downloaded example1.f from http://www.netlib.org/scalapack/examples/.
But it crashes at runtime with both ifort and gfortran:

If mpif77 points to ifort:

$ mpif77 -g example1.f -L$MKL_LIB -lmkl_blacs -lmkl_scalapack_core -lmkl_core -lmkl_intel -lmkl_blacs_openmpi -lmkl_intel_thread -liomp5
$ mpirun -n 2 a.out
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libmpi.so.0 B7A76097 Unknown Unknown Unknown
a.out 080559E0 Unknown Unknown Unknown
a.out 08055991 Unknown Unknown Unknown
libc.so.6 B75C0A4C Unknown Unknown Unknown
a.out 080558A1 Unknown Unknown Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libmpi.so.0 B7B00097 Unknown Unknown Unknown
a.out 080559E0 Unknown Unknown Unknown
a.out 08055991 Unknown Unknown Unknown
libc.so.6 B764AA4C Unknown Unknown Unknown
a.out 080558A1 Unknown Unknown Unknown
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 21960 on
node pc7229 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

With gfortran it crashes as well:

$ mpif77 -g example1.f -L$MKL_LIB -lmkl_blacs -lmkl_scalapack_core -lmkl_core -lmkl_gf -lmkl_blacs_openmpi -lmkl_gnu_thread -liomp5
$ mpirun -n 2 a.out
[pc7229:21994] *** Process received signal ***
[pc7229:21995] *** Process received signal ***
[pc7229:21994] Signal: Segmentation fault (11)
[pc7229:21994] Signal code: Address not mapped (1)
[pc7229:21994] Failing at address: 0xcf
[pc7229:21994] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 21994 on node pc7229 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

I'm using ifort 10.0.026, gfortran 4.4.2, openmpi 1.3.3, Intel MKL 10.0.5.025.
Where could the problem be?
TimP
Honored Contributor III
You haven't told us anything about your results from basic investigation. What do you see when you run with a debugger? Is it stack overflow, either main stack or thread stack? What happens with OMP_NUM_THREADS=1 or with mpirun -np 1 ?
Attempting to run in openmp/mpi hybrid mode with 32-bit builds and an obsolete ifort may be more trouble than most people wish to undertake. Does openmpi have any facility for that, other than running 1 process per node? libiomp5 would likely try to take all logical processors on the node, unless you pass a distinct KMP_AFFINITY to each process.
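The isolation steps suggested above (one OpenMP thread, one or two MPI ranks, checking the stack limits) can be scripted roughly as below. This is a hedged sketch, not a prescribed procedure: it assumes Open MPI's mpirun (whose `-x` option exports an environment variable to every rank) and an executable named ./a.out. The function names and the DRY_RUN switch are hypothetical helpers introduced only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for isolating the crash.
# Assumptions: Open MPI's mpirun (-x exports a variable to every rank)
# and an executable named ./a.out.
# Set DRY_RUN=1 to print the mpirun command instead of running it.

# Run a command, or only print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Launch the program on N ranks with OMP_NUM_THREADS=1,
# so each rank runs single-threaded.
run_isolated() {
  local nprocs=$1 exe=$2
  run_cmd mpirun -np "$nprocs" -x OMP_NUM_THREADS=1 "$exe"
}

# Report the main-stack limit and the OpenMP thread-stack setting.
# KMP_STACKSIZE is read by the Intel OpenMP runtime (libiomp5).
show_stack_limits() {
  echo "main stack: $(ulimit -s)"
  echo "thread stack (KMP_STACKSIZE): ${KMP_STACKSIZE:-default}"
}
```

Typical use would be `show_stack_limits`, then `run_isolated 1 ./a.out`, then `run_isolated 2 ./a.out`. If the crash persists with one rank and one thread, the OpenMP/MPI hybrid setup is unlikely to be the cause.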

semyon
Beginner
I usually write and test my programs on a single-processor (single-core) computer and then move them to multicore/multiprocessor systems, so the runs above were on my single-processor machine.
Under the debugger:

$ cat gdb_comm
run
backtrace
$ mpirun -n 2 gdb -x gdb_comm a.out
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-pc-linux-gnu"...
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-pc-linux-gnu"...
[Thread debugging using libthread_db enabled]
[Thread debugging using libthread_db enabled]
[New Thread 0xb78496c0 (LWP 22625)]
[New Thread 0xb784d6c0 (LWP 22624)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb784d6c0 (LWP 22624)]
0xb7bdb097 in PMPI_Comm_size () from /usr/lib/libmpi.so.0
#0 0xb7bdb097 in PMPI_Comm_size () from /usr/lib/libmpi.so.0
#1 0xb7b95044 in ?? () from /usr/lib/libmpi.so.0
#2 0xbfd6c014 in ?? ()
#3 0xb7fe2ca0 in ?? () from /lib/ld-linux.so.2
#4 0xbfd6bb78 in ?? ()
#5 0xbfd6bb88 in ?? ()
#6 0xb7fd9c10 in ?? () from /lib/ld-linux.so.2

#7 0x08050d1e in blacs_pinfo__ ()
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb78496c0 (LWP 22625)]
0xb7bd7097 in PMPI_Comm_size () from /usr/lib/libmpi.so.0
#8 0x080a6180 in npcol.1520 ()
#0 0xb7bd7097 in PMPI_Comm_size () from /usr/lib/libmpi.so.0
#1 0xb7b91044 in ?? () from /usr/lib/libmpi.so.0
#9 0x08d4ce08 in ?? ()
#2 0xbfe596b4 in ?? ()
#10 0xbfd6bb54 in ?? ()
#3 0xb7fdeca0 in ?? () from /lib/ld-linux.so.2
#11 0xbfd6bb78 in ?? ()
#4 0xbfe59218 in ?? ()
---Type <return> to continue, or q <return> to quit---
#5 0xbfe59228 in ?? ()
#6 0xb7fd5c10 in ?? () from /lib/ld-linux.so.2
#7 0x08050d1e in blacs_pinfo__ ()
#8 0x080a6180 in npcol.1520 ()
#9 0x09434e10 in ?? ()
#10 0xbfe591f4 in ?? ()
#11 0xbfe59218 in ?? ()
---Type <return> to continue, or q <return> to quit---
#12 0x00000004 in ?? ()
#13 0x00000000 in ?? ()
(gdb) quit
The program is running. Exit anyway? (y or n) [answered Y; input not from terminal]
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 22623 on
node pc7229 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
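The backtraces of the two ranks above are interleaved because both gdb instances write to the same terminal. One way to keep them apart is to give each rank its own log file, as in the sketch below. It assumes Open MPI, which sets OMPI_COMM_WORLD_RANK in every rank's environment, and reuses the gdb_comm command file from this post. The function name debug_ranks and the gdb.rankN.log file names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run each MPI rank under gdb in batch mode and write
# each rank's output to its own log file, so backtraces do not interleave.
# Assumptions: Open MPI (sets OMPI_COMM_WORLD_RANK per rank) and the
# gdb command file gdb_comm containing "run" and "backtrace".

debug_ranks() {
  local nprocs=$1 exe=$2
  # The single-quoted script is expanded by sh inside each rank, so every
  # rank sees its own OMPI_COMM_WORLD_RANK; "$0" is the executable path.
  # gdb -batch runs the command file and then exits.
  mpirun -np "$nprocs" sh -c \
    'gdb -batch -x gdb_comm "$0" > "gdb.rank$OMPI_COMM_WORLD_RANK.log" 2>&1' \
    "$exe"
}
```

After `debug_ranks 2 ./a.out`, the files gdb.rank0.log and gdb.rank1.log each hold one complete backtrace.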

Running with '-n 1' gives the same error in PMPI_Comm_size ().

I also tried running the program on a 2-core computer (with gfortran 4.4.1 and MKL 11.1.059). There it fails with a different error:

semyon@kepler ~/scalapack $ mpif77 -g example1.f -L$MKL_LIB -lmkl_blacs -lmkl_scalapack_core -lmkl_core -lmkl_gf -lmkl_blacs_openmpi -lmkl_gnu_thread -liomp5
semyon@kepler ~/scalapack $ mpirun -n 2 a.out
[kepler:08163] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 1
[kepler:08164] MPI_ABORT invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1
semyon@kepler ~/scalapack $ mpirun -n 1 a.out
[kepler:08807] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 1

The abort happens in CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL ).

mfactor
Beginner
The example assumes a process grid of at least 2x3 (check the NPROW and NPCOL variables), so it needs at least 6 MPI processes. Try

mpirun -np 6 ./example

and see if it works.
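The requirement described in this reply can be checked before launching. An NPROW x NPCOL grid needs NPROW*NPCOL processes, and with fewer of them BLACS_GRIDINIT cannot build the grid. That matches the MPI_ABORT reported above for `-n 1` and `-n 2`. The sketch below is a hedged illustration that assumes NPROW=2 and NPCOL=3, as stated in the reply. The function names are hypothetical. show_grid prints BLACS's row-major natural ordering, in which rank r lands in row r / NPCOL, column r mod NPCOL.

```bash
#!/usr/bin/env bash
# Hypothetical pre-launch check for a NPROW x NPCOL BLACS grid.
# Assumption: NPROW=2, NPCOL=3 (the 2x3 grid mentioned above); override
# them via the environment if your copy of example1.f uses other values.

NPROW=${NPROW:-2}
NPCOL=${NPCOL:-3}

# Number of processes a nprow x npcol grid needs.
required_procs() {
  echo $(( $1 * $2 ))
}

# Print each rank's (row, column) under row-major ordering:
# rank r -> (r / npcol, r % npcol).
show_grid() {
  local nprow=$1 npcol=$2 r
  for (( r = 0; r < nprow * npcol; r++ )); do
    echo "rank $r -> ($(( r / npcol )), $(( r % npcol )))"
  done
}

# Fail (status 1) if nprocs is too small for the NPROW x NPCOL grid.
check_procs() {
  local nprocs=$1 need
  need=$(required_procs "$NPROW" "$NPCOL")
  if (( nprocs < need )); then
    echo "need at least $need processes for a ${NPROW}x${NPCOL} grid, got $nprocs" >&2
    return 1
  fi
}
```

For example, `check_procs 6 && mpirun -np 6 ./a.out` launches only when the grid can be formed. Any ranks beyond NPROW*NPCOL are simply not part of the grid.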
semyon
Beginner
Yes, that was it! Thanks.
It now works on our 2-core computer. Running on the 1-core computer still fails with a segfault, but maybe that's a problem with how MKL interacts with old hardware.