Intel® oneAPI Math Kernel Library

Problem when solving large system using Scalapack PDGESV

daren__wall
Beginner

A parallel Fortran code that solves a set of linear simultaneous equations Ax = b using the ScaLAPACK routine PDGESV fails (exiting with a segmentation fault) when the number of equations, N, becomes large. I have not identified the exact value of N at which problems arise; for example, the code works for all values I have tested up to N = 50000 but fails at N = 94423.

In particular, the failure appears to occur during the call to the ScaLAPACK routine itself (i.e. not while allocating or deallocating memory); execution enters PDGESV but never leaves it.

I have prepared a small Fortran example code (see attachment below) that exhibits this problem. The code simply 1) allocates space for the matrix A and the vector b, 2) fills them with random entries, 3) calls PDGESV, and then 4) deallocates the memory. It has been tested on a variety of matrix sizes (N x N) and with various BLACS process grids without any errors until N becomes large.
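
For reference, the attached program is organised roughly along the following lines (a condensed sketch rather than the attachment itself; the variable names, the RANDOM_NUMBER fill and the IPIV sizing are my own shorthand):

      PROGRAM SOLVE_BY_LU_SKETCH
      IMPLICIT NONE
      INTEGER, EXTERNAL :: NUMROC
      INTEGER :: N, NB, NPROW, NPCOL, ICTXT, IAM, NPROCS
      INTEGER :: MYROW, MYCOL, MLOC, NLOC, INFO
      INTEGER :: DESCA(9), DESCB(9)
      INTEGER, ALLOCATABLE :: IPIV(:)
      DOUBLE PRECISION, ALLOCATABLE :: A(:,:), B(:)
      CHARACTER(LEN=32) :: ARG
!     command-line arguments: N, NPROW, NPCOL, NB
      CALL GET_COMMAND_ARGUMENT(1, ARG); READ(ARG,*) N
      CALL GET_COMMAND_ARGUMENT(2, ARG); READ(ARG,*) NPROW
      CALL GET_COMMAND_ARGUMENT(3, ARG); READ(ARG,*) NPCOL
      CALL GET_COMMAND_ARGUMENT(4, ARG); READ(ARG,*) NB
!     set up the BLACS process grid
      CALL BLACS_PINFO(IAM, NPROCS)
      CALL BLACS_GET(-1, 0, ICTXT)
      CALL BLACS_GRIDINIT(ICTXT, 'Row-major', NPROW, NPCOL)
      CALL BLACS_GRIDINFO(ICTXT, NPROW, NPCOL, MYROW, MYCOL)
!     local dimensions of the block-cyclically distributed data
      MLOC = NUMROC(N, NB, MYROW, 0, NPROW)
      NLOC = NUMROC(N, NB, MYCOL, 0, NPCOL)
      CALL DESCINIT(DESCA, N, N, NB, NB, 0, 0, ICTXT, MAX(1,MLOC), INFO)
      CALL DESCINIT(DESCB, N, 1, NB, NB, 0, 0, ICTXT, MAX(1,MLOC), INFO)
!     1) allocate space, 2) fill A and B with random entries
      ALLOCATE(A(MLOC,NLOC), B(MLOC), IPIV(MLOC+NB))
      CALL RANDOM_NUMBER(A)
      CALL RANDOM_NUMBER(B)
!     3) solve A x = b; the large-N failure occurs inside this call
      CALL PDGESV(N, 1, A, 1, 1, DESCA, IPIV, B, 1, 1, DESCB, INFO)
      IF (IAM == 0) PRINT *, 'INFO CODE RETURNED BY PDGESV =', INFO
!     4) free the memory and shut down the grid
      DEALLOCATE(A, B, IPIV)
      CALL BLACS_GRIDEXIT(ICTXT)
      CALL BLACS_EXIT(0)
      END PROGRAM SOLVE_BY_LU_SKETCH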

The problem does not seem to be a lack of memory: the machine on which I run the code has 192 GB available, whereas the code only uses about 65 GB when N = 94423. I have tried the 'ulimit -s unlimited' command, but this did not resolve the problem. My feeling is that I may instead be exceeding some default limit on the memory available to a single MPI process, i.e. perhaps I am simply missing some appropriate flags at compilation or run time?

I am running the program on a Linux cluster using Red Hat Enterprise Linux Server release 7.3 (Maipo).

I compiled the following code with:

mpiifort -mcmodel=medium    -m64  -mkl=cluster  -o para.exe  solve_by_lu_parallelmpi_simple_light2.for

 

and run it using (for example, for N = 9445):

mpiexec.hydra  -n 4 ./para.exe  9445 2 2 32

The command-line arguments here select N = 9445 and a 2x2 BLACS process grid with block size NB = 32. (The MLOC, NLOC values printed below are the local block-cyclic dimensions returned by NUMROC: 9445 splits into 295 full blocks of 32 plus a remainder of 5, so process row/column 0 holds 148 full blocks = 4736 entries while row/column 1 holds 147 full blocks plus the remainder = 4709.)

For this smaller matrix size the program runs without any problems, producing the output:

WE ARE SOLVING A SYSTEM OF         9445  LINEAR EQUATIONS
 PROC:            0           0 HAS  MLOC, NLOC =        4736        4736
 PROC:            0           0  ALLOCATING SPACE ...
 PROC:            0           0  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            0           1 HAS  MLOC, NLOC =        4736        4709
 PROC:            0           1  ALLOCATING SPACE ...
 PROC:            0           1  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            1           0 HAS  MLOC, NLOC =        4709        4736
 PROC:            1           0  ALLOCATING SPACE ...
 PROC:            1           0  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            1           1 HAS  MLOC, NLOC =        4709        4709
 PROC:            1           1  ALLOCATING SPACE ...
 PROC:            1           1  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            1           1
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            1           0
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            0           1
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            0           0
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 
 INFO code returned by PDGESV =            0

So far so good. But when I try to solve a larger system using

mpiexec.hydra -n $NUM_PROCS ./para.exe  94423 2 2 32

the program crashes during the call to PDGESV with the output

WE ARE SOLVING A SYSTEM OF        94423  LINEAR EQUATIONS
 PROC:            0           0 HAS  MLOC, NLOC =       47223       47223
 PROC:            0           0  ALLOCATING SPACE ...
 PROC:            0           0  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            0           1 HAS  MLOC, NLOC =       47223       47200
 PROC:            0           1  ALLOCATING SPACE ...
 PROC:            0           1  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            1           0 HAS  MLOC, NLOC =       47200       47223
 PROC:            1           0  ALLOCATING SPACE ...
 PROC:            1           1 HAS  MLOC, NLOC =       47200       47200
 PROC:            1           1  ALLOCATING SPACE ...
 PROC:            1           0  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            1           1  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            0           1
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            0           0
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            1           1
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            1           0
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..


forrtl: Fatal error (154): Array index out of bounds.
Image              PC                Routine            Line        Source             
libifcore.so.5     00002B0D716C19AF  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002B0D712335D0  Unknown               Unknown  Unknown
libmkl_avx512.so   00002B11A45E5A47  mkl_blas_avx512_x     Unknown  Unknown
libmkl_intel_lp64  00002B0D68E8BB55  dger_                 Unknown  Unknown
libmkl_scalapack_  00002B0D69F972AE  pdger_                Unknown  Unknown
libmkl_scalapack_  00002B0D69E53541  pdgetf3_              Unknown  Unknown
libmkl_scalapack_  00002B0D69E53688  pdgetf3_              Unknown  Unknown
libmkl_scalapack_  00002B0D69C2E13B  pdgetf2_              Unknown  Unknown
libmkl_scalapack_  00002B0D69C2E836  pdgetrf2_             Unknown  Unknown
libmkl_scalapack_  00002B0D6A014F6E  pdgetrf_              Unknown  Unknown
libmkl_scalapack_  00002B0D69C29C7D  pdgesv_               Unknown  Unknown
para.exe           0000000000401F8C  Unknown               Unknown  Unknown
para.exe           00000000004011BE  Unknown               Unknown  Unknown
libc-2.17.so       00002B0D73DFC3D5  __libc_start_main     Unknown  Unknown
para.exe           00000000004010C9  Unknown               Unknown  Unknown

(The forrtl line above was reported in Japanese on our system; it translates to "Fatal error (154): Array index out of bounds.")

The problem seems to be occurring somewhere in the ScaLAPACK routines.

Does anyone have any recommendations or possible solutions?

 Any advice or pointers will be gratefully received,

     Many thanks,

             Dan.

 

 

Gennady_F_Intel
Moderator

Please try to link with the ILP64 API and recheck the behavior on your side.

daren__wall
Beginner

Hi there,

I have now compiled instead with
 mpiifort -mcmodel=medium    -m64 -ilp64   -mkl=cluster  -o para.exe  solve_by_lu_parallelmpi_simple_light2.for

but unfortunately I get a similar error (again, the failure occurs somewhere within the call to PDGESV):


WE ARE SOLVING A SYSTEM OF        94423  LINEAR EQUATIONS
 PROC:            0           0 HAS  MLOC, NLOC =       47223       47223
 PROC:            0           0  ALLOCATING SPACE ...
 PROC:            0           1 HAS  MLOC, NLOC =       47223       47200
 PROC:            0           1  ALLOCATING SPACE ...
 PROC:            1           0 HAS  MLOC, NLOC =       47200       47223
 PROC:            1           0  ALLOCATING SPACE ...
 PROC:            1           1 HAS  MLOC, NLOC =       47200       47200
 PROC:            1           1  ALLOCATING SPACE ...
 PROC:            1           0  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            1           1  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            0           0  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            0           1  CONSTRUCTING MATRIX A AND RHS VECTOR B ...
 PROC:            0           0
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            0           1
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            1           1
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
 PROC:            1           0
 NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
forrtl: Fatal error (154): Array index out of bounds.
Image              PC                Routine            Line        Source             
libifcore.so.5     00002B009E4AC9AF  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002B009E01E5D0  Unknown               Unknown  Unknown
libmkl_avx512.so   00002B04D38E0A47  mkl_blas_avx512_x     Unknown  Unknown
libmkl_intel_lp64  00002B0095A41B55  dger_                 Unknown  Unknown
libmkl_scalapack_  00002B0096B4D2AE  pdger_                Unknown  Unknown
libmkl_scalapack_  00002B0096A09541  pdgetf3_              Unknown  Unknown
libmkl_scalapack_  00002B0096A09688  pdgetf3_              Unknown  Unknown
libmkl_scalapack_  00002B00967E413B  pdgetf2_              Unknown  Unknown
libmkl_scalapack_  00002B00967E4836  pdgetrf2_             Unknown  Unknown
libmkl_scalapack_  00002B0096BCAF6E  pdgetrf_              Unknown  Unknown
libmkl_scalapack_  00002B00967DFC7D  pdgesv_               Unknown  Unknown
para.exe           0000000000401F9C  Unknown               Unknown  Unknown
para.exe           00000000004011CE  Unknown               Unknown  Unknown
libc-2.17.so       00002B00A0BE73D5  __libc_start_main     Unknown  Unknown
para.exe           00000000004010D9  Unknown               Unknown  Unknown

 

Gennady_F_Intel
Moderator

This is not exactly what I meant when I asked you to check whether the problem exists with the ILP64 API. Please take a look at what the MKL Link Line Advisor suggests for properly linking the ILP64 case.
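
For instance, for dynamic linking with Intel MPI and OpenMP threading the advisor produces a line roughly like the one below (an illustration only; take the exact libraries from the advisor's output for your MKL version):

mpiifort -i8 -I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl solve_by_lu_parallelmpi_simple_light2.for -o para.exe

Note that the -i8 option (64-bit default INTEGERs) and the *_ilp64 MKL libraries go together; the mpiifort -ilp64 switch on its own only selects the ILP64 MPI interface and does not switch MKL over to ILP64.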

daren__wall
Beginner

Many thanks for your suggestion regarding the MKL Link Line Advisor.

There were a few possible choices of how to link the code; my expectation was that dynamic linking with OpenMP threading would be the best option, but I compiled and executed a number of the possible variants (10 in all).

The good news is that all ten choices listed below led to successful execution; problem solved!

I list the actual compilation commands below along with the execution wall-clock times, in case they are of interest to other programmers. The conclusions are that OpenMP (unsurprisingly) offers a significant speedup over sequential execution, and that a dynamically linked code slightly outperforms a statically linked one, all other options being the same.

For those intending to call PDGESV from their own codes, I believe the Fortran program attached above makes a good, compact, scalable test program; please use it freely.

Many thanks once again for your assistance, it is much appreciated. Perhaps you could offer a concise sentence explaining why the different linking options used below, as suggested by the Link Line Advisor, resolved the problem: is it fair to say it was a large integer problem? (My own rough check of that idea is sketched just below.)
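
With N = 94423 on the 2x2 grid each process holds a local block of roughly 47223 x 47223 double-precision entries, and that element count alone already exceeds the largest default 32-bit INTEGER, so an LP64 routine that computes a linear index into the local array can overflow. A five-line confirmation of the sizes involved (purely an illustration, compiled separately from the solver):

      PROGRAM INDEX_RANGE_CHECK
      IMPLICIT NONE
      INTEGER(8), PARAMETER :: MLOC = 47223_8, NLOC = 47223_8
      PRINT *, 'ELEMENTS IN THE LOCAL BLOCK OF A:', MLOC*NLOC
      PRINT *, 'LARGEST DEFAULT 32-BIT INTEGER  :', HUGE(1_4)
      END PROGRAM INDEX_RANGE_CHECK

which prints 2230011729 versus 2147483647.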

---------------------------------------------------------------------------------

Compilation commands and execution times (using the Intel compiler, version 18.3):

[1]  (we add -mcmodel=medium to the Link Line Advisor suggestion; dynamic linking, OpenMP, explicit linking to MKL)

Execution wall clock time: 18 mins  5 secs
 mpiifort -mcmodel=medium      -i8 -I${MKLROOT}/include  -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl  -o para01.exe  solve_by_lu_parallelmpi_simple_light2.for     -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl

 

[2]  (we add mcmodel and m64; dynamic linking, OpenMP, explicit linking to MKL)

Execution wall clock time: 18 mins 2 secs

mpiifort -mcmodel=medium    -m64   -i8 -I${MKLROOT}/include  -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl  -o para02.exe  solve_by_lu_parallelmpi_simple_light2.for     -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl

 

[3]  (we only add mcmodel; static linking, OpenMP, explicit linking with the MKL libraries):

Execution wall clock time: 18 mins 33 secs

 mpiifort  -mcmodel=medium   -i8 -I${MKLROOT}/include  ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl  -o para03.exe  solve_by_lu_parallelmpi_simple_light2.for  ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl

[4]  (we only add mcmodel and m64; static linking, OpenMP, explicit linking with the MKL libraries):

Execution wall clock time: 18 mins 33 secs
 mpiifort  -mcmodel=medium -m64    -i8 -I${MKLROOT}/include  ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl  -o para04.exe  solve_by_lu_parallelmpi_simple_light2.for  ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl  

[5]  (just mcmodel added; dynamic linking, sequential, no OpenMP):

Execution wall clock time: 56 mins 15 secs

mpiifort   -mcmodel=medium   -i8 -I${MKLROOT}/include  -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -ldl -o para05.exe  solve_by_lu_parallelmpi_simple_light2.for -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -ldl


[6]  (we only add mcmodel and m64; dynamic linking, sequential, no OpenMP):

Execution wall clock time: 56 mins 10 secs

mpiifort   -mcmodel=medium  -m64  -i8 -I${MKLROOT}/include  -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -ldl -o para06.exe  solve_by_lu_parallelmpi_simple_light2.for -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -ldl


[7]  (just mcmodel added; static linking, sequential, no OpenMP):

Execution wall clock time:  1 hour 5 mins

 mpiifort    -mcmodel=medium   -i8 -I${MKLROOT}/include  ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm -ldl -o para07.exe  solve_by_lu_parallelmpi_simple_light2.for   ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm -ldl

[8]  (just mcmodel and m64 added; static linking, sequential, no OpenMP):

Execution wall clock time: 1 hour 5 mins 

mpiifort    -mcmodel=medium -m64   -i8 -I${MKLROOT}/include  ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm -ldl -o para08.exe  solve_by_lu_parallelmpi_simple_light2.for   ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm -ldl

[9]  (no additions by me; dynamic linking, OpenMP, no mcmodel):

Execution wall clock time: 18 mins 3 secs


 mpiifort    -i8 -I${MKLROOT}/include  -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl  -o para09.exe  solve_by_lu_parallelmpi_simple_light2.for     -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl

[10]  (no additions by me; static linking, OpenMP, no mcmodel):

Execution wall clock time: 18 min 30 secs

mpiifort    -i8 -I${MKLROOT}/include  ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl  -o para10.exe  solve_by_lu_parallelmpi_simple_light2.for  ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl
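
(For completeness: the threaded builds above were launched in the same way as before, i.e. something along the lines of

mpiexec.hydra -n 4 -genv OMP_NUM_THREADS <threads-per-rank> ./para01.exe 94423 2 2 32

with the thread count chosen to suit the node; this is only an illustrative invocation, not the exact command used for the timings.)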

 

 
