A parallel Fortran code that solves a set of simultaneous linear equations Ax = b using the ScaLAPACK routine PDGESV fails (exiting with a segmentation fault) when the number of equations, N, becomes large. I have not identified the exact value of N at which problems arise; for example, the code works for all the values I have tested up to N = 50000, but fails at N = 94423.
In particular, the failure appears to occur during the call to the ScaLAPACK routine (i.e. not when allocating / deallocating memory);
it enters routine PDGESV but never returns from it.
I have prepared a small Fortran example code (see attachment below) that exhibits this problem. The code simply 1) allocates space for the matrix A and the vector b, 2) fills them with random values, 3) calls PDGESV and then 4) deallocates the memory. It has been tested on a variety of matrix sizes (N x N) and with various BLACS process grids without any errors until N becomes large.
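For readers who do not want to download the attachment, the skeleton of such a driver looks roughly like the sketch below. This is only a minimal illustration, not the attached file: it uses free-form Fortran, a hard-coded N and a hard-coded 2x2 grid in place of the command-line arguments, and illustrative names of my own choosing.

      program pdgesv_sketch
      implicit none
      ! hard-coded problem and 2x2 grid; run with: mpiexec.hydra -n 4 ./sketch
      integer, parameter :: n = 1000, nb = 32, nrhs = 1
      integer, parameter :: nprow = 2, npcol = 2
      integer :: ictxt, myrow, mycol, iam, nprocs
      integer :: mloc, nloc, lld, info
      integer :: descA(9), descB(9)
      integer, external :: numroc
      integer, allocatable :: ipiv(:)
      double precision, allocatable :: A(:,:), B(:)

      ! set up the BLACS process grid (assumes exactly nprow*npcol MPI ranks)
      call blacs_pinfo(iam, nprocs)
      call blacs_get(-1, 0, ictxt)
      call blacs_gridinit(ictxt, 'Row-major', nprow, npcol)
      call blacs_gridinfo(ictxt, nprow, npcol, myrow, mycol)

      ! local dimensions of the block-cyclically distributed A and B
      mloc = numroc(n, nb, myrow, 0, nprow)
      nloc = numroc(n, nb, mycol, 0, npcol)
      lld  = max(1, mloc)
      call descinit(descA, n, n,    nb, nb, 0, 0, ictxt, lld, info)
      call descinit(descB, n, nrhs, nb, nb, 0, 0, ictxt, lld, info)

      ! 1) allocate local storage, 2) fill with random values
      allocate(A(mloc, nloc), B(mloc), ipiv(mloc + nb))
      call random_number(A)
      call random_number(B)

      ! 3) LU-factorise and solve A x = b in place
      call pdgesv(n, nrhs, A, 1, 1, descA, ipiv, B, 1, 1, descB, info)
      if (myrow == 0 .and. mycol == 0) print *, 'PDGESV INFO =', info

      ! 4) release memory and the grid
      deallocate(A, B, ipiv)
      call blacs_gridexit(ictxt)
      call blacs_exit(0)
      end program pdgesv_sketch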
The problem does not seem to be a lack of memory; the machine I run the code on has 192 GB available,
whereas the code only uses about 65 GB when N = 94423. I have tried the 'ulimit -s unlimited' command, but this did not resolve the problem. My feeling is that I am instead exceeding some default limit on the memory available to a single MPI process, i.e. perhaps I am simply missing some appropriate flags at compile / run time?
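For anyone reproducing this, one simple way to rule out a failed allocation is to check the stat= return of ALLOCATE explicitly. The fragment below is a sketch, not a quote from the attached code; it assumes the usual local-size and grid variables (mloc, nloc, myrow, mycol, ictxt) and an integer istat:

      allocate(A(mloc, nloc), stat=istat)
      if (istat /= 0) then
        print *, 'PROC:', myrow, mycol, ' ALLOCATION OF A FAILED, STAT =', istat
        call blacs_abort(ictxt, istat)   ! bring the whole grid down cleanly
      end if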
I am running the program on a Linux cluster using Red Hat Enterprise Linux Server release 7.3 (Maipo).
I compiled the following code with:
mpiifort -mcmodel=medium -m64 -mkl=cluster -o para.exe solve_by_lu_parallelmpi_simple_light2.for
and run it using (for example, for N = 9445)
mpiexec.hydra -n 4 ./para.exe 9445 2 2 32
The command-line arguments here select N = 9445 and a 2x2 BLACS process grid with block size 32.
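For reference, the local dimensions MLOC and NLOC reported in the output below are simply what the ScaLAPACK tool routine NUMROC returns for this N, block size and grid. A tiny check (a hypothetical stand-alone program, linked against the same MKL ScaLAPACK libraries):

      program show_local_sizes
      implicit none
      integer, external :: numroc
      integer, parameter :: n = 9445, nb = 32, nprow = 2
      ! rows owned by process rows 0 and 1 (columns are analogous with npcol)
      print *, 'local rows on process row 0:', numroc(n, nb, 0, 0, nprow)   ! prints 4736
      print *, 'local rows on process row 1:', numroc(n, nb, 1, 0, nprow)   ! prints 4709
      end program show_local_sizes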
For this smaller matrix size the program runs without any problems, producing the output:
WE ARE SOLVING A SYSTEM OF 9445 LINEAR EQUATIONS
PROC: 0 0 HAS MLOC, NLOC = 4736 4736
PROC: 0 0 ALLOCATING SPACE ...
PROC: 0 0 CONSTRUCTING MATRIX A AND RHS VECTOR B ...
PROC: 0 1 HAS MLOC, NLOC = 4736 4709
PROC: 0 1 ALLOCATING SPACE ...
PROC: 0 1 CONSTRUCTING MATRIX A AND RHS VECTOR B ...
PROC: 1 0 HAS MLOC, NLOC = 4709 4736
PROC: 1 0 ALLOCATING SPACE ...
PROC: 1 0 CONSTRUCTING MATRIX A AND RHS VECTOR B ...
PROC: 1 1 HAS MLOC, NLOC = 4709 4709
PROC: 1 1 ALLOCATING SPACE ...
PROC: 1 1 CONSTRUCTING MATRIX A AND RHS VECTOR B ...
PROC: 1 1
NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
PROC: 1 0
NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
PROC: 0 1
NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
PROC: 0 0
NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
INFO code returned by PDGESV = 0
So far so good. But when I try to solve a larger system using
mpiexec.hydra -n $NUM_PROCS ./para.exe 94423 2 2 32
the program crashes during the call to PDGESV with the output:
WE ARE SOLVING A SYSTEM OF 94423 LINEAR EQUATIONS
PROC: 0 0 HAS MLOC, NLOC = 47223 47223
PROC: 0 0 ALLOCATING SPACE ...
PROC: 0 0 CONSTRUCTING MATRIX A AND RHS VECTOR B ...
PROC: 0 1 HAS MLOC, NLOC = 47223 47200
PROC: 0 1 ALLOCATING SPACE ...
PROC: 0 1 CONSTRUCTING MATRIX A AND RHS VECTOR B ...
PROC: 1 0 HAS MLOC, NLOC = 47200 47223
PROC: 1 0 ALLOCATING SPACE ...
PROC: 1 1 HAS MLOC, NLOC = 47200 47200
PROC: 1 1 ALLOCATING SPACE ...
PROC: 1 0 CONSTRUCTING MATRIX A AND RHS VECTOR B ...
PROC: 1 1 CONSTRUCTING MATRIX A AND RHS VECTOR B ...
PROC: 0 1
NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
PROC: 0 0
NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
PROC: 1 1
NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
PROC: 1 0
NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
forrtl: 致命的なエラー (154): 配列インデックスが境界外です。
Image PC Routine Line Source
libifcore.so.5 00002B0D716C19AF for__signal_handl Unknown Unknown
libpthread-2.17.s 00002B0D712335D0 Unknown Unknown Unknown
libmkl_avx512.so 00002B11A45E5A47 mkl_blas_avx512_x Unknown Unknown
libmkl_intel_lp64 00002B0D68E8BB55 dger_ Unknown Unknown
libmkl_scalapack_ 00002B0D69F972AE pdger_ Unknown Unknown
libmkl_scalapack_ 00002B0D69E53541 pdgetf3_ Unknown Unknown
libmkl_scalapack_ 00002B0D69E53688 pdgetf3_ Unknown Unknown
libmkl_scalapack_ 00002B0D69C2E13B pdgetf2_ Unknown Unknown
libmkl_scalapack_ 00002B0D69C2E836 pdgetrf2_ Unknown Unknown
libmkl_scalapack_ 00002B0D6A014F6E pdgetrf_ Unknown Unknown
libmkl_scalapack_ 00002B0D69C29C7D pdgesv_ Unknown Unknown
para.exe 0000000000401F8C Unknown Unknown Unknown
para.exe 00000000004011BE Unknown Unknown Unknown
libc-2.17.so 00002B0D73DFC3D5 __libc_start_main Unknown Unknown
para.exe 00000000004010C9 Unknown Unknown Unknown
The first error line, beginning "forrtl:", translates as
forrtl: Fatal error (154): Array index out of bounds.
The problem seems to be occurring somewhere in the ScaLAPACK routines.
Does anyone have any recommendations / possible solutions?
Any advice or pointers will be gratefully received,
Many thanks,
Dan.
Please try to link with the ILP64 API and recheck the behavior on your side.
Hi there,
I have now compiled instead with
mpiifort -mcmodel=medium -m64 -ilp64 -mkl=cluster -o para.exe solve_by_lu_parallelmpi_simple_light2.for
but unfortunately I seem to get a similar error (again, the error occurs somewhere within the call to PDGESV):
WE ARE SOLVING A SYSTEM OF 94423 LINEAR EQUATIONS
PROC: 0 0 HAS MLOC, NLOC = 47223 47223
PROC: 0 0 ALLOCATING SPACE ...
PROC: 0 1 HAS MLOC, NLOC = 47223 47200
PROC: 0 1 ALLOCATING SPACE ...
PROC: 1 0 HAS MLOC, NLOC = 47200 47223
PROC: 1 0 ALLOCATING SPACE ...
PROC: 1 1 HAS MLOC, NLOC = 47200 47200
PROC: 1 1 ALLOCATING SPACE ...
PROC: 1 0 CONSTRUCTING MATRIX A AND RHS VECTOR B ...
PROC: 1 1 CONSTRUCTING MATRIX A AND RHS VECTOR B ...
PROC: 0 0 CONSTRUCTING MATRIX A AND RHS VECTOR B ...
PROC: 0 1 CONSTRUCTING MATRIX A AND RHS VECTOR B ...
PROC: 0 0
NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
PROC: 0 1
NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
PROC: 1 1
NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
PROC: 1 0
NOW SOLVING SYSTEM AX = B USING SCALAPACK PDGESV ..
forrtl: 致命的なエラー (154): 配列インデックスが境界外です。
Image PC Routine Line Source
libifcore.so.5 00002B009E4AC9AF for__signal_handl Unknown Unknown
libpthread-2.17.s 00002B009E01E5D0 Unknown Unknown Unknown
libmkl_avx512.so 00002B04D38E0A47 mkl_blas_avx512_x Unknown Unknown
libmkl_intel_lp64 00002B0095A41B55 dger_ Unknown Unknown
libmkl_scalapack_ 00002B0096B4D2AE pdger_ Unknown Unknown
libmkl_scalapack_ 00002B0096A09541 pdgetf3_ Unknown Unknown
libmkl_scalapack_ 00002B0096A09688 pdgetf3_ Unknown Unknown
libmkl_scalapack_ 00002B00967E413B pdgetf2_ Unknown Unknown
libmkl_scalapack_ 00002B00967E4836 pdgetrf2_ Unknown Unknown
libmkl_scalapack_ 00002B0096BCAF6E pdgetrf_ Unknown Unknown
libmkl_scalapack_ 00002B00967DFC7D pdgesv_ Unknown Unknown
para.exe 0000000000401F9C Unknown Unknown Unknown
para.exe 00000000004011CE Unknown Unknown Unknown
libc-2.17.so 00002B00A0BE73D5 __libc_start_main Unknown Unknown
para.exe 00000000004010D9 Unknown Unknown Unknown
This is not exactly what I meant when I asked you to check whether the problem exists with the ILP64 API. Please take a look at what the MKL Link Line Advisor suggests for properly linking the ILP64 case.
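In practice the ILP64 libraries expect every integer actually passed to MKL/ScaLAPACK to be 64-bit, so the Fortran side has to match, either by compiling with -i8 (which makes the default INTEGER 8 bytes) or by declaring the relevant integers with an explicit 64-bit kind. A quick, hypothetical check of what your compile flags give you:

      program check_int_kind
      implicit none
      integer :: i
      ! prints 64 when compiled with -i8 (matching ILP64), 32 otherwise (LP64)
      print *, 'default INTEGER has', bit_size(i), 'bits'
      end program check_int_kind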
Many thanks for your suggestion regarding the MKL Link Line Advisor.
There were a few possible choices of how to link the code; I had an idea that dynamic linking with
OpenMP threading might be the best option, but I compiled and executed a number of possible options (10 in all).
The good news is that all ten choices listed below led to successful execution; problem solved!
I give the actual compilation commands below, along with the execution wall-clock times, in case they are
of interest to other programmers. The conclusions are that OpenMP (unsurprisingly) offers a significant speedup over sequential execution,
and that a dynamically linked code slightly outperforms a statically linked one, all other options being equal.
For those intending to call PDGESV from their own codes, I believe the Fortran program attached above makes a good, compact, scalable test program; please use it freely.
Many thanks once again for your assistance, it is much appreciated. Perhaps you could offer a concise sentence
explaining why the different linking options used below, as suggested by the Link Line Advisor, resolved the problem
- is it fair to say it was a large-integer problem? (My own back-of-the-envelope guess is in the postscript after the timings below.)
---------------------------------------------------------------------------------
Compilation and execution times (using Intel compiler version 18.3):
[1] (we add -mcmodel=medium to the Link Line Advisor suggestion; dynamic linking, OpenMP threading, explicit linking to MKL):
Execution wall clock time: 18 mins 5 secs
mpiifort -mcmodel=medium -i8 -I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl -o para01.exe solve_by_lu_parallelmpi_simple_light2.for -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl
[2] (we add -mcmodel=medium and -m64; dynamic linking, OpenMP threading, explicit linking to MKL):
Execution wall clock time: 18 mins 2 secs
mpiifort -mcmodel=medium -m64 -i8 -I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl -o para02.exe solve_by_lu_parallelmpi_simple_light2.for -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl
[3] (we only add -mcmodel=medium; static linking, OpenMP threading, explicit linking to the MKL libraries):
Execution wall clock time: 18 mins 33 secs
mpiifort -mcmodel=medium -i8 -I${MKLROOT}/include ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl -o para03.exe solve_by_lu_parallelmpi_simple_light2.for ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl
[4] (we only add -mcmodel=medium and -m64; static linking, OpenMP threading, explicit linking to the MKL libraries):
Execution wall clock time: 18 mins 33 secs
mpiifort -mcmodel=medium -m64 -i8 -I${MKLROOT}/include ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl -o para04.exe solve_by_lu_parallelmpi_simple_light2.for ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl
[5] (just -mcmodel=medium added; dynamic linking, sequential MKL (no OpenMP)):
Execution wall clock time: 56 mins 15 secs
mpiifort -mcmodel=medium -i8 -I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -ldl -o para05.exe solve_by_lu_parallelmpi_simple_light2.for -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -ldl
[6] (we add -mcmodel=medium and -m64; dynamic linking, sequential MKL (no OpenMP)):
Execution wall clock time: 56 mins 10 secs
mpiifort -mcmodel=medium -m64 -i8 -I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -ldl -o para06.exe solve_by_lu_parallelmpi_simple_light2.for -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -ldl
[7] (just -mcmodel=medium added; static linking, sequential MKL (no OpenMP)):
Execution wall clock time: 1 hour 5 mins
mpiifort -mcmodel=medium -i8 -I${MKLROOT}/include ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm -ldl -o para07.exe solve_by_lu_parallelmpi_simple_light2.for ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm -ldl
[8] (just -mcmodel=medium and -m64 added; static linking, sequential MKL (no OpenMP)):
Execution wall clock time: 1 hour 5 mins
mpiifort -mcmodel=medium -m64 -i8 -I${MKLROOT}/include ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm -ldl -o para08.exe solve_by_lu_parallelmpi_simple_light2.for ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm -ldl
[9] (no additions by me; dynamic linking, OpenMP threading, no -mcmodel):
Execution wall clock time: 18 mins 3 secs
mpiifort -i8 -I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl -o para09.exe solve_by_lu_parallelmpi_simple_light2.for -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl
[10] (no additions by me; static linking, OpenMP threading, no -mcmodel):
Execution wall clock time: 18 min 30 secs
mpiifort -i8 -I${MKLROOT}/include ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl -o para10.exe solve_by_lu_parallelmpi_simple_light2.for ${MKLROOT}/lib/intel64/libmkl_scalapack_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl
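P.S. My own back-of-the-envelope guess as to why the ILP64 link fixed things (an illustrative sketch only, not taken from the MKL internals): with the 2x2 grid, process (0,0) holds a local block of 47223 x 47223 = 2230011729 elements, which already exceeds the 32-bit integer maximum of 2147483647, so any 32-bit index arithmetic over the local array can overflow; the ILP64 interface performs that arithmetic with 64-bit integers.

      program local_size_check
      implicit none
      integer, parameter :: i4 = selected_int_kind(9)    ! 32-bit integers
      integer, parameter :: i8 = selected_int_kind(18)   ! 64-bit integers
      ! local block held by process (0,0) for N = 94423 on a 2x2 grid (see output above)
      integer(i8), parameter :: mloc = 47223_i8, nloc = 47223_i8
      print *, 'local elements of A   :', mloc * nloc
      print *, '32-bit integer maximum:', huge(1_i4)
      print *, 'overflows 32-bit?     :', mloc * nloc > int(huge(1_i4), i8)
      end program local_size_check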