Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Segfault in scalapack pdgetrf_

fpinria
Beginner
385 Views

Hi,


In order to compare several parallel dgetrf implementations I'm trying to use the OneAPI 2025 Scalapack pdgetrf_ function.
The problem is that for large enough matrices (say N>120000), I get a segfault in the call of pdgetrf_

/beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_avx512.so.2(mkl_blas_avx512_xidamax+0x67) [0x2b8c4bb6f817]
/beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_gnu_thread.so.2(mkl_blas_idamax+0x34b) [0x2b875380725b]
/beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_intel_lp64.so.2(idamax+0x3d) [0x2b8752a7827d]
/beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_scalapack_lp64.so.2(pdamax_+0x3d5) [0x2b87525a92e5]
/beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_scalapack_lp64.so.2(pdgetf3_+0x221) [0x2b875249dff1]
/beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_scalapack_lp64.so.2(pdgetf3_+0x44c) [0x2b875249e21c]
/beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_scalapack_lp64.so.2(pdgetf2_+0x18f) [0x2b8752266b2f]
/beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_scalapack_lp64.so.2(pdgetrf2_+0x6ea) [0x2b875226723a]
/beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_scalapack_lp64.so.2(pdgetrf_+0x97b) [0x2b875263d7fb]
/home/pruvost/git/dplasma/tools/cscalapack/tpdgetrf() [0x4025ce]

It smells like an int overflow somewhere in the scalapack code.
By the way I checked with the reference scalapack v2.2.2 and there is no segfault.
Thus, I tried to link with the ilp64 interface (64bit integers), it fixed the segfault issue, but execution is slower than using the lp64 interface.

My questions :
1) Is this bug already identified by the Intel developers ?
2) Is it normal to get the segfault with the standard interface, knowing that integers parameters given to pdgetrf_ are much smaller than 2^31-1.
If the problem is internal to OneAPI scalapack it should be fixed right (and since it works with reference scalapack) ?
3) Is it normal to get slower pdgetrf_ executions using the ilp64 interface instead of lp64 ?

Thanks.

Some details about building and running:

# build int32
rm common.o pdgetrf.o
mpicc -m64 -I"${MKLROOT}/include" -O0 -g -o common.o -c common.c
mpicc -m64 -I"${MKLROOT}/include" -O0 -g -o pdgetrf.o -c pdgetrf.c
mpif77 -o tpdgetrf_int32 pdgetrf.o common.o -m64 -L${MKLROOT}/lib -lmkl_scalapack_lp64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lmkl_blacs_openmpi_lp64 -lgomp -lpthread -lm -ldl

# build int64
rm common.o pdgetrf.o
mpicc  -DMKL_ILP64 -m64 -I"${MKLROOT}/include" -o common.o -c common.c
mpicc  -DMKL_ILP64 -m64 -I"${MKLROOT}/include" -o pdgetrf.o -c pdgetrf.c
mpif77 -o tpdgetrf_int64 pdgetrf.o common.o -m64 -L${MKLROOT}/lib -lmkl_scalapack_ilp64 -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_gnu_thread -lmkl_core -lmkl_blacs_openmpi_ilp64 -lgomp -lpthread -lm -ldl

## Executed on 4 nodes (8 MPI processes, 1/socket), each node is composed of 2 processors of 18-core Cascade Lake Intel Xeon Skylake Gold 6240 @ 2.6 GHz.

# run int32
mpiexec --map-by socket ./tpdgetrf_int32 -N 57600 -t 576 -nruns 1 -P 2 -Q 4
[****] TIMEHL(s) 36.26635 : dgetrf PxQ= 2 4 NB= 576 N= 57600 : 3512.907704 gflops
mpiexec --map-by socket ./tpdgetrf_int32 -N 115200 -t 576 -nruns 1 -P 2 -Q 4
[****] TIMEHL(s) 291.38776 : dgetrf PxQ= 2 4 NB= 576 N= 115200 : 3497.776452 gflops
mpiexec --map-by socket ./tpdgetrf_int32 -N 144000 -t 576 -nruns 1 -P 2 -Q 4
==== backtrace (tid: 99392) ====
0 /gnu/store/ifh6wl2mkx3bnx3v1zpy6qyv53lmaanl-ucx-1.17.0/lib/libucs.so.0(ucs_handle_error+0x294) [0x2b87664ce004]
1 /gnu/store/ifh6wl2mkx3bnx3v1zpy6qyv53lmaanl-ucx-1.17.0/lib/libucs.so.0(+0x331bf) [0x2b87664ce1bf]
2 /gnu/store/ifh6wl2mkx3bnx3v1zpy6qyv53lmaanl-ucx-1.17.0/lib/libucs.so.0(+0x33486) [0x2b87664ce486]
3 /beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_avx512.so.2(mkl_blas_avx512_xidamax+0x67) [0x2b8c4bb6f817]
4 /beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_gnu_thread.so.2(+0x19ee5d) [0x2b8753806e5d]
5 /gnu/store/id7x9f869b94q9fnszmiy6dnvslpciyc-gfortran-11.4.0-lib/lib/libgomp.so.1(GOMP_parallel+0x46) [0x2b8758f34c26]
6 /beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_gnu_thread.so.2(mkl_blas_idamax+0x34b) [0x2b875380725b]
7 /beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_intel_lp64.so.2(idamax+0x3d) [0x2b8752a7827d]
8 /beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_scalapack_lp64.so.2(pdamax_+0x3d5) [0x2b87525a92e5]
9 /beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_scalapack_lp64.so.2(pdgetf3_+0x221) [0x2b875249dff1]
10 /beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_scalapack_lp64.so.2(pdgetf3_+0x44c) [0x2b875249e21c]
11 /beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_scalapack_lp64.so.2(pdgetf2_+0x18f) [0x2b8752266b2f]
12 /beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_scalapack_lp64.so.2(pdgetrf2_+0x6ea) [0x2b875226723a]
13 /beegfs/pruvost/intel/oneapi/mkl/2025.0/lib/libmkl_scalapack_lp64.so.2(pdgetrf_+0x97b) [0x2b875263d7fb]
14 /home/pruvost/git/dplasma/tools/cscalapack/tpdgetrf() [0x4025ce]
15 /gnu/store/hw6g2kjayxnqi8rwpnmpraalxi0djkxc-glibc-2.39/lib/libc.so.6(+0x29bf7) [0x2b875957cbf7]
16 /gnu/store/hw6g2kjayxnqi8rwpnmpraalxi0djkxc-glibc-2.39/lib/libc.so.6(__libc_start_main+0x7c) [0x2b875957ccac]
17 /home/pruvost/git/dplasma/tools/cscalapack/tpdgetrf() [0x402231]
=================================

# run int64
mpiexec --map-by socket ./tpdgetrf_int64 -N 57600 -t 576 -nruns 1 -P 2 -Q 4
[****] TIMEHL(s) 53.08035 : dgetrf PxQ= 2 4 NB= 576 N= 57600 : 2400.141236 gflops
mpiexec --map-by socket ./tpdgetrf_int64 -N 115200 -t 576 -nruns 1 -P 2 -Q 4
[****] TIMEHL(s) 298.32326 : dgetrf PxQ= 2 4 NB= 576 N= 115200 : 3416.459206 gflops
mpiexec --map-by socket ./tpdgetrf_int64 -N 144000 -t 576 -nruns 1 -P 2 -Q 4
[****] TIMEHL(s) 486.57299 : dgetrf PxQ= 2 4 NB= 576 N= 144000 : 4091.155198 gflops

0 Kudos
4 Replies
Aleksandra_K
Employee
305 Views

Hi,

I wanted to let you know that I'm working on the issue and I'll come back to you with my progress.


Regards,

Aleksandra


0 Kudos
Aleksandra_K
Employee
256 Views

Hi,

what is in the file flops.h?


0 Kudos
fpinria
Beginner
243 Views

Hi,

Thanks for your answer.

The given code is actually taken from dplasma, and flops.h is https://github.com/ICLDisco/dplasma/blob/master/src/flops.h

0 Kudos
Aleksandra_K
Employee
58 Views

Hi, 

In the pdgetrf.c, starting from line #59 there is  

"A = malloc( sizeof(double)*mloc*nloc );" 

will allocate size of mloc*nloc, which is N*N/np.

As you were using rank = 8, then :

- for N=115200 we have N*N/np = 115200*115200/8 =1,658,880,000 which within the range of int32 (which ranges from –2147483648 to 2,147,483,647)

- for N=144000 we have N*N/np = 144000*144000/8 = 2,592,000,000 which is greater than the highest possible values of int32.

 

As for the reference scalapack, did you use any integer-related flag when building it?


0 Kudos
Reply