daren__wall
Beginner

Puzzling (but maybe elementary!) problem calling SCALAPACK PZGETF2 routine

Dear All,

I have a somewhat strange runtime problem when calling the ScaLAPACK PZGETF2 routine.

I have constructed a minimal code that reproduces the problem, which is attached. The code compiles and runs successfully with a single process, but with two processes it fails at runtime during the call to PZGETF2. The routine never returns, so there is no INFO value to inspect.

The error that is returned begins:

Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 877: offset < heap.shm_size
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPL_backtrace_show+0x34) [0x7f88105341d4]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7f880fcbc031]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x449f93) [0x7f880fffaf93]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x28563d) [0x7f880fe3663d]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x153487) [0x7f880fd04487]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x199bda) [0x7f880fd4abda]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x180069) [0x7f880fd31069]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x16bd8e) [0x7f880fd1cd8e]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/inte
Abort(1) on node 0: Internal error

I am running the code on Linux Mint and compile it with:

mpiifort -o pz.exe pz_factorize.f90 -mkl=parallel -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -ldl

 
With NPROW = NPCOL = 1 and MB = 10, the code runs successfully using a single process when launched with:
 
mpiexec.hydra   -n  1 ./pz.exe
 

With NPROW = 1, NPCOL = 2 and MB = 5 instead, the code fails with the above error when launched with:

 mpiexec.hydra   -n  2 ./pz.exe
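
For readers comparing the two configurations above, here is a minimal Python sketch of how a block-cyclic layout splits the columns over the process grid. It mirrors the arithmetic of ScaLAPACK's NUMROC tool routine. Several things are assumptions: the matrix order N = 10 (the post does not state it), square blocks (column block size equal to MB), and the helper names `numroc` and `panel_fits_one_block_column`, which are hypothetical. The second helper encodes the restriction that, as far as I recall from the reference ScaLAPACK comments for PZGETF2, the panel must lie inside a single block column (NB_A - MOD(JA-1, NB_A) >= N >= 0). Nothing in this thread confirms that this restriction is what triggers the assertion.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows or columns of a block-cyclic dimension owned by process iproc.

    Mirrors the arithmetic of ScaLAPACK's NUMROC tool routine.
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from source process
    nblocks = n // nb                               # number of complete blocks
    local = (nblocks // nprocs) * nb                # every process owns at least this
    extra = nblocks % nprocs                        # leftover complete blocks
    if mydist < extra:
        local += nb        # one additional complete block
    elif mydist == extra:
        local += n % nb    # the trailing partial block, if any
    return local


def panel_fits_one_block_column(n, ja, nb):
    """Check NB_A - MOD(JA-1, NB_A) >= N >= 0, with JA 1-based as in Fortran."""
    return 0 <= n <= nb - (ja - 1) % nb


if __name__ == "__main__":
    N = 10  # ASSUMPTION: the matrix order is not given in the post
    for nprow, npcol, nb in [(1, 1, 10), (1, 2, 5)]:
        cols = [numroc(N, nb, pc, 0, npcol) for pc in range(npcol)]
        fits = panel_fits_one_block_column(N, 1, nb)
        print(f"grid {nprow}x{npcol}, NB={nb}: local columns {cols}, "
              f"panel fits one block column: {fits}")
```

Substituting the real N, JA and block size from the attached code shows whether each process owns the number of columns the local array was allocated for, and whether the panel restriction holds for the two-process run.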

 

Can any of you knowledgeable Fortran gurus see where I am going wrong?

 I am very grateful for any assistance,

                                Thanks, Dan.

2 Replies
Khang_N_Intel
Employee

I tested the code with oneMKL 2021.2 and reproduced the error. The issue has been escalated.


Khang_N_Intel
Employee

The error was due to the shared memory transfer.

oneMKL and Intel MPI do not support Linux Mint.


oneMKL system requirements: https://software.intel.com/content/www/us/en/develop/articles/oneapi-math-kernel-library-system-requ...

For C/C++ and Fortran

Linux*

  • Amazon* Linux 2
  • CentOS* (latest version)
  • Clear Linux*
  • Debian* (latest version)
  • Wind River* Linux (latest version)
  • Yocto 2.7
  • Fedora* 31
  • openSUSE* 15
  • Red Hat Enterprise Linux (RHEL)* 7, 8
  • SUSE Linux Enterprise Server* (SLES) 12, 15
  • Ubuntu* 18.04 LTS, 20.04 LTS

 

Intel MPI system requirements: https://software.intel.com/content/www/us/en/develop/articles/intel-mpi-library-release-notes-linux....

Software Requirements

(installation issues may occur with operating systems that are not released at the date of the current Intel MPI Library release)

  • Operating systems:
    • Red Hat* Enterprise Linux* 7, 8
    • Fedora* 31
    • CentOS* 7, 8
    • SUSE* Linux Enterprise Server* 12, 15
    • Ubuntu* LTS 16.04, 18.04, 20.04
    • Debian* 9, 10
    • Amazon Linux 2
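
A quick way to compare a machine against the Intel MPI list above is to read the distribution ID from /etc/os-release. The Python sketch below is only an illustration: the ID strings in `LISTED_IDS` are my assumed os-release identifiers for the listed distributions, the helper names are hypothetical, the Linux Mint sample text is illustrative rather than copied from the poster's machine, and release versions are not checked. `ID_LIKE` is deliberately ignored, since an Ubuntu-derived distribution is not itself on the list.

```python
import os

# ASSUMPTION: os-release ID values for the distributions in the list above.
LISTED_IDS = {"rhel", "fedora", "centos", "sles", "ubuntu", "debian", "amzn"}

# Illustrative sample only; a real Linux Mint file contains more fields.
MINT_SAMPLE = 'NAME="Linux Mint"\nID=linuxmint\nID_LIKE=ubuntu\nVERSION_ID="20"\n'


def parse_os_release(text):
    """Parse KEY=value lines in os-release format into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_listed(fields, listed=LISTED_IDS):
    """True only if the distribution's own ID is on the list (ID_LIKE ignored)."""
    return fields.get("ID", "").lower() in listed


if __name__ == "__main__":
    path = "/etc/os-release"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            info = parse_os_release(handle.read())
        print(info.get("ID", "unknown"), "listed:", is_listed(info))
    else:
        print("no /etc/os-release found on this system")
```

With the sample text, the check reports Linux Mint as not listed even though it is Ubuntu-based, which matches the reading given in this reply.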


There will be no more discussion about this issue.

