Dear All,
I have a somewhat strange runtime problem when calling the ScaLAPACK PZGETF2 routine.
I have constructed a minimal code that reproduces the problem, which is attached below. The code compiles and runs successfully on a single process, but with two processes it fails at runtime during the call to PZGETF2. The call never returns, so there is no INFO value to inspect.
The error output begins:
Assertion failed in file ../../src/util/intel/shm_heap/impi_shm_heap.c at line 877: offset < heap.shm_size
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPL_backtrace_show+0x34) [0x7f88105341d4]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7f880fcbc031]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x449f93) [0x7f880fffaf93]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x28563d) [0x7f880fe3663d]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x153487) [0x7f880fd04487]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x199bda) [0x7f880fd4abda]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x180069) [0x7f880fd31069]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x16bd8e) [0x7f880fd1cd8e]
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/inte
Abort(1) on node 0: Internal error
I am running the code on Linux Mint and compile it with:
mpiifort -o pz.exe pz_factorize.f90 -mkl=parallel -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -ldl
If I instead select NPROW=1, NPCOL=2 and MB=5, the code fails with the above error when run using:
mpiexec.hydra -n 2 ./pz.exe
Can any of you knowledgeable Fortran gurus see where I am going wrong?
I am very grateful for any assistance,
Thanks, Dan.
I tested the code with oneMKL 2021.2 and also encountered the error. The issue has been escalated.
The error was due to the shared memory transfer.
oneMKL and Intel MPI do not support Linux Mint.
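To check on a given machine whether the shared-memory path is involved, one option is to rerun with Intel MPI debug output enabled and intra-node traffic routed through OFI instead of shared memory. The sketch below only builds and prints such a launch line; it is not a fix confirmed in this thread. `I_MPI_DEBUG` and `I_MPI_FABRICS` are Intel MPI environment variables, while the helper name `build_cmd` and the `RUN` switch are hypothetical, and `I_MPI_FABRICS=ofi` assumes a working libfabric provider is installed.

```shell
#!/bin/sh
# Diagnostic sketch (assumption, not a confirmed workaround): launch the
# reproducer with shared memory bypassed and startup diagnostics printed.

# build_cmd is a hypothetical helper: $1 = number of processes, $2 = executable.
build_cmd() {
    nprocs="$1"
    exe="$2"
    printf 'env I_MPI_DEBUG=5 I_MPI_FABRICS=ofi mpiexec.hydra -n %s %s' "$nprocs" "$exe"
}

cmd=$(build_cmd 2 ./pz.exe)
echo "$cmd"

# Nothing is executed unless RUN=1 is set explicitly.
if [ "${RUN:-0}" = "1" ]; then
    $cmd
fi
```

If the assertion in impi_shm_heap.c no longer appears with this launch line, that would be consistent with the shared-memory transfer being the trigger, although it does not make the platform a supported one.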
oneMKL system requirements: https://software.intel.com/content/www/us/en/develop/articles/oneapi-math-kernel-library-system-requirements.html
For C/C++ and Fortran
Linux*
- Amazon* Linux 2
- CentOS* (latest version)
- Clear Linux*
- Debian* (latest version)
- Wind River* Linux (latest version)
- Yocto 2.7
- Fedora* 31
- openSUSE* 15
- Redhat Enterprise Linux (RHEL)* 7, 8
- SUSE Linux Enterprise Server* (SLES) 12, 15
- Ubuntu* 18.04 LTS, 20.04 LTS
Intel MPI system requirements: https://software.intel.com/content/www/us/en/develop/articles/intel-mpi-library-release-notes-linux.html
Software Requirements
(installation issues may occur with operating systems that are not released at the date of the current Intel MPI Library release)
- Operating systems:
  - Red Hat* Enterprise Linux* 7, 8
  - Fedora* 31
  - CentOS* 7, 8
  - SUSE* Linux Enterprise Server* 12, 15
  - Ubuntu* LTS 16.04, 18.04, 20.04
  - Debian* 9, 10
  - Amazon Linux 2
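A quick way to compare a machine against the lists above is to read the distribution ID from /etc/os-release. The following is a minimal sketch: the mapping from distribution names to os-release IDs is an assumption of this example (Wind River and Yocto are left out because their IDs vary), the function name `check_os_release` is hypothetical, and version numbers are not checked.

```shell
#!/bin/sh
# os-release IDs assumed to correspond to the distributions listed above.
SUPPORTED_IDS="amzn centos clear-linux-os debian fedora opensuse-leap rhel sles ubuntu"

# check_os_release is a hypothetical helper: $1 = path to an os-release file.
# Prints "listed: <id>", "not listed: <id>", or "unknown".
check_os_release() {
    file="$1"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 0
    fi
    # Take the ID= line only (not ID_LIKE= or VERSION_ID=) and strip quotes.
    id=$(sed -n 's/^ID=//p' "$file" | tr -d '"' | head -n 1)
    if [ -z "$id" ]; then
        echo "unknown"
        return 0
    fi
    for s in $SUPPORTED_IDS; do
        if [ "$id" = "$s" ]; then
            echo "listed: $id"
            return 0
        fi
    done
    echo "not listed: $id"
}

check_os_release "${OS_RELEASE_FILE:-/etc/os-release}"
```

Linux Mint reports `ID=linuxmint` with `ID_LIKE=ubuntu`. The sketch deliberately ignores `ID_LIKE`, since a derivative of a supported distribution is not itself on the lists above.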
There will be no more discussion about this issue.