Hello,
I have compiled the parallel version of the siesta-3.2 code with Intel Parallel Studio XE 2016 Cluster Edition. Compilation went without any problems, but every siesta run ended with the following error:
Fatal error in PMPI_Cart_create: Other MPI error, error stack:
PMPI_Cart_create(332).........: MPI_Cart_create(comm=0x84000007, ndims=2, dims=0x7fffe76b6288, periods=0x7fffe76b62a0, reorder=0, comm_cart=0x7fffe76b61e0) failed
MPIR_Cart_create_impl(189)....:
MPIR_Cart_create(115).........:
MPIR_Comm_copy(1070)..........:
MPIR_Get_contextid(543).......:
MPIR_Get_contextid_sparse(608): Too many communicators
This is displayed once per node (if I use X nodes, I see X identical messages in the output file).
I suspected I was using too many nodes, but decreasing the number of nodes did not solve the problem. The error first appeared when I updated my installation to "Update 3" using the online installer. People on other forums advised me to reinstall the MPI environment. I did so, but the problem still occurs. Are there any ideas? My arch.make file is attached.
Thanks a lot for any help!
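For context: this error means the process has run out of MPI communicator context IDs. Intel MPI (MPICH-based) keeps a fixed pool of 16384 per process, as later messages in this thread show, and the pool drains when communicators are created repeatedly but never freed. A minimal C sketch (hypothetical, not Siesta or MKL code) that reproduces the same failure mode:

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int dims[2] = {0, 0}, periods[2] = {0, 0}, nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 2, dims);      /* pick a 2-D grid shape */
    for (int i = 0; i < 20000; ++i) {      /* more than the 16384-slot pool */
        MPI_Comm cart;
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);
        /* Without MPI_Comm_free(&cart) here, the run aborts in
           MPIR_Get_contextid with "Too many communicators" once the
           context-ID pool is exhausted. */
    }
    MPI_Finalize();
    return 0;
}

If the leak sits inside a library rather than in the application (here, apparently in MKL's ScaLAPACK/BLACS layer), rebuilding the application cannot help; only a fixed library does.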
Hi
Just to back this up, I get the same error with a Quantum Espresso run. The compiled binary works fine with Intel MPI 5.1.1, but with 5.1.3:
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf4, remain_dims=0x7fffffffa7e8, comm_new=0x7fffffffa740) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
Image PC Routine Line Source
pw.x 0000000000C94025 Unknown Unknown Unknown
libpthread.so.0 0000003568E0F790 Unknown Unknown Unknown
libmpi.so.12 00002AAAAF8E7B50 Unknown Unknown Unknown
This seems to be a bug?
~~
Ade
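For reference, MPI_Cart_sub carves row/column subgrids out of a Cartesian communicator, and every subgrid is itself a new communicator that must be freed. A hedged sketch of the correct lifecycle (illustrative only, not the QE/ScaLAPACK source):

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int nprocs, dims[2] = {0, 0}, periods[2] = {0, 0};
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 2, dims);
    MPI_Comm cart, rows, cols;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);
    int keep_rows[2] = {0, 1};   /* keep dim 1 -> communicators along each row */
    int keep_cols[2] = {1, 0};   /* keep dim 0 -> communicators along each column */
    MPI_Cart_sub(cart, keep_rows, &rows);
    MPI_Cart_sub(cart, keep_cols, &cols);
    /* ... use rows/cols ... */
    MPI_Comm_free(&rows);        /* each subgrid consumes a context ID, */
    MPI_Comm_free(&cols);        /* so all three communicators must be released */
    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}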
Hello,
Is this fixed in the 2017 release? I don't see it mentioned at https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2017-bug-fixes-list
Cheers
This does not seem to be fixed even in 2017 Beta 3:
studio2017_beta3/compilers_and_libraries_2017.0.064
It doesn't work in anything after 11.2. I have lost many hours to this bug; please fix it in the next release!
Yes, this issue has been fixed in MKL v.2017 (released September 6th, 2016). Please check and give us an update in case the issue still exists. Thanks, Gennady
I just downloaded the latest version; the same issue still occurs.
Yes, I am running 2017 Update 4 and I get:
Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(532)................: MPI_Comm_split(comm=0xc4027cd4, color=0, key=1, new_comm=0x7ffe9a6df1f0) failed
PMPI_Comm_split(508)................: fail failed
MPIR_Comm_split_impl(260)...........: fail failed
MPIR_Get_contextid_sparse_group(676): Too many communicators (4/16384 free on this process; ignore_id=0)
This is with the latest ELPA (elpa-2017.05.001.rc2), the latest Quantum Espresso (6.1), and the latest Intel MKL.
Ron Cohen
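The "4/16384 free" in that stack means the process had nearly exhausted its context-ID pool before the failing call, i.e. roughly 16380 communicators were still live. MPI_Comm_split carries the same obligation as the Cartesian routines: every communicator it returns must eventually be passed to MPI_Comm_free. A small illustrative sketch (not the ELPA/QE code):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm half;
    /* split the world into even-rank and odd-rank communicators */
    MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &half);
    int subrank;
    MPI_Comm_rank(half, &subrank);
    printf("world rank %d -> subrank %d\n", rank, subrank);
    MPI_Comm_free(&half);   /* returns the context ID to the pool */
    MPI_Finalize();
    return 0;
}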
I am also experiencing the exact same problem with 2017 Update 4.
All jobs continue to crash with this "Too many communicators" error.
None are usable!
// This does not seem to be fixed even in 2017 beta3
// I am also experiencing the exact same problem with 2017 update 4.
// Yes, I am running 2017 -4 and I get:
Can you please clarify which MKL version you used? MKL 2017.4 has not been released yet; the very latest available MKL is 2017.3. For example, can you please set MKL_VERBOSE=1 before running Siesta or QE and report the MKL version it prints?
We are pretty sure that the issue with too many communicators was fixed more than a year ago, and all official MKL 2017 releases (definitely not the Betas) should have the fix. Just recently I was making some Siesta runs and did not see any issues. In case you still see this problem with MKL 2017 or later, please let us know and provide a reproducer or instructions for how to reproduce it.
Regards,
Konstantin
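To expand on Konstantin's suggestion: besides setting the environment variable MKL_VERBOSE=1 (which makes MKL print its version line and every BLAS/LAPACK call), the linked MKL version can be queried directly. A minimal sketch using MKL's mkl_get_version_string service routine, compiled and linked against the same MKL your Siesta/QE build uses:

#include <stdio.h>
#include <mkl.h>

int main(void) {
    char buf[256];
    /* fills buf with the MKL version banner, e.g. "Intel(R) Math Kernel Library Version 2017.0.3 ..." */
    mkl_get_version_string(buf, (int)sizeof(buf));
    printf("%s\n", buf);
    return 0;
}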
I can still see this using Intel MPI 2018.1.163 when running Quantum Espresso 6.1 built with icc/ifort/imkl/impi 2018.1.163:
MKL_VERBOSE Intel(R) MKL 2018.0 Update 1 Product build 20171007 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.60GHz lp64 intel_thread NMICDev:0
...
Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(532)................: MPI_Comm_split(comm=0xc4000012, color=0, key=0, new_comm=0x7ffe71e4075c) failed
PMPI_Comm_split(508)................: fail failed
MPIR_Comm_split_impl(260)...........: fail failed
MPIR_Get_contextid_sparse_group(676): Too many communicators (16361/16384 free on this process; ignore_id=0)
A bit more info on this: when compiling the same Quantum Espresso 6.1 with GCC 6.4, ScaLAPACK from netlib, FFTW, OpenBLAS, and Intel MPI 2018.1.163, I get the exact same error. So it is not MKL related; rather, Intel MPI itself has the problem. Or at least that's the most likely culprit here.
Actually, the problem with p?gemm was fixed in MKL 2018 Update 3. Here is the link to the MKL bug fix list: https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2018-bug-fixes-list (MKLD-3445: fixed run-time failure of the P?GEMM routine for specific problem sizes).
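For completeness: p?gemm runs on a BLACS process grid, and each BLACS grid wraps MPI communicators under the hood, which is where the leaked communicators in this thread came from. A hedged sketch of the grid lifecycle using MKL's C BLACS interface (the extern declarations are written out here for self-containment); the essential point is that every Cblacs_gridinit must be paired with a Cblacs_gridexit:

#include <mpi.h>

/* MKL's C BLACS interface */
extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int context, int request, int *value);
extern void Cblacs_gridinit(int *context, char *order, int nprow, int npcol);
extern void Cblacs_gridexit(int context);

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int me, nprocs, ctxt;
    Cblacs_pinfo(&me, &nprocs);
    Cblacs_get(-1, 0, &ctxt);                 /* default system context */
    Cblacs_gridinit(&ctxt, "Row", 1, nprocs); /* 1 x nprocs process grid */
    /* ... pdgemm / p?gemm calls would go here ... */
    Cblacs_gridexit(ctxt);                    /* releases the grid's communicators */
    MPI_Finalize();
    return 0;
}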