Hello,
I have compiled the parallel version of the siesta-3.2 code with Intel Parallel Studio XE 2016 Cluster Edition. The compilation itself went without problems, but every Siesta run ends with the following error:
Fatal error in PMPI_Cart_create: Other MPI error, error stack:
PMPI_Cart_create(332).........: MPI_Cart_create(comm=0x84000007, ndims=2, dims=0x7fffe76b6288, periods=0x7fffe76b62a0, reorder=0, comm_cart=0x7fffe76b61e0) failed
MPIR_Cart_create_impl(189)....:
MPIR_Cart_create(115).........:
MPIR_Comm_copy(1070)..........:
MPIR_Get_contextid(543).......:
MPIR_Get_contextid_sparse(608): Too many communicators
This message is displayed once per node (if I use X nodes, I see X copies of the same message in the output file).
I assumed I was using too many nodes, but reducing the node count did not solve the problem. The error first appeared after I updated to "Update 3" using the online installer. People on other forums advised me to reinstall the MPI environment; I did, but the problem still occurs. Any ideas? My arch.make file is attached. (See also the sketch after this post.)
Thanks a lot for any help!
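For readers meeting this message for the first time: it comes from MPICH-based MPI libraries (including Intel MPI) running out of communicator context IDs because too many communicators are alive at once. The following stand-alone C sketch is illustrative only (it is not taken from Siesta); it reproduces the same failure by creating Cartesian communicators in a loop without freeing them:

#include <mpi.h>

int main(int argc, char **argv)
{
    int dims[2] = {0, 0};
    int periods[2] = {0, 0};
    int nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 2, dims);

    /* Each MPI_Cart_create allocates a new context ID. Without the
       matching MPI_Comm_free, MPICH-based implementations abort after
       roughly 16K live communicators with the "Too many communicators"
       error stack shown above. */
    for (int i = 0; i < 20000; i++) {
        MPI_Comm cart;
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);
        /* MPI_Comm_free(&cart);   <-- uncomment to keep the pool stable */
    }

    MPI_Finalize();
    return 0;
}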
Hi
Just to back this up, I get the same error with a Quantum Espresso run. The compiled binary works fine with Intel MPI 5.1.1, but with 5.1.3 it fails with:
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
PMPI_Cart_sub(242)...................: MPI_Cart_sub(comm=0xc400fcf4, remain_dims=0x7fffffffa7e8, comm_new=0x7fffffffa740) failed
PMPI_Cart_sub(178)...................:
MPIR_Comm_split_impl(270)............:
MPIR_Get_contextid_sparse_group(1330): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Cart_sub: Other MPI error, error stack:
Image PC Routine Line Source
pw.x 0000000000C94025 Unknown Unknown Unknown
libpthread.so.0 0000003568E0F790 Unknown Unknown Unknown
libmpi.so.12 00002AAAAF8E7B50 Unknown Unknown Unknown
This seems to be a bug.....?
~~
Ade
Hello,
Is this fixed in the 2017 release?
I don't see it mentioned at https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2017-bug-fixes-list
Cheers
This does not seem to be fixed even in 2017 Beta 3:
studio2017_beta3/compilers_and_libraries_2017.0.064
It does not work in anything after 11.2. I have lost many hours to this bug; please fix it in the next release!
Yes, this issue has been fixed in MKL v.2017 (released Sep 6th, 2016). Please check and give us an update if the issue still exists. Thanks, Gennady
I just downloaded the latest version; the same issue still occurs.
Yes, I am running 2017 Update 4 and I get:
Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(532)................: MPI_Comm_split(comm=0xc4027cd4, color=0, key=1, new_comm=0x7ffe9a6df1f0) failed
PMPI_Comm_split(508)................: fail failed
MPIR_Comm_split_impl(260)...........: fail failed
MPIR_Get_contextid_sparse_group(676): Too many communicators (4/16384 free on this process; ignore_id=0)
This is with the latest ELPA (elpa-2017.05.001.rc2), the latest Quantum Espresso (6.1), and the latest Intel MKL.
Ron Cohen
I am also experiencing the exact same problem with 2017 Update 4.
All jobs continue to crash with this "Too many communicators" error.
None are usable!
//This does not seem to be fixed even in 2017 beta3
// I am also experiencing the exact same problem with 2017 update 4.
// Yes, I am running 2017 -4 and I get:
Can you please clarify which MKL version you are using? MKL 2017.4 has not been released yet; the latest available MKL is 2017.3. For example, could you set MKL_VERBOSE=1 before running Siesta or QE and report the MKL version it prints? (A small stand-alone version check is also sketched below.)
We are fairly sure that the "too many communicators" issue was fixed more than a year ago, and all official MKL 2017 releases (definitely not the Betas) should have the fix. I recently made some Siesta runs and did not see any issues. If you still see this problem with MKL 2017 or later, please let us know and provide a reproducer or instructions for reproducing it.
Regards,
Konstantin
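As an alternative to MKL_VERBOSE=1, a small check program can print the MKL version actually linked into a binary. This is only a sketch using the documented mkl_get_version_string() support function; compile and link it against the same MKL used for Siesta/QE so its output matches what MKL_VERBOSE would report.

#include <stdio.h>
#include <mkl.h>    /* provides the mkl_get_version_string() support function */

int main(void)
{
    char buf[256];

    /* Fills buf with a string such as
       "Intel(R) Math Kernel Library Version 2017.0.3 ..." */
    mkl_get_version_string(buf, (int)sizeof(buf));
    printf("%s\n", buf);
    return 0;
}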
I can still see this using Intel MPI 2018.1.163 when running Quantum Espresso 6.1 built with
icc/ifort/imkl/impi 2018.1.163
MKL_VERBOSE Intel(R) MKL 2018.0 Update 1 Product build 20171007 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.60GHz lp64 intel_thread NMICDev:0
...
Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(532)................: MPI_Comm_split(comm=0xc4000012, color=0, key=0, new_comm=0x7ffe71e4075c) failed
PMPI_Comm_split(508)................: fail failed
MPIR_Comm_split_impl(260)...........: fail failed
MPIR_Get_contextid_sparse_group(676): Too many communicators (16361/16384 free on this process; ignore_id=0)
A bit more info: when compiling the same Quantum Espresso 6.1 with
GCC 6.4, ScaLAPACK from Netlib, FFTW, OpenBLAS, and Intel MPI 2018.1.163,
I get the exact same error. So it is not MKL related; rather, Intel MPI itself seems to have the problem.
Or at least that is the most likely culprit here.
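To help isolate MKL from MPI, a pure-MPI loop (no MKL, no ScaLAPACK) is enough to hit the same ceiling. A minimal sketch, not taken from QE or ELPA:

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* With the MPI_Comm_free left commented out, MPICH-based MPIs
       (including Intel MPI) abort after about 16K iterations with the
       same MPIR_Get_contextid_sparse_group "Too many communicators"
       stack. With the free enabled, the loop runs to completion: the
       limit applies to communicators alive simultaneously, not to the
       total number ever created. */
    for (int i = 0; i < 20000; i++) {
        MPI_Comm split;
        MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &split);
        /* MPI_Comm_free(&split); */
    }

    MPI_Finalize();
    return 0;
}

If a loop like this behaves correctly on a given MPI build while the application still dies, the communicators are being leaked by whatever layer creates them in the application stack rather than by MPI itself.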
Actually, the problem with p?gemm has been fixed in MKL 2018 Update 3. Here is the link to the MKL bug-fix list: https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2018-bug-fixes-list. MKLD-3445: Fixed run-time failure of the P?GEMM routine for specific problem sizes.