Hello team, I am using the intel/mpi/64/2019/5.075 MPI library on my school's cluster. The cluster is a Cray CS400 and uses craype-network-infiniband for communication. I am currently getting the following error:
" Fatal error in PMPI_Waitall: Other MPI error, error stack:
MPIDI_OFI_handle_cq_error(991)"
The error seems to occur at random points during the run.
Thank you for helping me troubleshoot.
Best
Hi,
Thanks for reaching out to us.
Could you please provide a sample reproducer code and the commands you used for reproducing the issue?
Also, please let us know how many nodes you are using to launch the MPI job.
Thanks & Regards,
Hemanth.
