mpirun init fatal error for hello world example

Hello,

I'm running into a fatal error when running the simple Hello world test with mpirun -np 2 or more processes. It works fine with a single process. See the output below. Do you have any idea what the problem is?

lion@sol48: ~/FHI-aims/09_06_20/testcases/H2O-relaxation $ mpiifort -v
mpiifort for the Intel(R) MPI Library 2019 for Linux*
Copyright 2003-2018, Intel Corporation.
ifort version 19.0.0.117

lion@sol48: ~/FHI-aims/09_06_20/testcases/H2O-relaxation $ cat test.f90
!
! Copyright 2003-2018 Intel Corporation.
! 
! This software and the related documents are Intel copyrighted materials, and
! your use of them is governed by the express license under which they were
! provided to you (License). Unless the License provides otherwise, you may
! not use, modify, copy, publish, distribute, disclose or transmit this
! software or the related documents without Intel's prior written permission.
! 
! This software and the related documents are provided as is, with no express
! or implied warranties, other than those that are expressly stated in the
! License.
!
        program main
        use mpi
        implicit none

        integer i, size, rank, namelen, ierr
        character (len=MPI_MAX_PROCESSOR_NAME) :: name
        integer stat(MPI_STATUS_SIZE)

        call MPI_INIT (ierr)

        call MPI_COMM_SIZE (MPI_COMM_WORLD, size, ierr)
        call MPI_COMM_RANK (MPI_COMM_WORLD, rank, ierr)
        call MPI_GET_PROCESSOR_NAME (name, namelen, ierr)

        if (rank.eq.0) then

            print *, 'Hello world: rank ', rank, ' of ', size, ' running on ', name

            do i = 1, size - 1
                call MPI_RECV (rank, 1, MPI_INTEGER, i, 1, MPI_COMM_WORLD, stat, ierr)
                call MPI_RECV (size, 1, MPI_INTEGER, i, 1, MPI_COMM_WORLD, stat, ierr)
                call MPI_RECV (namelen, 1, MPI_INTEGER, i, 1, MPI_COMM_WORLD, stat, ierr)
                name = ''
                call MPI_RECV (name, namelen, MPI_CHARACTER, i, 1, MPI_COMM_WORLD, stat, ierr)
                print *, 'Hello world: rank ', rank, ' of ', size, ' running on ', name
            enddo

        else

            call MPI_SEND (rank, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)
            call MPI_SEND (size, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)
            call MPI_SEND (namelen, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, ierr)
            call MPI_SEND (name, namelen, MPI_CHARACTER, 0, 1, MPI_COMM_WORLD, ierr)

        endif

        call MPI_FINALIZE (ierr)

        end
lion@sol48: ~/FHI-aims/09_06_20/testcases/H2O-relaxation $ mpiifort test.f90
lion@sol48: ~/FHI-aims/09_06_20/testcases/H2O-relaxation $ mpirun -np 1 ./a.out
 Hello world: rank            0  of            1  running on 
 sol48                                                                          
                                                 
lion@sol48: ~/FHI-aims/09_06_20/testcases/H2O-relaxation $ mpirun -np 2 ./a.out
Abort(1093903) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(607)..........: 
MPID_Init(731).................: 
MPIR_NODEMAP_build_nodemap(710): PMI_KVS_Get returned 4
In: PMI_Abort(1093903, Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(607)..........: 
MPID_Init(731).................: 
MPIR_NODEMAP_build_nodemap(710): PMI_KVS_Get returned 4)
Abort(1093903) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(607)..........: 
MPID_Init(731).................: 
MPIR_NODEMAP_build_nodemap(710): PMI_KVS_Get returned 4
In: PMI_Abort(1093903, Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(607)..........: 
MPID_Init(731).................: 
MPIR_NODEMAP_build_nodemap(710): PMI_KVS_Get returned 4)

 

Moderator

Hi Konstantin,

Thanks for reaching out to us!

We tried to reproduce the error you are seeing, but we were unable to.

Running the same code on our system gave the following output:

 

u30009@s001-n179:~/goutham/forums/856899_mpirun_initfatal/test$ mpirun -np 1 ./a.out
 Hello world: rank            0  of            1  running on
 s001-n179

u30009@s001-n179:~/goutham/forums/856899_mpirun_initfatal/test$ mpirun -np 2 ./a.out
 Hello world: rank            0  of            2  running on
 s001-n179

 Hello world: rank            1  of            2  running on
 s001-n179

Please refer to the thread below, which discusses a similar issue:

https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/799716

Please let us know if the above link helps you. 

 

 

Thanks & Regards

Goutham

PMI_KVS_Get failure 4:

See:

https://github.com/hpcng/singularity/issues/5118

In that issue, locate the comment by mmiesch dated Mar 16.

Jim Dempsey

Moderator

Hi Konstantin,

Could you please let us know the status of the issue you are facing?

If the issue persists, please provide the output of the checks below.

  1. Please verify that you can access the nodes listed in the node file. The node file may vary depending on your environment (job scheduler).

Example: 

In our environment, mpirun uses $PBS_NODEFILE as the machine file. 

You can find the appropriate node file for your environment at the link below:

https://software.intel.com/content/www/us/en/develop/documentation/mpi-developer-guide-linux/top/run...

 

 2. Run the Intel® Cluster Checker:

•    source clckvars.sh 
•    clck -f <nodefile>

See the link below for more details.
https://software.intel.com/content/www/us/en/develop/documentation/cluster-checker-user-guide/top/ge...
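
For step 1, a minimal shell sketch of the reachability check might look like the following. It assumes passwordless SSH between the nodes and a node file with one hostname per line (repeated entries are collapsed). The function name check_nodes and the PROBE override are hypothetical helpers for illustration; they are not part of Intel MPI or of any job scheduler.

```shell
# check_nodes NODEFILE
# Prints "<host>: ok" or "<host>: FAILED" for each unique host in NODEFILE.
# PROBE (hypothetical knob) is the command used to reach a host; by default
# it is a non-interactive ssh that gives up after 5 seconds.
check_nodes() {
    nodefile=$1
    probe=${PROBE:-"ssh -o BatchMode=yes -o ConnectTimeout=5"}

    # Scheduler node files often list a host once per slot, so deduplicate.
    sort -u "$nodefile" | while IFS= read -r host; do
        # Skip empty lines.
        [ -n "$host" ] || continue

        # Run a trivial remote command; stdin is redirected so the probe
        # cannot consume the remaining lines of the node list.
        if $probe "$host" true </dev/null >/dev/null 2>&1; then
            printf '%s: ok\n' "$host"
        else
            printf '%s: FAILED\n' "$host"
        fi
    done
}
```

Under PBS, for example, this could be called as check_nodes "$PBS_NODEFILE". Any host reported as FAILED is worth investigating before rerunning mpirun -np 2.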

 

 

Thanks & Regards
Goutham


 

Moderator

Hi Konstantin,

Could you please let us know the status of the issue you are facing?

If your issue persists, please let us know so that we can help you resolve it.

If your issue is resolved, let us know whether we can close this thread.

 

 

Thanks & Regards

Goutham

Moderator

Re:mpirun init fatal error for hello world example

Hi Konstantin,

Could you please let us know the status of the issue you are facing?

If your issue persists, please let us know so that we can help you resolve it.

If your issue is resolved, let us know whether we can close this thread.


Regards

Goutham

