Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

PMPI_Init error on Linux

Mitul1
Beginner

While running a job on an HPC cluster (Rocky Linux 8 with OpenHPC 2.x), I am getting the following error from Intel MPI:

Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(176)........: 
MPID_Init(1538)..............: 
MPIDI_OFI_mpi_init_hook(1511): 
open_fabric(2568)............: OFI fi_getinfo() failed (ofi_init.c:2568:open_fabric:No data available)

Slurm job script:
#!/bin/bash
#SBATCH -p normal          # partition
#SBATCH --no-requeue       # do not requeue the job after a failure
#SBATCH -J JOB_aimD        # job name
#SBATCH -o JOB_aiMD.%J     # output file
#SBATCH -n 40              # number of MPI tasks

ulimit -s unlimited
module purge
module list
module load intel mpi

Am I missing a library or package?
taehunkim
Employee

Hi,

The error suggests that the Intel MPI Library is failing to initialize an OFI (OpenFabrics Interfaces) provider, the layer it uses for high-performance networking in HPC environments. The specific message, "fi_getinfo() failed ... No data available", means that libfabric could not find any usable network interface or fabric service.
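
Before working through the steps below, it can help to make the failure more verbose. A minimal sketch, assuming Intel MPI 2019 or later (I_MPI_DEBUG and FI_LOG_LEVEL are documented Intel MPI and libfabric controls; "hostname" is just a stand-in for any small executable):

         $ export I_MPI_DEBUG=5        # Intel MPI prints the libfabric version and the provider it selects
         $ export FI_LOG_LEVEL=debug   # libfabric logs provider discovery in detail
         $ mpirun -n 2 hostname        # any small command is enough to reproduce the init failure

The debug output should show which providers libfabric tried and why each was rejected.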

Here are some steps to troubleshoot and resolve this issue:

  • Check OFI Providers: Ensure that the necessary OFI providers are installed on your system. You can check the available providers by running:

         $ fi_info

           This command should list the available fabric interfaces. If it returns "No data available", no suitable providers were found.

 

  • Install Required Packages: Make sure that the required OFI libraries and providers are installed. On Rocky Linux 8 with OpenHPC, you might need to install packages like libfabric and its providers. You can install them using:

         $ sudo yum install libfabric libfabric-devel

 

  • Configure Intel MPI to Use a Specific Provider: Sometimes, specifying a particular provider can help. You can set the FI_PROVIDER environment variable to a provider that is available on your system. For example:

            $ export FI_PROVIDER=sockets

          You can add this line to your Slurm job script before the mpirun or srun command, as shown in the sketch after this list.

 

  • Check Network Configuration: Ensure that the network interfaces on your nodes are properly configured and accessible. The OFI provider might be looking for specific high-performance network interfaces (like InfiniBand or Omni-Path) that are not configured or available.

 

  • Intel MPI Configuration: Intel MPI can be configured to use different communication fabrics through the I_MPI_FABRICS environment variable. For example:

            $ export I_MPI_FABRICS=shm:ofi

          Note that in Intel MPI 2019 and later the valid values are ofi, shm:ofi, and shm; the older shm:tcp setting is no longer supported, and TCP is instead selected through the OFI provider (e.g. FI_PROVIDER=tcp or FI_PROVIDER=sockets). Add this line to your Slurm job script before the mpirun or srun command; see the consolidated sketch after this list.
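
Putting the provider and fabric settings together, here is a minimal sketch of a revised job script. The environment variables are documented Intel MPI/libfabric controls; "./your_app" is a placeholder for your actual executable, and the commented-out I_MPI_OFI_LIBRARY_INTERNAL line is optional:

         #!/bin/bash
         #SBATCH -p normal
         #SBATCH -n 40
         #SBATCH -J JOB_aimD

         module purge
         module load intel mpi

         # Fall back to a provider that works on plain Ethernet; this bypasses
         # the InfiniBand/Omni-Path lookup that is failing in fi_getinfo().
         export FI_PROVIDER=sockets

         # Shared memory within a node, OFI between nodes.
         export I_MPI_FABRICS=shm:ofi

         # Optional: use the libfabric bundled with Intel MPI instead of the
         # system one, in case the system libfabric has no usable providers.
         # export I_MPI_OFI_LIBRARY_INTERNAL=1

         # Keep verbose output while debugging.
         export I_MPI_DEBUG=5

         # Placeholder: replace with your real launch line.
         mpirun ./your_app

If the job runs with FI_PROVIDER=sockets but not without it, the problem lies in the high-performance provider (e.g. verbs or psm2) rather than in Intel MPI itself, and the libfabric provider packages from the second step above are the place to look.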

 

You can find more details in the Intel MPI developer reference ( https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-13/ofi-capable-network-fabrics-control.html ).

 

Thanks.
