Intel® oneAPI HPC Toolkit

SLURM and oneAPI cluster installation problems: PMI library

j0e
New Contributor I

I just updated my cluster from Cluster Studio to the latest oneAPI release (2021.4). The installation went fine, and ifort and mpiexec work as expected. However, when I try to launch jobs through SLURM (which worked fine with Cluster Studio), I get errors such as:

Abort(1091087) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(138): 
MPID_Init(996).......: 
MPIR_pmi_init(168)...: PMI2_Job_GetId returned 14
.
.
.
srun: error: node3: tasks 30-49: Exited with exit code 1
srun: error: node5: tasks 70-89: Exited with exit code 1
srun: error: node4: tasks 50-69: Exited with exit code 1
srun: error: node2: tasks 10-29: Exited with exit code 1

The PMI library environment variable is set:

I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

and I've also tried libpmi2.so, with the same result. I'm running SLURM 17.02.7.
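For readers hitting the same error, a minimal sketch of a sanity check on the PMI setting described above. It only verifies that the file named by I_MPI_PMI_LIBRARY is readable and, if srun is on the PATH, lists the MPI plugin types that SLURM build reports. The helper name check_pmi_library is hypothetical and not part of Intel MPI or SLURM, and passing this check does not by itself explain the PMI2_Job_GetId failure.

```shell
#!/bin/sh
# Sketch only: check_pmi_library is a hypothetical helper, not an Intel MPI or SLURM tool.
# It checks that the PMI library path (argument 1, or $I_MPI_PMI_LIBRARY) is readable.
check_pmi_library() {
    lib="${1:-${I_MPI_PMI_LIBRARY:-}}"
    if [ -z "$lib" ]; then
        echo "I_MPI_PMI_LIBRARY is not set"
        return 1
    fi
    if [ ! -r "$lib" ]; then
        echo "PMI library not readable: $lib"
        return 1
    fi
    echo "PMI library found: $lib"
    return 0
}

# Check whatever the current environment points to; do not abort if it fails.
check_pmi_library || true

# If SLURM is installed, list the MPI plugin types srun supports
# (for example to see whether pmi2 is among them).
if command -v srun >/dev/null 2>&1; then
    srun --mpi=list || true
fi
```

The check can also be given an explicit path, for example `check_pmi_library /usr/lib64/libpmi2.so`, to compare the two libraries mentioned in the post.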

Any ideas?

ShivaniK_Intel
Moderator

Hi,


Thanks for reaching out to us.


Could you please tell us which libfabric provider you are using?


To help us investigate further, could you please also share the exact command line you are using?


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator

Hi,

 

1. Could you please provide the complete debug log with I_MPI_DEBUG=100?

 

2. Could you also provide the output of the below command?

lscpu
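The two items requested above could be gathered along these lines. This is a rough sketch, not an official Intel procedure: the function name collect_diagnostics, the output directory mpi_diag, and the application path ./my_mpi_app are all hypothetical placeholders, and the srun invocation should be replaced with the actual command line that fails.

```shell
#!/bin/sh
# Sketch only: collect_diagnostics, mpi_diag and ./my_mpi_app are hypothetical names.
# Usage: collect_diagnostics [output_dir] [mpi_application]
collect_diagnostics() {
    outdir="${1:-mpi_diag}"
    app="${2:-./my_mpi_app}"
    mkdir -p "$outdir"

    # Item 2: CPU information from lscpu.
    if command -v lscpu >/dev/null 2>&1; then
        lscpu > "$outdir/lscpu.txt"
    else
        echo "lscpu not available" > "$outdir/lscpu.txt"
    fi

    # Item 1: full debug log. I_MPI_DEBUG=100 is set for this one command only.
    # The run is expected to fail here, so its exit status is ignored.
    if command -v srun >/dev/null 2>&1 && [ -x "$app" ]; then
        I_MPI_DEBUG=100 srun "$app" > "$outdir/mpi_debug.log" 2>&1 || true
    else
        echo "srun or $app not available; skipped MPI run" > "$outdir/mpi_debug.log"
    fi

    # Print the directory that now holds both files.
    echo "$outdir"
}
```

Calling `collect_diagnostics mpi_diag ./my_mpi_app` from inside a job allocation would leave lscpu.txt and mpi_debug.log in mpi_diag, ready to attach to a reply.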

 

>>>However when I try to use SLURM (which worked fine with cluster studio)

 

  Since you mentioned that SLURM worked fine with Cluster Studio, could you please confirm which MPI version that Cluster Studio release used?

 

Thanks & Regards

Shivani

 

ShivaniK_Intel
Moderator

Hi,


As we haven't heard back from you, could you please provide the details requested in my previous post?


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator

Hi,


I have not heard back from you. This thread will no longer be monitored by Intel.

If you need further assistance, please raise a new question.


Thanks & Regards

Shivani

