j0e
New Contributor I
236 Views

SLURM and oneAPI cluster installation problems PMI library

I just updated my cluster from Cluster Studio to the latest oneAPI release (2021.4). The installation went fine, and ifort and mpiexec work as expected. However, when I try to use SLURM (which worked fine with Cluster Studio), I get errors such as:

Abort(1091087) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(138): 
MPID_Init(996).......: 
MPIR_pmi_init(168)...: PMI2_Job_GetId returned 14
.
.
.
srun: error: node3: tasks 30-49: Exited with exit code 1
srun: error: node5: tasks 70-89: Exited with exit code 1
srun: error: node4: tasks 50-69: Exited with exit code 1
srun: error: node2: tasks 10-29: Exited with exit code 1

The environment variable is set:

I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

I've also tried libpmi2.so, with the same result. I'm running SLURM 17.02.7.
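
For readers following along, the setup described above might look roughly like the sketch below. The library path is the one quoted in this post. The existence check and the `srun --mpi=list` query (which lists the MPI plugin types the local Slurm build supports) are additions for illustration, not steps taken in the original report.

```shell
# Point Intel MPI at Slurm's PMI library (path as quoted in the post;
# /usr/lib64/libpmi2.so was the alternative that was also tried).
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

# Assumption for illustration: confirm the library exists on this node.
if [ -e "$I_MPI_PMI_LIBRARY" ]; then
    pmi_status="found"
else
    pmi_status="missing"
fi
echo "PMI library $I_MPI_PMI_LIBRARY: $pmi_status"

# If Slurm is installed, list the MPI plugin types srun supports.
if command -v srun >/dev/null 2>&1; then
    srun --mpi=list
fi
```

Comparing the plugin list with the PMI library that `I_MPI_PMI_LIBRARY` points to may help show whether the two are meant to work together.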

Any ideas?

4 Replies
ShivaniK_Intel
Moderator
155 Views

Hi,


Thanks for reaching out to us.


Could you please tell us which libfabric provider you have been using?


To help us investigate further, could you also share the command line you have been using?


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator
125 Views

Hi,

 

1. Could you please provide the complete debug log with I_MPI_DEBUG=100?

 

2. Could you also provide the output of the following command?

lscpu
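
A minimal sketch of how both items could be collected in one pass is shown below. The output file names and the `./my_app` binary are hypothetical placeholders chosen for this example; substitute the actual failing job and launch command.

```shell
# Ask Intel MPI for verbose startup diagnostics, as requested above.
export I_MPI_DEBUG=100

# Hypothetical output file names, used only in this example.
debug_log="impi_debug.log"
cpu_log="lscpu.txt"

# Record the CPU topology if lscpu is installed.
if command -v lscpu >/dev/null 2>&1; then
    lscpu > "$cpu_log"
fi

# Re-run the failing job and capture stdout and stderr together.
# ./my_app is a placeholder for the real MPI binary.
if command -v srun >/dev/null 2>&1 && [ -x ./my_app ]; then
    srun ./my_app > "$debug_log" 2>&1
fi

echo "I_MPI_DEBUG=$I_MPI_DEBUG"
```

Both files can then be attached to a reply in this thread.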

 

>>>However when I try to use SLURM (which worked fine with cluster studio)

 

Since you mentioned that SLURM worked fine with Cluster Studio, could you please confirm which MPI version was included in the Cluster Studio release you used?

 

Thanks & Regards

Shivani

 

ShivaniK_Intel
Moderator
82 Views

Hi,


As we haven't heard back from you, could you please provide the details requested in my previous post?


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator
17 Views

Hi,


I have not heard back from you. This thread will no longer be monitored by Intel.

If you need further assistance, please post a new question.


Thanks & Regards

Shivani

