Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Core count limitations for the shared memory transport

drmiket7777
New Contributor I

I was wondering whether the shared memory transport in the latest Intel MPI has any core count limitation. I was trying to run on an Azure HBv5 (MI300C) node with 368 cores, but it crashes during MPI_Init.

Thanks

Michael  
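
For anyone hitting the same symptom: a minimal reproducer along these lines separates MPI_Init from the application. This is only a sketch; the file name, the mpicc/mpirun invocations, and the 368-rank count are assumptions based on the post. I_MPI_DEBUG is Intel MPI's debug-output variable and makes the init-time transport selection visible.

$ cat > init_test.c <<'EOF'
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                /* the call that reportedly crashes */
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) printf("MPI_Init OK on %d ranks\n", 368);
    MPI_Finalize();
    return 0;
}
EOF
$ mpicc init_test.c -o init_test
$ I_MPI_DEBUG=5 mpirun -n 368 ./init_test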

Sergey_K_Intel3
Employee

Please check your ulimit setting for the number of open files.

Run "ulimit -a" to report all current limits, or "ulimit -n" to report just the open files limit.

Run "ulimit -Sn <number>" to set a new soft limit.

Intel MPI uses around 3 file descriptors for each rank.
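
As a quick sanity check based on the numbers above, something like the following can be run before the launch. This is only a sketch: the 368-rank count, the headroom constant, and the ./my_app binary are assumptions, not from the thread, and the soft limit can be raised only up to the hard limit ("ulimit -Hn").

# Estimate descriptor demand: ~3 fds per rank (figure above) plus headroom
RANKS=368
NEEDED=$(( RANKS * 3 + 1024 ))
echo "soft limit: $(ulimit -Sn), estimated need: ${NEEDED}"
# Raise the soft limit for this shell if it is too low; local ranks started
# by mpirun from this shell inherit its limits
if [ "$(ulimit -Sn)" -lt "${NEEDED}" ]; then ulimit -Sn "${NEEDED}"; fi
mpirun -n "${RANKS}" ./my_app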

drmiket7777
New Contributor I

I see, thanks.

There is no other built-in per-core resource limitation in the code besides the number of file descriptors, right?

Thanks

drmiket7777
New Contributor I

Our ulimit settings (the soft limits are the same) are:


$ ulimit -Ha
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 442544
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 32768
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Can you think of any other resource that could become scarce when we use Intel MPI on very high core count nodes? We are using nodes with 368 cores each.
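
Not an answer from the thread, but a few node-level resources are commonly worth surveying at high rank counts. The /dev/shm check assumes the shared memory transport backs its segments with files there, which is the usual behavior on Linux.

$ df -h /dev/shm        # tmpfs backing shared memory segments: size vs. usage
$ ulimit -Sn            # open files: roughly 3 per rank, as noted above
$ ulimit -Su            # max user processes: must cover all ranks plus helpers
$ cat /proc/sys/kernel/pid_max          # system-wide PID ceiling
$ sysctl kernel.shmmax kernel.shmall    # SysV shared memory limits, if applicable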
