Intel® oneAPI HPC Toolkit

MPI hangs with multiple processes on same node - Intel AI DevCloud



I am using Intel AI DevCloud to run Deep Reinforcement Learning training, with mpi4py so that several agents can collect data at the same time.

In my framework, I run N jobs (on different nodes) with agents, plus one more job that runs the optimization algorithm in Python.

When I run a single agent in each job, the application works correctly. However, when I try to run more than one agent in the same job (same node), the application hangs.

I do not think the problem is in the application itself, because it works when there is a single agent per node. Additionally, the same application worked with multiple agents per node last year, when Intel DevCloud was running CentOS.

The application is in this code:


I am probably not presenting enough data about the problem, so if you need any MPI-related information, I can add it to this post.
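One way to narrow this down would be a minimal check that is independent of the training code. The sketch below is not the application; it only assumes that mpi4py and an MPI launcher are available in the job. Each rank reports its hostname, all ranks gather the list, and then they meet at a barrier. The file name `mpi_node_check.py` and the helper `ranks_per_node` are hypothetical names chosen for illustration.

```python
"""Minimal same-node MPI sanity check (hypothetical file: mpi_node_check.py)."""
import socket
from collections import defaultdict

try:
    from mpi4py import MPI  # requires a working MPI runtime on the node
except ImportError:
    MPI = None  # lets the helper below be imported without MPI installed


def ranks_per_node(hostnames):
    """Group rank indices by the hostname that each rank reported."""
    groups = defaultdict(list)
    for rank, host in enumerate(hostnames):
        groups[host].append(rank)
    return dict(groups)


def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Collective call: every rank receives the hostname of every other rank.
    hostnames = comm.allgather(socket.gethostname())

    # If ranks on the same node cannot communicate, the run is likely to
    # stall at the allgather above or at this barrier.
    comm.Barrier()

    if rank == 0:
        for host, ranks in sorted(ranks_per_node(hostnames).items()):
            print(f"{host}: ranks {ranks}")


if __name__ == "__main__" and MPI is not None:
    main()
```

Launching it inside a single job with, for example, `mpirun -n 2 python mpi_node_check.py` should print one line per node listing the ranks placed there. If even this small script hangs with two or more processes on one node, the cause is more likely the MPI setup on the node than the application.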

