We are trying to run a parallel job on a single node using Intel MPI 2018.0.124 and are getting the following error:
..\hydra\pm\pmiserv\pmiserv_cb.c (834): connection to proxy 0 at host XXX-NNNN failed
..\hydra\tools\demux\demux_select.c (103): callback returned error status
..\hydra\pm\pmiserv\pmiserv_pmci.c (507): error waiting for event
..\hydra\ui\mpich\mpiexec.c (1148): process manager error waiting for completion
We have checked the status of the hydra service and found it to be running.
mpiexec itself also seems to work:
`mpiexec -n 2 hostname` returns the local host name
`mpiexec -validate` returns success
We have also checked that the hydra service is running the version we want and that it is the only version installed on the machine.
Is there anything we can do to check why the runs fail?
Thanks!
Hello,
Could you please add the `-localonly` and `-verbose` options to your command line and send me all of the output? It may help us understand what happened.
Hello,
Sorry for the delayed reply. Please find the output attached. I must add that this happens only when we run more than 48 processes. The attached output is from a machine that does not have 48 processors, but we have also run on a different machine with 96 cores, and it fails there with the same errors.
Thanks,
Araham
Could you also provide your command line? Do I understand correctly that you specify each process separately, with colons between them, like `mpiexec -n 1 <exec> : -n 1 <exec> : ...`? If so, could you also try combining these processes into a single segment, like `mpiexec -n 49 <exec>`, if possible?
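To make the two command-line forms concrete, here is a minimal Python sketch that builds both argument lists so they can be compared or passed to a launcher script. The helper names and `app.exe` are hypothetical placeholders, not anything from your setup, and placing the extra options directly after `mpiexec` is an assumption about where they belong. The sketch only constructs the arguments; it does not run anything.

```python
def combined_command(executable, nprocs, extra_options=()):
    """Single-segment form: mpiexec [options] -n N <exec>."""
    return ["mpiexec", *extra_options, "-n", str(nprocs), executable]


def colon_separated_command(executable, nprocs, extra_options=()):
    """One '-n 1 <exec>' segment per process, with ':' between segments."""
    args = ["mpiexec", *extra_options]
    for rank in range(nprocs):
        if rank > 0:
            args.append(":")  # separator between consecutive segments
        args.extend(["-n", "1", executable])
    return args


if __name__ == "__main__":
    # 'app.exe' is a placeholder; the options are the ones requested earlier in the thread.
    opts = ("-localonly", "-verbose")
    print(" ".join(combined_command("app.exe", 49, opts)))
    print(" ".join(colon_separated_command("app.exe", 3, opts)))
```

With 49 processes the colon-separated form contains 49 segments and 48 separators, so the command line becomes much longer than the combined form. Trying both forms at the same process count would show whether the failure depends on how the processes are specified.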
