Hello,
I get the following error on my cluster when I submit jobs:
mpiexec_node050: cannot connect to local mpd (/tmp/mpd2.console_sudharshan); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
I can see that this error has been discussed in earlier threads, but in my case it appears unpredictably. A job may run fine with one number of processors and then fail with this error when I resubmit it with a different number. I cannot pin down the conditions that trigger it: I have also hit the error with processor counts that previously worked, using the same scripts and the same code. Any suggestions or help would be sincerely appreciated.
Before you execute mpiexec command, does mpdtrace show list of all the nodes on which you want to run your job?
Are you using the -machinefile option in your mpiexec command?
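As a rough illustration of the mpdtrace check, here is a minimal Python sketch that runs `mpdtrace` and lists any machinefile hosts that are not in the current mpd ring. It assumes `mpdtrace` prints one hostname per line and that the machinefile uses one host per line, optionally followed by `:<count>`, with `#` starting a comment. The helper names (`hosts_from_machinefile`, `missing_from_ring`, `check_ring`) are hypothetical and not part of MPICH2 or Intel MPI.

```python
import subprocess
import sys


def hosts_from_machinefile(text):
    """Collect hostnames from machinefile text.

    Assumes one host per line, optionally followed by ':<count>',
    with '#' starting a comment; adjust if your machinefile differs.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        hosts.append(line.split(":", 1)[0].strip())  # drop ':<count>'
    return hosts


def missing_from_ring(trace_output, machinefile_text):
    """Return machinefile hosts that do not appear in the mpdtrace output."""
    # Compare short names, since mpdtrace may print short hostnames
    # while the machinefile uses fully qualified ones (or vice versa).
    ring = {name.split(".", 1)[0] for name in trace_output.split()}
    missing = []
    for host in hosts_from_machinefile(machinefile_text):
        if host.split(".", 1)[0] not in ring and host not in missing:
            missing.append(host)
    return missing


def check_ring(machinefile_path):
    """Run mpdtrace and report machinefile hosts missing from the ring."""
    trace = subprocess.run(["mpdtrace"], capture_output=True, text=True, check=True)
    with open(machinefile_path) as f:
        return missing_from_ring(trace.stdout, f.read())


if __name__ == "__main__":
    missing = check_ring(sys.argv[1])
    if missing:
        print("Not in mpd ring:", ", ".join(missing))
        sys.exit(1)
    print("All machinefile hosts are in the mpd ring.")
```

Running it from the job script just before `mpiexec` (for example `python check_ring.py machines.txt`, file names hypothetical) would show whether a node such as node050 had dropped out of the ring at submission time.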
Hi sudh,
Could you provide the command line you use and the library version?
Regards!
Dmitry