Whenever I use the mpirun -f option to run MPI programs on the device, I get a segmentation fault.
[user@node ~]$ echo mic0 > mic0.hosts
[user@node ~]$ mpirun -perhost 1 -n 2 -f mic0.hosts ./hello
Segmentation fault
However, it runs fine with the -host option, and even with the -f option as long as the hosts file contains no mic devices.
[user@node ~]$ echo different_node > mic0.hosts
[user@node ~]$ mpirun -n 2 -f mic0.hosts ./hello
CPU: Hello from different_node 1 of 2
CPU: Hello from different_node 0 of 2
[user@node ~]$ mpirun -perhost 1 -n 2 -host mic0 ./hello
MIC: Hello from node-mic0 1 of 2
MIC: Hello from node-mic0 0 of 2
I haven't been very successful in debugging this problem. Does anyone have any suggestions?
Thanks,
Dan
Hi Dan,
If the file mic0.hosts contains only "mic0", then your two commands are equivalent:
% mpirun -perhost 1 -n 2 -f mic0.hosts ./hello
% mpirun -n 2 -host mic0 ./hello
This assumes that "hello" is the MIC binary and has been transferred to /root on mic0.
Could you verify that mic0.hosts contains only mic0?
% cat mic0.hosts
Also, what MPI version and compiler version are you using? Thank you.
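Since plain cat will not reveal stray whitespace or carriage returns, a stricter check of the hosts file may help rule that out. The following is only a minimal sketch: check_hosts is a hypothetical helper name, not part of Intel MPI, and it assumes a POSIX shell with grep, wc, and tr available.

```shell
#!/bin/sh
# check_hosts is a hypothetical helper, not an Intel MPI tool.
# It flags empty lines and lines with leading or trailing whitespace
# (including carriage returns), then prints the number of host entries.
check_hosts() {
    hosts_file="$1"
    # grep -n prints any offending lines together with their line numbers.
    if grep -n -E '^[[:space:]]|[[:space:]]$|^$' "$hosts_file"; then
        echo "suspicious lines in $hosts_file" >&2
        return 1
    fi
    # Count the lines; tr strips the padding some wc implementations add.
    wc -l < "$hosts_file" | tr -d '[:space:]'
    echo
}

# Build the file the same way as in the original post, then check it.
echo mic0 > mic0.hosts
check_hosts mic0.hosts
```

If the file is clean, the helper prints the number of entries and returns 0. Otherwise it lists the offending lines and returns a non-zero status.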
Hello,
Thank you for your response.
I am using Intel MPI v4.1.3.045.
I construct mic0.hosts using "echo mic0 > mic0.hosts". It does contain only mic0 as confirmed by cat:
$ cat mic0.hosts
mic0
$
If I construct the hosts file the same way but with non-Xeon Phi hosts, then the program executes correctly. I would have expected the -host option and the -f option to be equivalent; however, I observe a segmentation fault with one and proper execution with the other.
Dan
I notice that in your original example, you specify -perhost on the command line when you are using the coprocessor but not when you are using the non-coprocessor hosts. Is this just a typo? If not, could you try your example with -perhost in both cases?
I have not been able to duplicate this behavior. If this is still a problem, please let us know.
Thank you for checking in, Frances. I checked again today and there is no problem with the -f option, although I haven't updated any of the software involved. It works both with and without the -perhost option, so I am no longer able to reproduce the problem myself.
I will post here again if the problem resurfaces. Thank you for your help.
Dan
