Hello,
I am running an MPI application with 5 ranks. It runs smoothly when all ranks are on the Xeon processor, but when I put two ranks on MIC0 and MIC1, the program hangs and then gives me a segmentation fault.
Setup:
Using blocking MPI send and non-blocking MPI recv
rank0, rank1 on MIC0, MIC1
rank2, rank3, rank4 on Xeon
Issue:
rank1: sends 100 packets and reaches finalize()
rank2: receives only 60 packets and then hangs
Some things I tried:
I added a sleep(1) before rank1 sends packets, and this solved the issue: rank2 could get all the packets.
But for a larger number of packets (>100), adding the sleep doesn't solve the issue and the system hangs.
Any suggestions?
Thank you
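The code for this application was never posted, so the cause of the hang is not known. As a point of comparison only, here is a minimal sketch of the pattern described above: blocking MPI_Send on one rank, non-blocking MPI_Irecv on another. It assumes one separate buffer and one request per outstanding receive, and it completes every request with MPI_Waitall before reading the data or calling MPI_Finalize. The rank layout (0 sends, 1 receives), the tag, and the payload size PACKET_INTS are hypothetical choices for a two-rank test, not details from the post.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_PACKETS 100  /* packet count mentioned in the post */
#define PACKET_INTS 256  /* hypothetical payload size */
#define SENDER      0    /* hypothetical rank layout for a 2-rank test */
#define RECEIVER    1
#define TAG         0    /* hypothetical message tag */

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return EXIT_FAILURE;
    }

    if (rank == SENDER) {
        int packet[PACKET_INTS];
        for (int i = 0; i < NUM_PACKETS; ++i) {
            for (int j = 0; j < PACKET_INTS; ++j)
                packet[j] = i;
            /* Blocking send: the buffer may be reused once this returns. */
            MPI_Send(packet, PACKET_INTS, MPI_INT, RECEIVER, TAG,
                     MPI_COMM_WORLD);
        }
    } else if (rank == RECEIVER) {
        /* One buffer slot and one request per outstanding receive. */
        int *buf = malloc(sizeof(int) * NUM_PACKETS * PACKET_INTS);
        MPI_Request *req = malloc(sizeof(MPI_Request) * NUM_PACKETS);
        if (buf == NULL || req == NULL)
            MPI_Abort(MPI_COMM_WORLD, 1);

        for (int i = 0; i < NUM_PACKETS; ++i)
            MPI_Irecv(buf + (size_t)i * PACKET_INTS, PACKET_INTS, MPI_INT,
                      SENDER, TAG, MPI_COMM_WORLD, &req[i]);

        /* The buffers must not be read, freed, or reused, and
           MPI_Finalize must not be called, until every receive is done. */
        MPI_Waitall(NUM_PACKETS, req, MPI_STATUSES_IGNORE);

        /* Same sender, tag, and communicator: messages match the
           receives in the order they were posted, so slot i holds i. */
        int ok = 1;
        for (int i = 0; i < NUM_PACKETS; ++i)
            if (buf[(size_t)i * PACKET_INTS] != i)
                ok = 0;
        printf("receiver: %d packets complete, contents %s\n",
               NUM_PACKETS, ok ? "ok" : "MISMATCH");

        free(req);
        free(buf);
    }

    MPI_Finalize();
    return EXIT_SUCCESS;
}
```

Built with mpiicc and launched on two ranks, a sketch like this can help check whether the plain send/receive path between the host and the cards works on its own. If it does, the receive side of the real application (shared buffers, requests that are overwritten or never waited on) would be a reasonable place to look next.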
Hi Vikrant,
I assume you already configured peer-to-peer before running the program:
# sudo /sbin/sysctl -w net.ipv4.ip_forward=1
Also, you may want to try running rank 0 on the host and ranks 1 and 2 on mic0 and mic1, respectively, to see if the problem still occurs. Which versions of the compiler and the Intel MPI Library do you have?
Hi loc-nguyen,
The version I am using is 4.1.1.036 for mpiicc.
I did try interchanging the ranks but faced the same issue.
No, I had not run the command "sudo /sbin/sysctl -w net.ipv4.ip_forward=1", but I could ssh between the two cards and run simple programs with the same hybrid structure, so I thought I had that part covered.
But even after I ran the command on the host, I get the same error.
Thanks
Hi Vikrant,
Is it possible for you to post the source code so I can take a closer look at the issue? Thank you.
Hi loc-nguyen,
Sorry, I cannot post the code here. I am currently at Intel in Santa Clara; if you are in Santa Clara, would it be possible to meet you?
Thanks