I was running an MPI program on Skylake nodes. The program finishes on a single node (np=2, ph=2) without reporting any error. However, when I run it across two Skylake nodes (np=2, ph=1), I get the following error:
rank = 1, revents = 8, state = 8
Assertion failed in file ../../src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 2988: (it_plfd->revents & POLLERR) == 0
internal ABORT - process 0
The odd thing is that all my colleagues who use csh can finish the run without hitting this failed assertion, while those of us who use bash (including me) always hit it at the same line (2988). What could cause this type of error?
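For anyone reading the log above: the assertion checks that `POLLERR` is not set in `revents`, and on Linux `POLLERR` has the value 8, which matches the reported `revents = 8`. The sketch below is only an illustration of how such a bitmask can be decoded; the helper name `decode_revents` is hypothetical and is not part of the MPI library, and the flag values come from Python's `select` module on a Unix-like system.

```python
import select

# Names of the standard poll(2) event flags exposed by Python's select module
# (available on Unix-like systems, not on Windows).
FLAG_NAMES = ["POLLIN", "POLLPRI", "POLLOUT", "POLLERR", "POLLHUP", "POLLNVAL"]


def decode_revents(revents):
    """Return the names of the poll flags set in a revents bitmask.

    Hypothetical helper for illustration only.
    """
    return [name for name in FLAG_NAMES if revents & getattr(select, name)]


if __name__ == "__main__":
    # 8 is the revents value from the error message in the post.
    print(decode_revents(8))
```

If the decoded value is `POLLERR`, the TCP socket between the two ranks reported an error condition, which suggests looking at how the inter-node connection is set up rather than at the application code itself.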
- Tags:
- Cluster Computing
- General Support
- Intel® Cluster Ready
- Message Passing Interface (MPI)
- Parallel Computing
Duplicate thread. Please refer to this thread instead: https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/851861