Hello,
This question has been asked before, but I don't think the poster got an answer. I am trying to compile and run a user's code. It runs for a while, but then stops with the error message shown in the subject. Some more detail:
[71:wardlaw187] unexpected disconnect completion event from [0:wardlaw182]
Assertion failed in file ../../dapl_module_util.c at line 2682: 0
internal ABORT - process 71
This is Intel MPI (impi) 4.0.1 with ifort 12.0.0 20101006 on RHEL/CentOS: the login/compile node runs RHEL 5.3, and the compute nodes run a CentOS 5.3 that was slightly modified by Bull, the vendor. The hardware is a Westmere cluster with an InfiniBand interconnect.
Any ideas?
Thanks in advance,
Herbert
3 Replies
Hi Herbert,
It looks like this is more likely an application issue.
Probably one process dies and the other processes cannot get the completion event. Could you please set I_MPI_DEBUG_COREDUMP=1 and start your application (ideally compiled with '-g')? It should generate a core file. Take a look at the backtrace and let me know if you think this is an Intel MPI issue.
Regards!
Dmitry
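For anyone following along, the steps suggested above might look roughly like the sketch below. Only I_MPI_DEBUG_COREDUMP and '-g' come from the reply; the source file `app.f90`, the binary `./app`, and the rank count of 72 are hypothetical placeholders, and raising the core file limit with `ulimit` is an added assumption about why a core file might otherwise not appear.

```shell
# Sketch only: app.f90, ./app and the rank count are hypothetical placeholders.

# Assumption: core files are often disabled by default, so try to lift the limit.
# The administrator may cap this, in which case the call fails harmlessly.
ulimit -c unlimited 2>/dev/null || echo "could not raise the core file size limit"

# Variable named in the reply above: ask Intel MPI for a core dump on abort.
export I_MPI_DEBUG_COREDUMP=1

# Build and run only if the Intel MPI tools are actually on PATH.
if command -v mpiifort >/dev/null 2>&1 && command -v mpirun >/dev/null 2>&1; then
    # -g keeps debug symbols so the backtrace shows file and line numbers.
    mpiifort -g -o app app.f90
    mpirun -n 72 ./app

    # Print a backtrace for every core file found in the working directory.
    for core in core*; do
        [ -f "$core" ] && gdb -batch -ex bt ./app "$core"
    done
fi
```

On a cluster the limit set by `ulimit` in the login shell may not reach the compute nodes, and core files may be written on the node where the failing rank ran, so this is a starting point rather than a guaranteed recipe.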
Quoting Dmitry Kuzmin (Intel)
Hi Herbert,
It looks like this is more likely an application issue.
Probably one process dies and the other processes cannot get the completion event. Could you please set I_MPI_DEBUG_COREDUMP=1 and start your application (ideally compiled with '-g')? It should generate a core file. Take a look at the backtrace and let me know if you think this is an Intel MPI issue.
Regards!
Dmitry
I came across the same problem. I set I_MPI_DEBUG_COREDUMP=1, but no core file was dumped, so I could not find out the details of the abort. The application I ran was mpiBLAST, a well-known HPC application in biology.
Can you give more hints? Thank you!
Could you please submit a ticket at premier.intel.com?
I'm not sure that we should investigate this issue on the forum.
Regards!
Dmitry