Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

How to catch MPI exception

Jimmy821
Beginner
2,649 Views
Hi,

Is there support for MPI::ERRORS_THROW_EXCEPTIONS?

I notice that no exception is caught when there is a network loss.

Thanks.
5 Replies
Dmitry_K_Intel2
Employee
Hi Jimmy,

Please take a look at the example: here
If you are doing everything correctly but still cannot catch an exception, that probably means the MPI function is not returning an error code.

Regards!
Dmitry
Andrey_D_Intel
Employee
Hi,

Could you please clarify which MPI implementation we are talking about? In the Intel MPI Library, MPI::ERRORS_THROW_EXCEPTIONS is supported in accordance with the MPI standard.

Best regards,
Andrey
Jimmy821
Beginner
I am using Intel MPI 4.0. I am running 3 instances of my application on the same computer. To test the exception handling, I forcefully terminate one instance of the application.

However, it appears that the catch blocks of the other 2 instances are not triggered. I use standard MPI functions such as MPI_TEST, MPI_BCAST, MPI_IRECV, MPI_SEND, MPI_PEEK.

Could I also ask how to set I_MPI_TCP_NETMASK in a configuration file? I have not found any way to include it.

Thanks!
jimmy82
Novice
Just a quick update: I realised that I am able to catch an exception caused by a software error, for example a mismatch in data sizes.

However, my objective is to catch errors caused by a network disconnection or by other nodes hanging abruptly. In this case, I read that there is no way to do so, because mpiexec does not trap the errors and proceeds to terminate all running processes.
Dmitry_K_Intel2
Employee
Hi Jimmy,

Please read clause 5 of the Reference Manual, which covers fault tolerance; that may be your case (or perhaps you are thinking of checkpointing).
Mpiexec does not catch errors! It aborts the application if one of the processes has been aborted because of an error.

Regards!
Dmitry