Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

A Bus error

Jack_S_

Hi all,

I'm running a fairly heavy MPI application (the WRF model) on Linux and get a bus error (the error output is shown below). Any idea how to isolate the cause, whether it's a specific line of bad code or a system, compiler, or MPI issue?

Have you had any experience catching these bugs with one of Intel's new tools?

Thanks in advance for your perspective and experience,

Jack.

###

rsl.error.0010:[n13:mpi_rank_10][error_sighandler] Caught error: Bus error (signal 7)

Heinrich_B_Intel
Employee

Hi Jack, 

If you also have the Intel tools ITAC (Intel Trace Analyzer and Collector) and Inspector XE available (both part of Cluster Studio), you may try the following analysis:

1. Initialize ITAC. Compile the code with -g for better results, e.g. source line numbers in the reports.

2. Run your code with the additional flag -check_mpi:

$ mpirun -check_mpi -n <N> ./prg.x

Errors and warnings will be printed to stdout. Note that this only analyzes MPI usage.

For memory and threading bugs you may use Intel Inspector.

1. Initialize Inspector. The code should be compiled with -g.

2. Run an analysis on the rank that is causing trouble; in your output that is rank 10 (mpi_rank_10, running on node n13). The most convenient way to do this is the "-gtool" option of mpirun or the I_MPI_GTOOL environment variable:

export I_MPI_GTOOL="inspxe-cl --collect mi1 --result-dir mi1 :10"

Then run your code the normal way.
Please have a look at the Inspector documentation; "inspxe-cl -h" is a good start. The analysis above only looks for memory leaks. More thorough analyses can run for a long time, so please limit the run to a few time steps.
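WRF writes one rsl.error.NNNN file per MPI rank, and the four-digit suffix is the rank number (the output above pairs rsl.error.0010 with mpi_rank_10). A minimal sketch for picking the rank set for I_MPI_GTOOL from those files follows. Assumptions: the logs sit in the directory passed as the first argument (default: the current directory), the function name find_bus_error_ranks is hypothetical, and the comma-separated rank list is intended to match the rank-set format of -gtool/I_MPI_GTOOL (please check the Intel MPI reference for your version).

```shell
#!/bin/sh
# Sketch: list the MPI ranks whose WRF per-rank log (rsl.error.NNNN)
# reports a bus error, then restrict the Inspector run to those ranks.
# Assumption: the NNNN suffix of each rsl.error file is the MPI rank.

# Hypothetical helper; optional $1 is the directory holding the logs.
find_bus_error_ranks() {
    dir=${1:-.}
    # -l prints only the names of files containing the message;
    # errors (e.g. no rsl.error.* files at all) are discarded.
    grep -l 'Caught error: Bus error' "$dir"/rsl.error.* 2>/dev/null |
        # Strip the path and zero padding: .../rsl.error.0010 -> 10
        sed -E 's/.*rsl\.error\.0*([0-9])/\1/' |
        sort -n
}

# Join the ranks with commas, e.g. "3,10".
ranks=$(find_bus_error_ranks | paste -sd, -)

if [ -n "$ranks" ]; then
    # Same Inspector command as above, aimed at the failing rank(s) only.
    export I_MPI_GTOOL="inspxe-cl --collect mi1 --result-dir mi1 :$ranks"
    echo "I_MPI_GTOOL=$I_MPI_GTOOL"
fi
```

For a bus error, a memory-error analysis type may be more informative than leak detection (mi1); "inspxe-cl -h" lists the available analysis types.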

best regards,
Heinrich
