I am struggling with the following problem:
  ERROR: LOCAL:EXIT:SIGNAL: fatal error
  ERROR: Fatal signal 11 (SIGSEGV) raised.
  ERROR: Signal was encountered at:
  ERROR: MPIU_Handle_obj_alloc_unsafe (/tmp/7b663e0dc22b2304e487307e376dc132.xtmpdir.nnlmpicl211.16412_32e/mpi4.32e.nnlmpibld05.20130522/dev/src/util/mem/x86_64/debug_dynamic/../../handlemem.c:353)
  ERROR: While processing:
  ERROR: MPI_Win_lock(lock_type=234, rank=2, assert=0, win=0xffffffffa0000002)
  WARNING: starting premature shutdown
I got this diagnostic message by using the Intel ITAC. In our application we do a lot of one-sided passive-target "put" and "get" operations across the working MPI processes. The memory attached to the MPI windows is allocated via 'malloc'. The program runs on an HP SL230s compute server equipped with two Intel E5-2660 (2.2 GHz Sandy Bridge) processors with 8 cores each (i.e. 16 cores per compute server). The problem appears only when I use more than 4 cores on a single node, regardless of the 'I_MPI_FABRICS' setting (=shm,dssm) and the compiler optimization level used. The software stack is listed below:
Is there a quick solution to this problem?
Thank you in advance!
Since you have the Intel® Trace Analyzer and Collector, my first recommendation is to run with -check_mpi to link to the correctness checking library. This will check your MPI calls for errors. If this doesn't find the problem, please send me the reproducer code and I can take a look at it.
Technical Consulting Engineer
Intel® Cluster Tools
I have localized the problem. The memory corruption (a memory underflow) was due to wrong array indexing caused by the MPI_Allreduce call. More specifically, when our program runs on 16 cores on a single node, the following call gives us wrong results:
  MPI_Allreduce(MPI_IN_PLACE, X, N, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);
However, if I preallocate a separate array Y to receive the accumulated results, the results are correct:
  MPI_Allreduce(X, Y, N, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);
Moreover, the two-step MPI_Reduce/MPI_Bcast approach works correctly even when MPI_IN_PLACE is used.
In our case, N is typically about 500,000 to 800,000.
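Because the bad reduction only surfaced later as a memory underflow, it can help to validate reduced values before they are used as array indices, so the failure is reported at the MPI_Allreduce call rather than as a SIGSEGV somewhere else. The plain-C sketch below makes no MPI calls; the function names and the assumption that the reduced values are indices into an array of known length `len` are hypothetical, not taken from the application.

```c
#include <stdio.h>

/* Hypothetical helper: returns the position of the first entry of
   idx[0..n-1] that is not a valid index into an array of length len
   (i.e. lies outside [0, len)), or -1 if every entry is valid. */
long first_bad_index(const long *idx, long n, long len)
{
    for (long i = 0; i < n; ++i) {
        if (idx[i] < 0 || idx[i] >= len) {
            return i;
        }
    }
    return -1;
}

/* Hypothetical wrapper: reports the first invalid reduced value on stderr,
   tagged with the caller's rank, and returns nonzero so the caller can stop
   (e.g. with MPI_Abort) before the bad index is ever dereferenced. */
int check_reduced_indices(const long *idx, long n, long len, int rank)
{
    long pos = first_bad_index(idx, n, len);
    if (pos >= 0) {
        fprintf(stderr, "rank %d: reduced value idx[%ld] = %ld is outside [0, %ld)\n",
                rank, pos, idx[pos], len);
        return 1;
    }
    return 0;
}
```

The check is O(N), which is small compared to the reduction itself for N in the range mentioned above, so it can stay enabled while the problem is being tracked down.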
In principle, I could check whether this problem is related to the default I_MPI_ADJUST_ALLREDUCE setting. What do you think?
With best regards,
Changing I_MPI_ADJUST_ALLREDUCE only changes the algorithm used for MPI_Allreduce. If the problem depends on which algorithm is used, then there is another problem we need to resolve. Can you provide reproducer code?
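A reproducer for this kind of report is easier to evaluate if every rank fills its buffer with a deterministic pattern whose summed value has a closed form, so each rank can check the MPI_SUM result locally without extra communication. The plain-C helpers below are a minimal sketch of that idea; the pattern `rank + i % 1000` and all function names are hypothetical choices, not taken from the original application.

```c
/* Hypothetical fill pattern: value of element i on a given rank.
   Values stay small so the sum over all ranks cannot overflow long. */
long fill_value(int rank, long i)
{
    return (long)rank + (i % 1000);
}

/* Closed-form value element i should hold after an MPI_SUM reduction of
   fill_value over ranks 0..nprocs-1:
   sum_{r=0}^{p-1} (r + i % 1000) = p*(p-1)/2 + p*(i % 1000). */
long expected_sum(int nprocs, long i)
{
    long p = nprocs;
    return p * (p - 1) / 2 + p * (i % 1000);
}

/* Fill the local buffer x[0..n-1] for this rank before the reduction. */
void fill_local(long *x, long n, int rank)
{
    for (long i = 0; i < n; ++i) {
        x[i] = fill_value(rank, i);
    }
}

/* Count elements of the reduced buffer that differ from the expected sum. */
long count_mismatches(const long *x, long n, int nprocs)
{
    long bad = 0;
    for (long i = 0; i < n; ++i) {
        if (x[i] != expected_sum(nprocs, i)) {
            ++bad;
        }
    }
    return bad;
}
```

In an MPI driver, each rank would call `fill_local(X, N, rank)`, then `MPI_Allreduce(MPI_IN_PLACE, X, N, MPI_LONG, MPI_SUM, MPI_COMM_WORLD)`, and print `count_mismatches(X, N, nprocs)`. Repeating the run with a separate receive buffer Y, and with MPI_Reduce followed by MPI_Bcast, on 16 ranks of one node with N between 500,000 and 800,000 would compare the three variants reported above directly.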