I am developing a program using TBB (v4.0 update 1), MPI (mpich2-1.4.1p1), and TBB's scalable memory allocation library.
The program segfaults inside TBB's malloc library, but it works fine if I do not use TBB's scalable memory allocation library or if I do not use the MPI library (the bug occurs even when I do not call any MPI routine except MPI_Init_thread). The bug also does not happen if I use OpenMP instead of TBB for parallelization (the combination of MPI, OpenMP, and TBB's malloc library works fine).
The bug is a heisenbug and does not happen consistently, but it appears more frequently with a larger number of MPI nodes and occurs almost every time with more than 40 MPI nodes in my test environment.
Is there any known inter-operability issue among MPI, TBB, and TBB's memory allocator?
I attached a simplified version of the C++ source code that reproduces the bug (though, since this is a heisenbug, I am not sure it will appear on other systems) and the gdb call stack printed from the original version of the program. The simplified version also segfaults at the same point (../../src/tbbmalloc/backref.cpp:158), but I cannot print its call stack because the process becomes a zombie after the segmentation fault.
* The simplified version of the C++ source code (compiled with -std=c++0x -g -O1 -openmp and linked with -openmp-link static -openmp -ltbb_debug -ltbbmalloc_proxy_debug -ltbbmalloc_debug -lgfortran -lmpich -lmpl -lpthread)