Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

mixing Intel MPI and TBB

Hyokun_Y_
Beginner
886 Views

I have been using a mixture of MPICH2 and TBB very successfully:
MPICH2 for machine-to-machine communication and TBB for intra-machine thread management.

Now I am running the very same code on a system that uses Intel MPI instead of MPICH2,
and I am observing very odd behavior: some messages sent with MPI_Ssend are never received
at the destination. I am wondering whether this is because Intel MPI and TBB do not work well
together.

The following document

http://software.intel.com/en-us/articles/intel-mpi-library-for-linux-product-limitations

says the environment variable I_MPI_PIN_DOMAIN has to be set properly when 
OpenMP and Intel MPI are used together; when TBB instead of OpenMP is used with
Intel MPI, is there anything I should be careful about? Is this combination
guaranteed to work?

Thanks,
Hyokun Yun 

 

5 Replies
Hyokun_Y_
Beginner

I have attached a simple test program that mixes TBB with Intel MPI. It worked perfectly on the previous cluster, which uses MPICH2, but on the new cluster with Intel MPI some messages are never delivered, so the blocking send never completes.

Hyokun_Y_
Beginner

[cpp]

#include <mpi.h>

#include <algorithm>
#include <iostream>
#include <utility>

#include "tbb/tbb.h"
#include "tbb/scalable_allocator.h"
#include "tbb/tick_count.h"
#include "tbb/spin_mutex.h"
#include "tbb/concurrent_queue.h"
#include "tbb/pipeline.h"
#include "tbb/compat/thread"
#include <boost/format.hpp>

using namespace std;
using namespace tbb;


int main(int argc, char **argv) {

// initialize TBB (no parentheses: "init()" would declare a function instead of an object)
tbb::task_scheduler_init init;

// initialize MPI
int numtasks, rank, hostname_len;
char hostname[MPI_MAX_PROCESSOR_NAME];

int mpi_thread_provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &mpi_thread_provided);

if (mpi_thread_provided != MPI_THREAD_MULTIPLE) {
cerr << "MPI multiple thread not provided!!! " << mpi_thread_provided << " " << MPI_THREAD_MULTIPLE << endl;
return 1;
}

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Get_processor_name(hostname, &hostname_len);

cout << boost::format("processor name: %s, number of tasks: %d, rank: %d\n")
% hostname % numtasks % rank;


// run program for 10 seconds
const double RUN_SEC = 10;
// size of message (const so it can be used as an array bound)
const int MBUFSIZ = 100;

tick_count start_time = tick_count::now();

// receive thread: keep receiving messages from any sources
thread receive_thread([&]() {
int monitor_num = 0;
double elapsed_seconds;

int data_done;
MPI_Status data_status;
MPI_Request data_request;

char recvbuf[MBUFSIZ];

MPI_Irecv(recvbuf, MBUFSIZ, MPI_CHAR,
MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &data_request);

while(true) {
elapsed_seconds = (tbb::tick_count::now() - start_time).seconds();

if (monitor_num < elapsed_seconds + 0.5) {
cout << "rank: " << rank << ", receive thread alive" << endl;
monitor_num++;
}

if (elapsed_seconds > RUN_SEC + 5.0) {
break;
}

MPI_Test(&data_request, &data_done, &data_status);
if (data_done) {
cout << "rank: " << rank << ", message received!" << endl;
MPI_Irecv(recvbuf, MBUFSIZ, MPI_CHAR,
MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &data_request);

}

}

// a cancelled request must still be completed (or freed)
MPI_Cancel(&data_request);
MPI_Wait(&data_request, &data_status);

cout << "rank: " << rank << ", recv thread dying!" << endl;

return;
});

// send thread: send one (meaningless) message to (rank + 1) every second
thread send_thread([&]() {
int monitor_num = 0;
double elapsed_seconds;

char sendbuf[MBUFSIZ];
fill_n(sendbuf, MBUFSIZ, 0);

while (true) {
elapsed_seconds = (tbb::tick_count::now() - start_time).seconds();

if (monitor_num < elapsed_seconds) {
cout << "rank: " << rank << ", start sending message" << endl;
monitor_num++;

MPI_Ssend(sendbuf, MBUFSIZ, MPI_CHAR,
(rank + 1) % numtasks, 1, MPI_COMM_WORLD);

cout << "rank: " << rank << ", send successfully done!" << endl;

}

if (elapsed_seconds > RUN_SEC) {
break;
}
}

cout << "rank: " << rank << ", send thread dying!" << endl;

return;
});

receive_thread.join();
send_thread.join();

MPI_Finalize();

return 0;

}

[/cpp]

Roman_L_Intel
Employee

Hello Hyokun Yun,

Thank you for reporting the issue.

The issue is apparently not specific to TBB: the same behavior is reproduced with pure pthreads (please refer to the attachment).

Interestingly, though, the same logic implemented with pure MPI ranks, or with one thread per rank, produces correct results with Intel MPI.

So the problem appears to be specific to the multi-threaded mode of Intel MPI. Let me check with the product team and get back with more details.
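
For illustration, the "one thread per rank" observation can be approximated inside a multi-threaded program by funnelling all communication through a single thread. The following is only a minimal sketch under assumptions: `CommFunnel` and `demo_funnel` are hypothetical names, the submitted jobs are placeholders for where the MPI calls would go, and whether this pattern actually avoids the Intel MPI problem has not been verified here.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <set>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not part of TBB or MPI): runs every submitted job on
// one dedicated thread, so that only this thread would ever call into MPI.
class CommFunnel {
public:
    CommFunnel() : stop_(false), worker_([this] { run(); }) {}

    ~CommFunnel() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains any remaining jobs before exiting
    }

    // May be called from any thread; the job runs later on the funnel thread.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reached once stop_ is set
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // placeholder: a real program would issue its MPI calls here
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stop_;
    std::thread worker_;  // declared last so it starts after the other members
};

// Usage sketch: four producer threads submit ten jobs each.
// Returns the number of jobs executed; all of them run on the funnel thread.
int demo_funnel() {
    std::set<std::thread::id> ids;  // only modified on the funnel thread
    int executed = 0;               // only modified on the funnel thread
    {
        CommFunnel funnel;
        std::vector<std::thread> producers;
        for (int t = 0; t < 4; ++t) {
            producers.emplace_back([&funnel, &ids, &executed] {
                for (int i = 0; i < 10; ++i) {
                    funnel.submit([&ids, &executed] {
                        ids.insert(std::this_thread::get_id());
                        ++executed;
                    });
                }
            });
        }
        for (auto& p : producers) p.join();
    }  // the destructor drains the queue and joins the funnel thread
    assert(ids.size() == 1);  // every job ran on the same thread
    return executed;
}
```

With such a funnel, the worker threads never touch MPI directly, so a lower thread support level than MPI_THREAD_MULTIPLE might be sufficient; that, too, is an assumption to check against the MPI library in use.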

Thank you,

Roman

Roman_L_Intel
Employee

The multi-threading issue has apparently already been resolved as part of another recent fix to the Intel MPI Library, so the fix should be delivered with the next public Intel MPI release.

Please stay tuned.

Hyokun_Y_
Beginner

Hi Roman,

Thanks for the quick and very helpful response! You saved my life.

Is there a web page about this problem, or a site where I can download a hotfix? There seems to be little chance of a workaround, and I cannot just stop working and wait for a new release...
