I have a very large external package I would like to use in my application. The package runs in parallel, but only under MPI. Requiring users to start the application with "mpirun ..." is impractical.
However, there is a call, MPI_Init_thread, that appears to be equivalent to MPI_Init. So here's my question: can I keep building the application as I normally do and then, say on Windows, start multiple threads with _beginthreadex and call MPI_Init_thread in each of them? In other words, would all subsequent MPI calls behave as if I had a true multiprocess run, except that functions such as MPI_Comm_rank would return the rank within this multithreaded application rather than within a multiprocess job?
I want to use this external package only on multicore machines, running it multithreaded. I'm hoping that all message passing will then be optimized for shared memory, so that I can use MPI transparently in a single-process application.
In short, no. This will not do what you want. MPI_Init_thread (or MPI_Init, for that matter) can be called only once per process, that is, once per rank. The purpose of MPI_Init_thread is to allow subsequent MPI calls made from threaded regions to work correctly. In this scenario you could use MPI with only one rank to send data between threads. A single-rank MPI program (a singleton) can still make MPI calls, but it won't necessarily be able to see other ranks. You couldn't, for instance, launch two completely independent copies and have them communicate with each other (there are ways around this involving MPI_Comm_join, but that's a separate discussion).
If you wanted, you could have one thread send data to another using MPI_Send/MPI_Recv. All threads will have the same rank, but you can use the tag to ensure data goes where it should. I don't know how any of the collectives would behave in this scenario; I haven't tested it.
Technical Consulting Engineer
Intel® Cluster Tools
Thanks for the very quick reply. I'm not sure I can count on this external package to use different tags; it may ignore the tag and rely only on the rank for communication.
So this raises the question: is there any emulation of MPI, say one built only on OpenMP, for single-process multithreaded applications like the one I described, where each rank is a thread rather than a process and everything runs inside a single executable?
The tags will work if they are used correctly. MPI_Send and MPI_Recv must have matching tags in order to transfer data, unless the receive uses the wildcard MPI_ANY_TAG. The tag is frequently just set to 0.
To my knowledge, there is no MPI implementation that runs everything within one rank, as that runs counter to the whole point of MPI. Something like that might exist, but I have not seen one.