Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Using Intel MPI from .NET

Jon_Harrop
Beginner
747 Views
I wrote a simple ping-pong program in F# that uses Intel MPI. The measured latencies are great (around 10 µs for the smallest messages), but I need to transfer this functionality over to a production system, a large multi-threaded F# program, and I'm having great difficulty doing so.
I learned that I need to use the impimt.dll library instead of the usual impi.dll, and that I must initialize MPI using MPI_Init_thread instead of the usual MPI_Init. This works, but the performance is hundreds of thousands of times worse: I'm seeing four-second latencies!
Is this to be expected?
If so, how should I use Intel MPI from my latency-critical multi-threaded program? The best idea I have come up with so far is to implement a token ring using my ping-pong code, sending messages back and forth that may or may not contain data. This seems hugely wasteful, but I cannot see any other way to make it work.
0 Kudos
4 Replies
Dmitry_K_Intel2
Employee
Hi Jon,

Of course this is unexpected behavior. It's very difficult to identify the reason for such performance degradation. We will try to reproduce the issue on our servers and understand the cause (if it's reproducible).
Have you tried to use C# (or C++) instead of F#? Does F# use its own mechanism (library) to create threads?

Regards!
Dmitry
Jon_Harrop
Beginner
Hi Dmitry,

I have gathered some more information. The problem only manifests when using the MPI_THREAD_MULTIPLE setting (to allow arbitrarily multi-threaded programs) and not when the program only makes calls from a single thread.

However, I have worked around the problem by creating a dedicated MPI thread, initializing with MPI_THREAD_SERIALIZED, and going into an infinite loop sending messages back and forth between the two machines as fast as possible, feeding sends from a concurrent queue (sending dummy data if no real data is available) and posting received data back to my application. This way any thread can send data by enqueuing it, and incoming messages can be consumed from any thread. The performance is great in my test code; I just have to graft the new code into my production system now...
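In outline, the workaround looks like this. Below is a minimal, language-agnostic sketch in Python, with a `transport` callable standing in for the actual MPI send/receive exchange; the class and parameter names are illustrative only, not part of the real code:

```python
import queue
import threading

DUMMY = b""  # an empty payload stands in for a keep-alive (dummy) message

class CommThread:
    """A single thread that owns all communication, so the MPI library
    only ever sees serialized access (the MPI_THREAD_SERIALIZED idea)."""

    def __init__(self, transport):
        # transport is a callable standing in for one MPI send+recv exchange:
        # it takes the outgoing payload and returns the incoming one.
        self.transport = transport
        self.outbox = queue.Queue()   # any thread may enqueue sends here
        self.inbox = queue.Queue()    # received real data is posted here
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def send(self, payload: bytes) -> None:
        # Thread-safe: callers never touch the transport directly.
        self.outbox.put(payload)

    def _loop(self):
        while not self._stop.is_set():
            try:
                outgoing = self.outbox.get(timeout=0.01)
            except queue.Empty:
                outgoing = DUMMY      # keep the ping-pong going with dummy data
            incoming = self.transport(outgoing)
            if incoming != DUMMY:     # discard keep-alives
                self.inbox.put(incoming)

    def stop(self):
        self._stop.set()
        self._thread.join()
```

With a loopback transport, `CommThread(lambda payload: payload)`, anything passed to `send` eventually comes back out of `inbox`, while dummy exchanges are filtered out; in the real system the transport would be the MPI ping-pong between the two ranks.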

The F# standard library creates threads indirectly, but only using ordinary .NET calls. I was using asynchronous agents to send and receive data over MPI. Here's the code I was using:

let send =
    let agent = new MailboxProcessor<_>(fun inbox -> async {
        initialize()
        while true do
            let! (buf : byte []) = inbox.Receive()
            if buf.Length >= maxSize then
                printfn "WARNING: %d-byte message is too long for MPI" buf.Length
            else
                let nativeArray = NativeInterop.PinnedArray.of_array buf
                let dst = if rank() = 0 then 1 else 0
                Internal.send(nativeArray.Ptr, buf.Length, Internal.MPI_BYTE, dst, 0, Internal.MPI_COMM_WORLD) |> check
    })
    agent.Start()
    agent.Post

let receive =
    let buf = Array.create maxSize 0uy
    let nativeBuf = NativeInterop.PinnedArray.of_array buf
    let status = [|Internal.MPI_Status()|]
    let nativeStatus = NativeInterop.PinnedArray.of_array status
    let queue = System.Collections.Concurrent.ConcurrentQueue<_>()
    async {
        initialize()
        while true do
            Internal.recv(nativeBuf.Ptr, buf.Length, Internal.MPI_BYTE, Internal.MPI_ANY_SOURCE, Internal.MPI_ANY_TAG, Internal.MPI_COMM_WORLD, nativeStatus.Ptr) |> check
            let buf = Array.sub buf 0 status.[0].count
            queue.Enqueue buf
    } |> Async.Start
    queue
My "send" is an asynchronous agent that serializes messages posted to it from any thread and sends them over MPI. My "receive" is an infinite loop that sits in a thread on the thread pool waiting for any message to be received. Note that this means there will be a call to "MPI_Recv" blocking one thread while another thread is calling "MPI_Send".
Cheers,
Jon.
Dmitry_K_Intel2
Employee
Hi Jon,

We've checked the performance of the multithreaded (mt) version of the Intel MPI Library, and it's just a bit (<10% for some message sizes) slower than the single-threaded version.
You can check the performance yourself by running IMB (the Intel MPI Benchmarks) from the installation package.
If IMB shows comparable performance for the two libraries, then it probably means that something is wrong in the F# MPI wrappers.

Regards!
Dmitry
Dmitry_K_Intel2
Employee
Hi Jon,

Could you please submit a tracker at premier.intel.com?
In that case I'll be able to upload an engineering version of the Intel MPI Library for you to check the performance issue.

Regards!
Dmitry