Using a custom built mpiexec wrapper

Geoff_Hall · ‎01-30-2012

Hi,

I would like to develop an application that looks, in part, like wmpiexec. That is, it starts multiple processes and monitors their output, but in a more sophisticated manner than wmpiexec (which just displays the processes' output in a window). Are there any tools, tricks, documentation out there that will help me shortcut the process?

TIA

Cheers, Geoff

James_T_Intel · ‎01-31-2012

Hi Geoff,

First, what exactly are you wanting to do with your wrapper? Based on what you're saying, I'd guess you're wanting to work more with the output from each of the processes. I would recommend reading through the Intel MPI Library Reference Manual. You should have a copy installed on your computer at \\doc\Reference_Manual.pdf to read. The options available for mpiexec are listed there, and several of them produce additional output which can be useful.

If you are wanting to finely control how the processes are run, this is easily done via mpiexec configuration files. I would recommend this for your situation, as you are running on a heterogeneous system. You can build a single configuration file and use that as a template for each application you need to run.
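As a rough illustration of the template idea, the sketch below generates the lines of such a file from a list of hosts and task files. It assumes the configuration file holds one set of mpiexec arguments per line and that `-host` and `-n` behave as described in the Reference Manual. The host names, the executable name `B.exe`, and the task file names are hypothetical.

```python
def build_config_lines(executable, assignments):
    """Return one mpiexec argument set per (host, task_file) pair.

    Assumption: each line of the configuration file is an independent
    argument set, e.g. "-host node1 -n 1 B.exe task1.txt".
    """
    lines = []
    for host, task_file in assignments:
        lines.append(f"-host {host} -n 1 {executable} {task_file}")
    return lines


def write_config(path, executable, assignments):
    """Write the generated argument sets to a configuration file."""
    content = "\n".join(build_config_lines(executable, assignments)) + "\n"
    with open(path, "w") as handle:
        handle.write(content)


# Hypothetical hosts and task files, for illustration only.
example = build_config_lines(
    "B.exe", [("node1", "task1.txt"), ("node2", "task2.txt")]
)
```

The resulting file would typically be passed to mpiexec through its `-configfile` option; check the Reference Manual for the exact syntax in your version.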

If you are wanting something beyond the capabilities of mpiexec, I would recommend specifically building them into your program. If it is something you expect to do frequently, a static or dynamic library would work well.

As I stated in the beginning, it really comes down to what you plan to do with your wrapper.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Geoff_Hall · ‎01-31-2012

Hi James,

Thanks for your thoughts. I have crawled up and down the reference manual and had decided that I didn't need most of what was described there.

So, I should let you know where I'm coming from and where I'd like to get to.

We have a program (call it B) that reads an instruction file for its tasks. Already defined.
These tasks are independent, and rather than have one program B execute all the tasks one after the other, we can use mpiexec to start multiple programs B running on multiple machines in parallel, each running an individual task. That part is relatively easy and mpiexec seems ideal for the job of firing off those programs. I want the program that fires off mpiexec (call it A) to be able to read the outputs that come back from the multiple running programs B to be able to display some sort of task progress. (i.e. program A does not 'control' programs B other than to set up the task file appropriately.)

The idea seems simple enough, but in my little tests recently I have been having trouble getting program A to open the "console output" so that it can read the data coming back from the programs B. The trouble being that as soon as I open a file for reading (even 'shared') the programs B can't write to that file. I guess I'm not understanding how to get access to the output from the running programs B - while they're still running.

I know it can be done because wmpiexec does it! I'm just looking for the clue that will set me in the right direction. ... Please ... :)

TIA

Cheers, Geoff

James_T_Intel · ‎02-01-2012

Hi Geoff,

Ok, there are several things you'll want to consider. First, while MPI is certainly suitable for the task, I would not consider MPI to be the best tool for the task. Here's a scenario. Let's say your program A launches 10 copies of B. I don't know how long B runs, but let's say it runs for 30 minutes. Now, instance 7 of B runs into a problem and crashes 5 minutes into the run. Unless you have designed it to be fault tolerant, every instance of B will now abort. Fault tolerance is present in MPI, but it is easier using other methods.

Second, wmpiexec is simply displaying the standard output from mpiexec, nothing more. When a program launches a process, it has access (if set up to do so) to the process's standard input/output/error streams. Generally, if the standard output of a child program is not handled by the parent program, it is automatically sent to the parent's standard output. In the case of wmpiexec, it is designed to take the output from mpiexec and display it in the window you see.
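To make the stream handling concrete, here is a minimal Python sketch of a parent process reading a child's standard output through a pipe while the child is still running. It is not how wmpiexec is implemented, only the same general mechanism. The function name `stream_output` is hypothetical, and a short Python one-liner stands in for the child; in practice the command would be your mpiexec invocation.

```python
import subprocess
import sys


def stream_output(command, on_line):
    """Launch `command` and pass each line of its output to `on_line`.

    stderr is merged into stdout so a single pipe carries everything.
    Returns the child's exit code.
    """
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,  # line-buffered on the parent's side
    )
    for line in process.stdout:
        on_line(line.rstrip("\n"))
    process.stdout.close()
    return process.wait()


# Stand-in child: a short Python script that prints two progress lines.
# The real command (an mpiexec invocation) is not shown here.
child = [sys.executable, "-c", "print('task 1 done'); print('task 2 done')"]
received = []
exit_code = stream_output(child, received.append)
```

Because the parent reads from a pipe rather than opening a file, there is no file-sharing conflict with the writers. Lines arrive only as the child flushes them, so B may need to flush its output after each progress message.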

If you are set on using MPI to do this, here is what I would recommend. First, rather than having A launch mpiexec, have A be one of the programs called from mpiexec. Then, implement them as a master/worker model, A as master, each instance of B as a worker. A will then generate the input for B and send it over MPI to each instance of B. Rather than writing to standard output or to a file, have B send the output to A. A can determine the message source and parse it appropriately. As I mentioned before, you'll want some fault tolerance so that if an instance of B would crash, it instead sends an appropriate message to A and waits at the end for the others to complete. You would need to generate a configuration file for MPI beforehand, but that can be done once, and modified if necessary later.
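The message flow of that master/worker arrangement can be sketched as follows. This is not MPI code: Python threads and a `queue.Queue` stand in for the MPI ranks and the point-to-point sends and receives, so only the pattern carries over. The message kinds (`progress`, `done`, `failed`), the function names, and the failure condition are all hypothetical.

```python
import queue
import threading


def worker(rank, task, outbox):
    """Run one task and report to the master instead of crashing."""
    try:
        outbox.put((rank, "progress", f"starting {task}"))
        if task is None:  # hypothetical failure condition for the demo
            raise ValueError("no task supplied")
        outbox.put((rank, "done", f"finished {task}"))
    except Exception as error:
        # Report the failure as a message so the other workers carry on.
        outbox.put((rank, "failed", str(error)))


def master(tasks):
    """Start one worker per task and collect the final status of each."""
    outbox = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, task, outbox))
        for rank, task in enumerate(tasks, start=1)
    ]
    for thread in threads:
        thread.start()

    status = {}
    while len(status) < len(tasks):
        rank, kind, text = outbox.get()  # the rank identifies the source
        if kind in ("done", "failed"):
            status[rank] = kind
    for thread in threads:
        thread.join()
    return status


result = master(["task-a", None, "task-c"])
```

The point of the sketch is that worker 2 fails without taking workers 1 and 3 down with it: the failure becomes one more message for the master to handle.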

Instead, I would recommend looking at other options. You can set up a server/client program that can communicate across the computers, you could use native Windows* capabilities (WMI, PowerShell), or you could use third-party approaches (ssh). These should give you a few ideas to get started.
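As one possible starting point for the server/client option, the sketch below has program A listen on a TCP socket while each B connects and sends progress lines. Everything here is an assumption for illustration: the line-based protocol, the `ok` acknowledgement, the names `ProgressHandler` and `report`, and the use of the loopback address so that the example runs on a single machine.

```python
import socket
import socketserver
import threading

progress_log = []            # lines received by the server (program A)
log_lock = threading.Lock()


class ProgressHandler(socketserver.StreamRequestHandler):
    """Handle one client connection: record each line it sends."""

    def handle(self):
        for raw in self.rfile:
            with log_lock:
                progress_log.append(raw.decode("utf-8").strip())
        # Acknowledge once the client has finished sending.
        self.wfile.write(b"ok\n")


def report(address, worker_id, messages):
    """Client side (program B): send progress lines, then wait for the ack."""
    with socket.create_connection(address) as connection:
        for message in messages:
            connection.sendall(f"{worker_id}: {message}\n".encode("utf-8"))
        connection.shutdown(socket.SHUT_WR)  # signal end of messages
        connection.recv(16)                  # block until the server replies


# Port 0 asks the OS for any free port; a real deployment would fix one.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), ProgressHandler)
listener = threading.Thread(target=server.serve_forever, daemon=True)
listener.start()

report(server.server_address, "B1", ["started", "50%", "finished"])

server.shutdown()       # stop the serve_forever loop
server.server_close()   # release the listening socket
```

Since each B holds its own connection, one B crashing only closes that connection, and A can keep tracking the others.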

Sincerely,
James Tullos
Technical Consulting Engineer
Intel Cluster Tools

Geoff_Hall · ‎02-01-2012

Thanks for those comments James.

I had considered firing off A and multiple B with mpiexec, then went for what I thought would be a much simpler solution. You've thrown up a few more things to think about, in particular fault tolerance, which will require more thought on my part.

I'll need to reconsider my options. Once again, thanks.

Cheers, Geoff