I started a thread on this topic on Oct 25th, 2011, but I am unable to see that thread on this forum now. I searched the forum but couldn't find it anywhere. I did receive two replies to my post in my email account, as given below:
Reply 1: "Hi Nikhil, Well, this is a really specific case - it would be nice if you could explain why you cannot use shared memory. Might be we need to fix this issue instead of performance degradation with ofa. OFA fabric has its own settings and they were tuned"
Reply 2: "Have you tried running the mpitune utility on your scenario? Although shared memory is the best choice for your setup, the tool may help you to identify MPI parameters that need to be modified as you are changing the usual assumptions regarding environment"
My answers to these replies:
1. (reply 1): The two processes launched on the node (call it the control node) get dispatched to other nodes (compute nodes) for actual execution; the execution on the control node is only virtual. There are two scenarios here:
(a) When both processes are dispatched to the same compute node, both "shm" and "ofa" work. But, as I mentioned earlier, "ofa" runs quite slowly.
(b) When the two processes are dispatched to two different compute nodes (one process on each compute node), only "ofa" works, and I get the same performance numbers as in (a) with "ofa".
2. (reply 2): I tried mpitune, but it returns an error as below:
27'Oct'11 18:37:33 WRN | Invalid default value ('/home/vertex/config.xml') of argument ('config-file').
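For reference, this is roughly how I invoke it: I source the Intel MPI environment script and then run mpitune with no extra arguments (the install path below is just the default location on my systems, shown as an example):

    source /opt/intel/impi/4.0.2.003/bin64/mpivars.sh    # set up the Intel MPI environment
    mpitune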
Original thread:
I am using Intel MPI version 4.0.2.003 on a CentOS 5.6 64-bit platform. I am running the IMB-MPI1 (Pallas) benchmark on this platform. I have set I_MPI_FABRICS=ofa (in other words, I need to force the use of OFED for communication between MPI processes).
When I run "mpiexec -n 2 IMB-MPI1", it launches two processes on the node.
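To be explicit, the full invocation I use looks roughly like this (exporting the fabric variable before launching; the -genv form on the last line is just an equivalent alternative):

    export I_MPI_FABRICS=ofa
    mpiexec -n 2 IMB-MPI1

    # or, equivalently, pass the variable on the command line:
    mpiexec -genv I_MPI_FABRICS ofa -n 2 IMB-MPI1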
For a particular reason specific to my environment, I cannot use shared memory for I_MPI_FABRICS. The IMB-MPI1 benchmark suite runs fine with "ofa", but the performance numbers are almost 40% lower than when I run the same suite with OpenMPI (without shared memory). Of course, when I use I_MPI_FABRICS=shm while the processes are executing on the same node, I get very high performance numbers.
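For the OpenMPI comparison mentioned above, I exclude its shared-memory transport, roughly like this (the MCA parameter is from memory and may need adjusting for a given OpenMPI version):

    mpirun -np 2 --mca btl ^sm IMB-MPI1    # run without the shared-memory BTL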
My question is: Is there a "loopback" mode in Intel MPI that I can try for processes running on the same node? Or is there a specific tuning parameter that I can use?