I just upgraded Intel MPI for Windows from 3.0.012 to 4.0.0.011. After the upgrade, I can run a parallel case on a single node without problems. If I run a parallel case across multiple nodes, my program always stops. When I debugged the run, the processes had started up in shm data transfer mode. If I set I_MPI_FABRICS to shm:tcp, the program also stops. If I set I_MPI_FABRICS to tcp, the program runs. If I set I_MPI_FABRICS to dapl and set I_MPI_FALLBACK to enable, the program also runs. But that is not what I want. We are developing commercial software, and we want MPI to select the fabric automatically; our users may not know the details of setting those environment variables. The problem happens on both the Windows XP 64-bit and Windows 7 64-bit versions. Has anyone met the same problem? Thanks,
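For reference, this is roughly how I launch when forcing a fabric (the host names, process counts, and application name below are just placeholders):

set I_MPI_FABRICS=tcp
mpiexec -hosts 2 node1 2 node2 2 myapp.exe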
Could you check the version of smpd running on your nodes? You can get this information by running the following command in a command window:
smpd -get binary
It should be from version 4.0.
Could you also check your old environment - some env variables might be left over from the previous version. At least there should be no I_MPI_DEVICE.
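In a command window, 'set' followed by a name prefix lists every matching variable, so you can check for leftovers like this:

set I_MPI_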
When you run Intel MPI, fallback (I_MPI_FALLBACK) is enabled by default, so the library checks all existing fast fabrics and, if they are not available, falls back to tcp. You can see which fast fabric has been selected by setting I_MPI_DEBUG=2 (or higher).
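For example (the application name is a placeholder):

mpiexec -genv I_MPI_DEBUG 2 -n 2 myapp.exe

The startup output should include a line naming the selected data transfer mode.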
If everything works just fine when you set I_MPI_FABRICS=shm:tcp, that means something prevents the library from running the same way in default mode.
BTW: you can upgrade 4.0.0.011 to 4.0.1 (and very soon to 4.0.2).
Could you run your program like this:
mpiexec -genv I_MPI_DEBUG 5 -genv I_MPI_FABRICS shm:tcp -hosts [your hosts and number of processes] ./app_name
And post the output here.
Could you try the following command:
mpiexec -wdir "mydir" -genv I_MPI_DEBUG 5 -genv I_MPI_PLATFORM 0 -genv I_MPI_FABRICS shm:tcp -hosts 2 gems3 3 gems4 2 -pwdfile "mypassword" "myfile"
If I_MPI_PLATFORM doesn't help, please download Intel MPI Library version 4.0 Update 1 and give it a try. Remember that it should be updated on all nodes.
"Third case, I ran without -genv I_MPI_FABRICS shm:tcp, and I set host name to the two different name, gem3 and gems4. The output is below.
mpiexec -wdir "mydir" -genv I_MPI_DEBUG 5 -hosts 2 gems3 3 gems4 2 -pwdfile "mypassword" "myfile"
 MPI startup(): shm data transfer mode
 MPI startup(): shm data transfer mode"
You don't need any other library - everything should work fine.
Do you use a script to run your application? Maybe you make some settings there?
It's not clear why these variables are in the list:
Could you please compile your program (you can compile the HelloWorld example from the test directory instead) with debug information and run it with I_MPI_FABRICS=shm:tcp on 2 nodes with I_MPI_DEBUG=50.
Please send me only lines with "business card" in them.
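For example (a sketch assuming the mpiicc compiler wrapper is on your PATH, test.c is the HelloWorld source, and /Zi adds debug information; the host names are taken from your earlier run):

mpiicc /Zi test.c
mpiexec -genv I_MPI_DEBUG 50 -genv I_MPI_FABRICS shm:tcp -hosts 2 gems3 1 gems4 1 test.exe | findstr /C:"business card"

findstr /C: matches the literal phrase, so only the "business card" lines are kept.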
It looks like gems3 and gems4 are considered to have the same IP address. Could you please check that they have different IP addresses?
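One quick way to check is from a command window; ping prints the address each name resolves to:

ping -n 1 gems3
ping -n 1 gems4

The two addresses reported in the output should differ.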
Well, it seems to me that you are using computer names without a DNS suffix. Please check this suffix in "My Computer"-> System Properties->Computer Name (Tab)->"Change..." button.
The full computer name should have a DNS suffix. If it doesn't (the computer name looks like just 'gems3'), please press the "More..." button and type a suffix in the "Primary DNS suffix" field.
If you don't have a domain name, you can try using 'local'.
You need to do this on each computer you are going to use.
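You can verify the setting afterwards from a command window; ipconfig /all reports the primary DNS suffix near the top of its output:

ipconfig /all | findstr /I "suffix"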
Please do this and try to run the program with default parameters.
This is not a requirement, but things sometimes work in unexpected ways if there is no DNS suffix. We are investigating the issue. For now, just add a suffix - nothing else will be needed.
This issue will be fixed in the upcoming 4.0 Update 3 release, which should be available to customers sometime in November. I hope that this fix will resolve the inconsistency between different implementations of MPI.