I'm trying to run my FFTW3 MPI program, compiled with the icc compiler (wrapped inside the mpicc wrapper), on my single computer. When I was running the regular (OpenMPI) mpirun I was able to run it with an arbitrary number of nodes without extra settings. The printed ranks were all different, as if I had several nodes, although I have only one computer. That is, logical nodes were created from one physical node.
However, when I started using Intel's mpiexec, I encountered problems. For example, if I start the program with 4 nodes, with the line mpiexec -n 4 ./a.out, and print out the ranks within the code, all the ranks are 0 (0, 0, 0, 0) instead of 0, 1, 2, 3. How can I simulate 2 or 4 logical nodes on 1 physical node? I have a 2-core machine, so maybe 2 is the maximum?
I know it has to do with settings in the mpd.hosts file, but I don't know what to put in that file from the few instructions available here on the forum or anywhere else.
Thanks in advance.
Thanks for getting in touch. First off, can you tell me which version of the Intel MPI Library you are running? We're trying to move away from our previous Process Manager (PM), called MPD, to a new one called Hydra. If you have a script called mpiexec.hydra in your bin64/ directory, can you try using that instead?
Also, I'm a little unsure what you mean by all the ranks being 0. If you run a very simple Hello World application (mpiexec.hydra -n 4 ./hello), are they all reporting rank 0? We ship some test apps under the <intel_mpi_install_dir>/test/ directory. You're welcome to compile and run one of those.
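For reference, the kind of rank-printing test program I mean looks roughly like this (a minimal sketch, not one of our shipped test apps; the file name and output format are illustrative):

```c
/* Minimal MPI "hello world" that prints each process's rank.
 * Compile:  mpiicc hello_world.c
 * Run:      mpiexec.hydra -n 4 ./a.out
 * Each of the 4 processes should report a distinct rank, 0 through 3. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```

If all four processes print rank 0 here, each one is running as a standalone singleton rather than as part of one MPI job.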
Looking forward to hearing back soon.
Thanks for the thorough answer. My Intel MPI Library version is 4.1 Update 1 Build 20130522. I tried the hydra script and it works nicely with a simple hello_world or FFTW: because it doesn't require an MPD startup, it executes immediately.
However, when printing ranks from the simple hello world program, they are all reporting rank 0 again. I don't understand why this happens. I have one physical node (Pentium(R) Dual-Core CPU E6500 @ 2.93GHz × 2), but for the same hello_world the regular OpenMPI mpirun prints distinct ranks for however many processes I request with the -n parameter. It is as if Intel's mpiexec.hydra only sees one physical node, cannot create logical nodes, and prints the same rank again and again. This is a problem for me, as I want to test the validity and speed of my program on the local machine before I submit it to the cluster.
On a side note, I also have trouble compiling the MKL MPI FFT. I used your example from the site, but when I try compiling with the line mpicc fftmkl_mpi.c -mkl -lm I get the following error: undefined reference to `DftiCreateDescriptorDM', etc. I included <mkl.h>, <mpi.h>, "mkl_cdft.h", and "mkl_dfti.h" in my code. I suspect that some library is missing from my compile line, but I don't know which one. I tried to find a solution on the forum, but there are too many different threads, none of which points to an easy solution, and I cannot try them all.
I am doing all this to compare the performance of the cluster versions of FFTW and MKL FFT.
Thanks in advance for help!
Just a few questions and comments. Did you compile the FFTW3 MPI wrappers (located in mkl/interfaces/fftw3x_cdft) using the makefile?
"I try compiling with line: mpicc fftmkl_mpi.c -mkl -lm I get the following error: undefined reference to `DftiCreateDescriptorDM' etc..."
The problem here is that you use mpicc (in this case Intel MPI uses GCC as the compiler, and GCC does not know the option -mkl), so you have to use mpiicc if you want to compile the example via ICC. Also, please replace -mkl with -mkl:cluster, since -mkl links only the non-cluster libraries.
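Just to illustrate which calls pull in the cluster libraries, the *DM call sequence in a cluster-FFT program looks roughly like this (a sketch with an illustrative transform size and no error checking, not a complete example; see the shipped examples for the full pattern):

```c
/* Sketch of the MKL cluster-FFT (CDFT) call sequence.
 * Compile with:  mpiicc cdft_sketch.c -mkl:cluster
 * Linking with plain -mkl leaves the *DM symbols unresolved. */
#include <stdlib.h>
#include <mpi.h>
#include "mkl_cdft.h"

int main(int argc, char **argv)
{
    DFTI_DESCRIPTOR_DM_HANDLE desc;
    MKL_LONG len = 1024;        /* illustrative 1-D transform length */
    MKL_LONG local_size;
    MKL_Complex16 *buf;

    MPI_Init(&argc, &argv);

    /* These *DM functions are what require the cluster MKL libraries: */
    DftiCreateDescriptorDM(MPI_COMM_WORLD, &desc,
                           DFTI_DOUBLE, DFTI_COMPLEX, 1, len);
    DftiGetValueDM(desc, CDFT_LOCAL_SIZE, &local_size);

    buf = (MKL_Complex16 *)malloc(local_size * sizeof(MKL_Complex16));
    /* ... fill this rank's portion of the distributed data ... */

    DftiCommitDescriptorDM(desc);
    DftiComputeForwardDM(desc, buf);   /* in-place forward transform */
    DftiFreeDescriptorDM(&desc);

    free(buf);
    MPI_Finalize();
    return 0;
}
```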
It is very strange that Intel MPI runs your simple hello world this way. Actually, the number of cores does not matter to it, so for any MPI program you can run mpiexec -n 8 ./a.out and it should work fine (even if you have only one core on your machine).
Could you please provide your hello_world program and your compile/run script, so that I can look into it and try to reproduce the problem?
Also, please look into the mkl/examples/fftw3x_cdft directory, where there are some FFTW3 MPI examples that use the MKL wrappers for FFTW3 MPI. You can just set the environment, run make in the example directory, and check whether MKL works properly.
Hi Evarist and others,
I hadn't compiled the FFTW3 MPI wrappers; I have done so now that you've told me. But once again, the problem is not with FFTW. FFTW compiles OK even with the mpicc wrapper, because it doesn't need the MKL MPI functions (only the FFTW ones). So let's forget about FFTW for now...
I must note that I changed my original OpenMPI mpicc wrapper to use ICC instead of GCC. I forget which file has to be changed for that, but I know I did it. That was the only way I could compile my regular MPI program, which uses (non-cluster) MKL functions for certain tests, and it worked completely fine for me before. However, now I want to extend the functionality and include cluster MKL functions (as well as try mpiexec, since I don't know how an MKL cluster program will behave with the regular mpirun). And I encounter several problems...
First of all, I don't even have an mpiicc script. I tried to locate it everywhere on the system, but there is none. There is only one file called /usr/share/openmpi/mpiicc-wrapper-data.txt, but I don't know what to do with it. Thus I can only call mpicc (albeit internally it calls ICC, since I put ICC in place of GCC there). The other problem is that if I compile with mpicc (ICC inside) and add the -mkl:cluster switch, I get this warning: icc: command line warning #10121: overriding '-mkl:cluster' with '-mkl'. So my mpicc wrapper that contains icc recognizes the -mkl option but not the -mkl:cluster option. Otherwise I couldn't have compiled my old MPI programs to use the regular MKL functions anyway.
As for why mpiexec prints all ranks as 0, I have no clue why that happens. The only thing that comes to my mind is that mpiicc has to be used instead of mpicc for an MKL MPI program (even if mpicc calls ICC internally). However, when I tried to run the same hello world program with the regular mpirun, it worked as it should, at least in an example this simple. I still have to check what FFTW3 gives with the regular mpirun when compiled with mpicc containing ICC.
Long story short, I will send you my hello_world program as an attachment. I compile the hello world program with:
mpicc hello_world.c (compiles OK; keep in mind that ICC is called internally), and run it with:
mpiexec.hydra -n 4 ./a.out (gives all ranks 0, but if I run it with /usr/bin/mpirun then all the ranks are correct).
I will send any additional files needed, but the MKL FFT example file uses the regular cluster MKL FFT functions copied from the example on the site, and I stated its compile line in my previous post.