Intel® Fortran Compiler

Mixing coarray and MPI

OP1
New Contributor II

I would like to develop a code that mixes coarray language features (for their elegant simplicity in accessing and transferring data across images) with MPI features (in order to use, for instance, the cluster version of the MKL sparse solver, which requires MPI, or any other MPI-based analysis library).

My workstation has two processors with 6 physical cores each, and I have the Cluster Edition of Intel Parallel Studio XE 2016 Update 2 and MS Visual Studio 2013. This workstation is used to develop a code that will ultimately run on a Linux cluster and on Windows computers similar to mine (using Intel compilers and related tools throughout, to minimize portability issues).

On this workstation I would like to run two coarray images, one image (MPI rank) per node, each making use of the 6 physical cores on that node for OpenMP purposes. The code below is linked against the impi.lib and impicxx.lib libraries.

It is not really clear to me at this point how the code shown below (a trivially modified version of one of the Intel examples) should be run from Visual Studio (should the /Qcoarray option be turned on?) or from mpiexec.exe (or its GUI front end, wmpiexec.exe). I understand that the coarray implementation in Intel Fortran is based on the Intel MPI library, but beyond that things are a little fuzzy for me in terms of how the two coexist when the code also needs to make its own MPI calls. I would assume that two "MPI_COMM_WORLD" contexts are created, one implicitly (for the coarrays) and the other explicitly (by the MPI_INIT in the code).

When I do manage to launch the code, I get either only one coarray image with multiple MPI ranks, or some kind of Hydra service error. There are probably also thread-affinity issues (preventing thread migration across nodes or cores) that I should look at, but that is not a priority for now.

Any help to get started on this would be immensely appreciated, thanks in advance!

PROGRAM MAIN
USE MPI
IMPLICIT NONE

INTEGER I, SIZE, RANK, NAMELEN, IERR
CHARACTER (LEN=MPI_MAX_PROCESSOR_NAME) :: NAME
INTEGER STAT(MPI_STATUS_SIZE)

WRITE(*,*) 'I AM IMAGE ',THIS_IMAGE(),' OUT OF ',NUM_IMAGES(),' IMAGES.'

CALL MPI_INIT (IERR)

CALL MPI_COMM_SIZE (MPI_COMM_WORLD, SIZE, IERR)
CALL MPI_COMM_RANK (MPI_COMM_WORLD, RANK, IERR)
CALL MPI_GET_PROCESSOR_NAME (NAME, NAMELEN, IERR)

IF (RANK.EQ.0) THEN
    PRINT *, 'HELLO WORLD: RANK ', RANK, ' OF ', SIZE, ' RUNNING ON ', NAME
    DO I = 1, SIZE - 1
        CALL MPI_RECV (RANK, 1, MPI_INTEGER, I, 1, MPI_COMM_WORLD, STAT, IERR)
        CALL MPI_RECV (SIZE, 1, MPI_INTEGER, I, 1, MPI_COMM_WORLD, STAT, IERR)
        CALL MPI_RECV (NAMELEN, 1, MPI_INTEGER, I, 1, MPI_COMM_WORLD, STAT, IERR)
        NAME = ''
        CALL MPI_RECV (NAME, NAMELEN, MPI_CHARACTER, I, 1, MPI_COMM_WORLD, STAT, IERR)
        PRINT *, 'HELLO WORLD: RANK ', RANK, ' OF ', SIZE, ' RUNNING ON ', NAME
    ENDDO
ELSE
    CALL MPI_SEND (RANK, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, IERR)
    CALL MPI_SEND (SIZE, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, IERR)
    CALL MPI_SEND (NAMELEN, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, IERR)
    CALL MPI_SEND (NAME, NAMELEN, MPI_CHARACTER, 0, 1, MPI_COMM_WORLD, IERR)
ENDIF

CALL MPI_FINALIZE (IERR)

END

 

OP1
New Contributor II

Another test I just performed with the following simplified code (using wmpiexec.exe, specifying two processes, and turning on /Qcoarray) led to an output with 24 images... which is not a good sign...

PROGRAM MAIN
IMPLICIT NONE

WRITE(*,*) 'I AM IMAGE ',THIS_IMAGE(),' OUT OF ',NUM_IMAGES(),' IMAGES.'

END

and the output:

I AM IMAGE           10  OUT OF           12  IMAGES.
 I AM IMAGE            1  OUT OF           12  IMAGES.
 I AM IMAGE            2  OUT OF           12  IMAGES.
 I AM IMAGE            3  OUT OF           12  IMAGES.
 I AM IMAGE            4  OUT OF           12  IMAGES.
 I AM IMAGE            5  OUT OF           12  IMAGES.
 I AM IMAGE            6  OUT OF           12  IMAGES.
 I AM IMAGE            7  OUT OF           12  IMAGES.
 I AM IMAGE            8  OUT OF           12  IMAGES.
 I AM IMAGE           11  OUT OF           12  IMAGES.
 I AM IMAGE           12  OUT OF           12  IMAGES.
 I AM IMAGE            9  OUT OF           12  IMAGES.
 I AM IMAGE           10  OUT OF           12  IMAGES.
 I AM IMAGE            1  OUT OF           12  IMAGES.
 I AM IMAGE            2  OUT OF           12  IMAGES.
 I AM IMAGE            6  OUT OF           12  IMAGES.
 I AM IMAGE            7  OUT OF           12  IMAGES.
 I AM IMAGE            8  OUT OF           12  IMAGES.
 I AM IMAGE            9  OUT OF           12  IMAGES.
 I AM IMAGE           11  OUT OF           12  IMAGES.
 I AM IMAGE            3  OUT OF           12  IMAGES.
 I AM IMAGE            4  OUT OF           12  IMAGES.
 I AM IMAGE            5  OUT OF           12  IMAGES.
 I AM IMAGE           12  OUT OF           12  IMAGES.

 

Lorri_M_Intel
Employee

I can try to explain the second one.

At run time, a program built with /Qcoarray starts one image, which then uses mpiexec to spawn /Qcoarray-num-images more images. The default value of /Qcoarray-num-images in your case is 2 processors x 6 cores per processor, or 12.

When you used wmpiexec yourself to start two copies of this program, each copy spawned its own 12 images, which is how 24 images started up.

Each top-level instance was unaware of the other; each knew about its own 12 images, but not about the other set.

Did that make sense?

--Lorri

Steven_L_Intel1
Employee

I haven't tried this myself, but let me take a crack at your first question regarding running this mixed MPI/Coarray program from Visual Studio.

As Lorri says, by default a coarray program effectively does its own mpirun, so you want to turn that off. To do that you'll need to use the variant option /Qcoarray:single. There isn't a project property for this, so you'll have to type it in under Command Line > Additional Options (and don't turn on the Coarray project property). I'm not sure whether you are supposed to do your own MPI_INIT; I'll ask Lorri to clarify, as I'm a bit fuzzy on that part.

Now you'll need to do your own mpirun (or wmpiexec or whatever). In your project properties, select Debugging. The property Command defaults to just running the executable. Modify this property to give whatever command you'd use from the command line to start the MPI program. You can use $(TargetPath) to specify the executable name.
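For example, it might look something like this (illustrative values; adjust the mpiexec path and the process count for your installation):

Command = C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016.2.180\windows\mpi\intel64\bin\mpiexec.exe
Command Arguments = -n 2 $(TargetPath)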

Note that debugging MPI/Coarray programs in the VS debugger is difficult. It CAN be done but it's complex and I don't want to get into it right now.

Lorri_M_Intel
Employee

When you use /Qcoarray:single the mechanism I described is bypassed, and only one image/process is started up.

There is still an MPI_Init call done (it is built into our underpinning support for coarrays). If you then do another MPI_Init call, the Intel MPI code understands that one call was already made by our coarray support, and does not error out when it sees your call.
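If you want to make that explicit in your own source, a minimal sketch (using MPI_INITIALIZED, the standard MPI query for whether initialization has already been done) would be:

LOGICAL :: MPI_IS_INITIALIZED
INTEGER :: IERR

CALL MPI_INITIALIZED (MPI_IS_INITIALIZED, IERR)
IF (.NOT. MPI_IS_INITIALIZED) THEN
    ! Only initialize if the coarray run-time support has not already done so
    CALL MPI_INIT (IERR)
END IF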

--Lorri
OP1
New Contributor II

Thanks Steve and Lorri for your help - I managed to go one step further with your suggestion of using /Qcoarray:single.

The following trivial code works as intended:

PROGRAM MAIN
USE MPI
USE OMP_LIB
IMPLICIT NONE

INTEGER I, SIZE, RANK, NAMELEN, IERR
CHARACTER (LEN=MPI_MAX_PROCESSOR_NAME) :: NAME
INTEGER STAT(MPI_STATUS_SIZE)

WRITE(*,*) 'I am image ',THIS_IMAGE(),' out of ',NUM_IMAGES(),' images, and I can span ',OMP_GET_MAX_THREADS(),' threads.'

END

The output is, as expected:

 I am image            2  out of            2  images, and I can span            6  threads.
 I am image            1  out of            2  images, and I can span            6  threads.

Now, when I add the remainder of the sample code (the call to MPI_INIT and a few trivial MPI calls) and run it, I get the following message: "ERROR: Error while connecting to host, No connection could be made because the target machine actively refused it."
I am trying to understand why the (implicit) MPI initialization works for the coarrays, whereas the explicit call to MPI_INIT fails.

Here is the code:

PROGRAM MAIN
USE MPI
USE OMP_LIB
IMPLICIT NONE

INTEGER I, SIZE, RANK, NAMELEN, IERR
CHARACTER (LEN=MPI_MAX_PROCESSOR_NAME) :: NAME
INTEGER STAT(MPI_STATUS_SIZE)

WRITE(*,*) 'I am image ',THIS_IMAGE(),' out of ',NUM_IMAGES(),' images, and I can span ',OMP_GET_MAX_THREADS(),' threads.'

CALL MPI_INIT (IERR)

CALL MPI_COMM_SIZE (MPI_COMM_WORLD, SIZE, IERR)
CALL MPI_COMM_RANK (MPI_COMM_WORLD, RANK, IERR)
CALL MPI_GET_PROCESSOR_NAME (NAME, NAMELEN, IERR)

IF (RANK.EQ.0) THEN
    PRINT *, 'HELLO WORLD: RANK ', RANK, ' OF ', SIZE, ' RUNNING ON ', NAME
    DO I = 1, SIZE - 1
        CALL MPI_RECV (RANK, 1, MPI_INTEGER, I, 1, MPI_COMM_WORLD, STAT, IERR)
        CALL MPI_RECV (SIZE, 1, MPI_INTEGER, I, 1, MPI_COMM_WORLD, STAT, IERR)
        CALL MPI_RECV (NAMELEN, 1, MPI_INTEGER, I, 1, MPI_COMM_WORLD, STAT, IERR)
        NAME = ''
        CALL MPI_RECV (NAME, NAMELEN, MPI_CHARACTER, I, 1, MPI_COMM_WORLD, STAT, IERR)
        PRINT *, 'HELLO WORLD: RANK ', RANK, ' OF ', SIZE, ' RUNNING ON ', NAME
    ENDDO
ELSE
    CALL MPI_SEND (RANK, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, IERR)
    CALL MPI_SEND (SIZE, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, IERR)
    CALL MPI_SEND (NAMELEN, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, IERR)
    CALL MPI_SEND (NAME, NAMELEN, MPI_CHARACTER, 0, 1, MPI_COMM_WORLD, IERR)
ENDIF

CALL MPI_FINALIZE (IERR)

END

It does not matter whether I launch it with mpiexec.exe or wmpiexec.exe (in administrator mode, on a Windows 7 64-bit machine). It sounds as if the 'permissions' to access the nodes are set up properly by the implicit initialization, but not by the explicit call (I am not sure I can word this better with my current knowledge).
Maybe this is a behavior that you can reproduce on your side?

Steven_L_Intel1
Employee

It's not really "implicit", but rather the Fortran run-time library does its own MPI_INIT as part of the program startup.

Are you linking to the Intel MPI we provide?

OP1
New Contributor II

Steve, yes, I am linking against the Intel MPI: "Additional Library Directories" is set to "C:\Program Files (x86)\IntelSWTools\mpi\5.1.3.180\intel64\lib" and "Additional Dependencies" is set to "impi.lib impicxx.lib".

Compilation is done within Visual Studio as usual - with the /Qcoarray:single option.

Steven_L_Intel1
Employee

Here's what I tried. I note that it prompted me for login info - maybe there's a way for you to supply that when you run from VS. I suggest getting what you want working from the command line first.

C:\Projects>ifort /Qcoarray:single /Qopenmp U622116.f90 impi.lib
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 16.0.2.180 Build 20160204
Copyright (C) 1985-2016 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.00.23506.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:U622116.exe
-subsystem:console
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
U622116.obj
impi.lib

C:\Projects>mpiexec -n 4 U622116.exe
User credentials needed to launch processes:
account (domain\user) [xxx\xxx]:
password:
 I am image            1  out of            4  images, and I can span            2  threads.
 I am image            2  out of            4  images, and I can span            2  threads.
 I am image            3  out of            4  images, and I can span            2  threads.
 I am image            4  out of            4  images, and I can span            2  threads.
[mpiexec@SBLIONEL-DESK2] ..\hydra\pm\pmiserv\pmiserv_cb.c (781): connection to proxy 0 at host SBLIONEL-DESK2 failed
[mpiexec@SBLIONEL-DESK2] ..\hydra\tools\demux\demux_select.c (103): callback returned error status
[mpiexec@SBLIONEL-DESK2] ..\hydra\pm\pmiserv\pmiserv_pmci.c (500): error waiting for event
[mpiexec@SBLIONEL-DESK2] ..\hydra\ui\mpich\mpiexec.c (1130): process manager error waiting for completion

I don't know what to make of the errors, since it obviously died at the MPI_INIT. I don't really know MPI at all. If I take out the use of THIS_IMAGE and compile without /Qcoarray, then the MPI part works fine. If I take out the MPI_INIT, then I get errors saying I haven't done MPI_INIT.

I'm going to take this and send it on to the developers since it does not seem to be working in quite the way intended. Issue ID DPD200408911.

OP1
New Contributor II

Yes, Steve - I observed the same behavior as you did: either the MPI part or the coarray part works individually, but not together. Thanks for asking about this and for digging further into it.

Our development strategy is to stay entirely within the Intel suite of tools (compiler, MKL and MPI libraries), since their features and performance meet our requirements and they give us the huge advantage of having only one code to maintain for both Windows and Linux, with minimal portability issues. Even though our code will run on a large Linux cluster most of the time, having the option to do very lightweight work on Windows (and to develop there) is a significant added benefit.

OP1
New Contributor II

Not sure that this adds much to the discussion here, but I can report that this example compiles and runs as expected on a Cray (with the Cray Fortran compiler) - I just tested it and got the desired output.

I have the feeling that we must be overlooking something fairly simple to make it work on a Windows platform. Any news from the Intel developers on that topic?

Steven_L_Intel1
Employee

No news yet - sorry.

OP1
New Contributor II

OK - some news here: I think I managed to make it work. There does indeed seem to be a little bug somewhere in how the IVF 16 Update 2 compiler links the MPI libraries when coarrays are used.

First, here is my test code (a few trivial changes compared to the above codes):

PROGRAM MAIN
USE MPI
USE OMP_LIB
IMPLICIT NONE

INTEGER(KIND=4)                       :: I, SIZE, RANK, NAMELEN, IERR
CHARACTER(LEN=MPI_MAX_PROCESSOR_NAME) :: NAME
INTEGER(KIND=4)                       :: STAT(MPI_STATUS_SIZE)
INTEGER(KIND=4)                       :: IMAGE_NUMBER,TEMP
LOGICAL :: MPI_IS_INITIALIZED

TEMP = 0
IMAGE_NUMBER = THIS_IMAGE()+TEMP

WRITE(*,*) 'I am image ',THIS_IMAGE(),' out of ',NUM_IMAGES(),' images, and I can span ',OMP_GET_MAX_THREADS(),' threads.'

CALL MPI_INITIALIZED(MPI_IS_INITIALIZED,IERR)
IF (.NOT.MPI_IS_INITIALIZED) THEN
    CALL MPI_INIT (IERR)
END IF

CALL MPI_COMM_SIZE (MPI_COMM_WORLD, SIZE, IERR)
CALL MPI_COMM_RANK (MPI_COMM_WORLD, RANK, IERR)
CALL MPI_GET_PROCESSOR_NAME (NAME, NAMELEN, IERR)

IF (RANK.EQ.0) THEN
    PRINT *, 'HELLO WORLD: RANK ', RANK, ' OF ', SIZE, ' RUNNING ON ', NAME, ' ASSOCIATED WITH IMAGE ',IMAGE_NUMBER
    DO I = 1, SIZE - 1
        CALL MPI_RECV (RANK, 1, MPI_INTEGER, I, 1, MPI_COMM_WORLD, STAT, IERR)
        CALL MPI_RECV (SIZE, 1, MPI_INTEGER, I, 1, MPI_COMM_WORLD, STAT, IERR)
        CALL MPI_RECV (NAMELEN, 1, MPI_INTEGER, I, 1, MPI_COMM_WORLD, STAT, IERR)
        NAME = ''
        CALL MPI_RECV (NAME, NAMELEN, MPI_CHARACTER, I, 1, MPI_COMM_WORLD, STAT, IERR)
        CALL MPI_RECV (IMAGE_NUMBER,1,MPI_INTEGER,I,1,MPI_COMM_WORLD,STAT,IERR)
        PRINT *, 'HELLO WORLD: RANK ', RANK, ' OF ', SIZE, ' RUNNING ON ', NAME, ' ASSOCIATED WITH IMAGE ',IMAGE_NUMBER
    ENDDO
ELSE
    CALL MPI_SEND (RANK, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, IERR)
    CALL MPI_SEND (SIZE, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, IERR)
    CALL MPI_SEND (NAMELEN, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, IERR)
    CALL MPI_SEND (NAME, NAMELEN, MPI_CHARACTER, 0, 1, MPI_COMM_WORLD, IERR)
    CALL MPI_SEND (IMAGE_NUMBER,1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, IERR)
ENDIF

CALL MPI_FINALIZE (IERR)

END

Using depends.exe I was able to look at the dependencies of this code with the /Qcoarray:single option turned on and off, and it seems that there was a conflict between the MPI library that libicaf.dll depends on (impimt) and the MPI library linked into the code (impi.lib).

So, for the compile line I simply specified /Qcoarray:single, as indicated by Steve above. The differences are in the link command: I explicitly specified "impimt.lib" under Additional Dependencies (instead of impi.lib and impicxx.lib), and I set "Link Library Dependencies" to NO.

So, Steve, I think you can report this to your developers - it seems that there is a little bug in the selection of linked libraries that could easily be fixed. I am going to try and see if I can figure out the parameters needed for a debug configuration.

Also - another possible bug: it seems that the MPI_FINALIZE done implicitly by the libicaf library does not check whether MPI has already been terminated by the user (as it is by the explicit MPI_FINALIZE near the end of the code above). As a result, when the code is run we get the message "Attempting to use an MPI routine after finalizing MPI". Again, this should be an easy fix.
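For reference, a symmetric guard on the finalize side would be a minimal sketch like the one below (MPI_FINALIZED is the standard query, mirroring the MPI_INITIALIZED check above), although it cannot by itself stop the coarray run-time from touching MPI after the explicit finalize has run:

LOGICAL :: MPI_IS_FINALIZED
INTEGER :: IERR

CALL MPI_FINALIZED (MPI_IS_FINALIZED, IERR)
IF (.NOT. MPI_IS_FINALIZED) THEN
    ! Finalize only if MPI has not already been shut down
    CALL MPI_FINALIZE (IERR)
END IF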

Let me know if I did something horrendous here - it does seem to work, but maybe things behind the scenes are not as rosy as they seem now?

OP1
New Contributor II

Quick comment: setting "Link Library Dependencies" to NO is not necessary. But you need to specify the location of the impimt.lib library (Linker > General > Additional Library Directories = C:\Program Files (x86)\IntelSWTools\mpi\5.1.3.180\intel64\lib\release_mt)

Steven_L_Intel1
Employee

Interesting - but it's not quite that straightforward.

A coarray application doesn't link directly to any MPI library. It does link to libicaf.lib, which is the import library for libicaf.dll; we don't provide coarray support except in DLL form. libicaf.dll in turn is linked against impimt.lib, the import library for impimt.dll. So when you also link to impi.lib, which is the static library, you get MMLS (Multiple MPI Library Syndrome), analogous to the classic MCLS (Multiple C Library Syndrome), with very similar effects: one side of MPI doesn't know the other side exists and doesn't talk to it.

You have the correct solution - link to impimt.lib instead of impi.lib, and I apologize for leading you astray. However, I am left somewhat puzzled, as I know we have customers using our coarray implementation with non-Intel MPI libraries, and I am at a loss to understand how that works. I will have to ask our experts in that area.
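In command-line terms, the build line from my earlier post would become something like this (library path per the default Intel MPI 5.1.3 layout mentioned above):

ifort /Qcoarray:single /Qopenmp U622116.f90 "C:\Program Files (x86)\IntelSWTools\mpi\5.1.3.180\intel64\lib\release_mt\impimt.lib"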

OP1
New Contributor II

For an application built with the Debug option, libicaf.dll still refers to impimt.dll (I was hoping it would depend on impidmt.dll). Should I then still link with impimt.dll to avoid MMLS :) ?

 

Steven_L_Intel1
Employee

We have only one libicaf - it can't link against different DLLs. Yes, I would recommend continuing to link with impimt.lib (.dll). I see also that I was mistaken about impi.lib being a static library - it is the import library for impi.dll. The difference is that impi.dll assumes one thread per rank and impimt.dll is thread-safe. So the general recommendation would be to link to impimt.lib.

Even with your revised program and linking to impimt.lib I can't get this to work. I get this:

[mpiexec@SBLIONEL-DESK2] ..\hydra\pm\pmiserv\pmiserv_cb.c (781): connection to proxy 0 at host SBLIONEL-DESK2 failed
[mpiexec@SBLIONEL-DESK2] ..\hydra\tools\demux\demux_select.c (103): callback returned error status
[mpiexec@SBLIONEL-DESK2] ..\hydra\pm\pmiserv\pmiserv_pmci.c (500): error waiting for event
[mpiexec@SBLIONEL-DESK2] ..\hydra\ui\mpich\mpiexec.c (1130): process manager error waiting for completion

I am not an MPI expert, so I wonder if I am missing something.

OP1
New Contributor II

Steve, here is how I compile/build the code:

/nologo /O2 /I"C:\Program Files (x86)\IntelSWTools\mpi\5.1.3.180\intel64\include" /Qopenmp /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc120.pdb" /libs:dll /threads /Qmkl:cluster /c /Qcoarray:single

and

/OUT:"x64\Release\MPI_TEST.exe" /INCREMENTAL:NO /NOLOGO /LIBPATH:"C:\Program Files (x86)\IntelSWTools\mpi\5.1.3.180\intel64\lib\release_mt" /MANIFEST /MANIFESTFILE:"x64\Release\MPI_TEST.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /IMPLIB:"D:\Users\poudouol\Documents\02_FRITOS\v2.0\Codes\Sandbox\MPI_TEST\x64\Release\MPI_TEST.lib" impimt.lib

Your recommendation about changing Debugging > Command and Debugging > Command Arguments also works well for launching from within the Visual Studio (2013) environment:

Command = C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016.2.180\windows\mpi\intel64\bin\mpiexec.exe
Command Arguments = -n 4 $(TargetPath)

I hope this helps!

Also, I still get the "Attempting to use an MPI routine after finalizing MPI" warning message upon exiting the code - as I said I think this is a simple fix on the Intel MPI implementation side (making sure that MPI has not been finalized already).

I am now going to play with a Debug configuration and see how much is possible there (without resorting to the tried-and-tested debugging strategy of WRITE statements...).

 

 

Steven_L_Intel1
Employee

Yes, we recognize the MPI_FINALIZE issue and I've asked the developers to deal with that.
