Coarray Fortran / Python mixed program

DataScientist · 02-16-2019

I am trying to understand how a Coarray Fortran DLL can be called from Python. Consider the following sample Fortran module file `example_mod.f90`, which is to be called from Python later:

module example_mod
    use iso_c_binding
    implicit none
#ifdef COARRAY_ENABLED
    integer :: co_int
#endif
    interface
    module subroutine sqr_2d_arr(nd, val, comm) BIND(C, NAME='sqr_2d_arr')
        !DEC$ ATTRIBUTES DLLEXPORT :: sqr_2d_arr
        integer, intent(in)     :: nd
        integer, intent(inout)  :: val(nd, nd), comm
    end subroutine sqr_2d_arr
    end interface
contains
end module example_mod

with the subroutine's implementation given in the submodule file `example_mod@sub_smod.f90`:

submodule (example_mod) sub_smod
    implicit none
contains
    module procedure sqr_2d_arr

        use mpi
        integer :: rank, size, ierr

        integer :: i, j

        call MPI_Comm_size(comm, size, ierr)
        call MPI_Comm_rank(comm, rank, ierr)
        write(*,"(*(g0,:,' '))") "Hello from Fortran MPI! I am process", rank, "of", size, ', comm:', comm

        write(*,"(*(g0,:,' '))") "Hello from Fortran COARRAY! I am image ", this_image(), " out of", num_images(), "images."
        sync all

        do j = 1, nd
            do i = 1, nd
                val(i, j) = (val(i, j) + val(j, i)) ** 2
            enddo
        enddo

    end procedure sqr_2d_arr
end submodule sub_smod

The subroutine also contains calls to the MPI library for the sake of comparison with Coarray. I compile this code with the following ifort flags:

mpiifort /Qcoarray=distributed /Od /debug:full /fpp -c example_mod.f90
mpiifort /Qcoarray=distributed /Od /debug:full /fpp -c example_mod@sub_smod.f90
mpiifort /Qcoarray=distributed /Od /debug:full /fpp /dll /libs:dll /threads example_mod.obj example_mod@sub_smod.obj

Now, I have the following Python2 script which calls the generated DLL above:

#!/usr/bin/env python

from __future__ import print_function
from mpi4py import MPI


comm = MPI.COMM_WORLD
fcomm = MPI.COMM_WORLD.py2f()
print("Hello from Python! I'm rank %d from %d running in total..." % (comm.rank, comm.size))

comm.Barrier()   # wait for everybody to synchronize _here_

######################

import ctypes as ct
import numpy as np

# import the dll
fortlib = ct.CDLL('example_mod.dll')

# setup the data
N = 2
nd = ct.pointer( ct.c_int(N) )          # setup the pointer
pyarr = np.arange(0, N, dtype=int) * 5  # setup the N-long
for i in range(1, N):                   # concatenate columns until it is N x N
    pyarr = np.c_[pyarr, np.arange(0, N, dtype=int) * 5]

# call the function by passing the ctypes pointer using the numpy function:
fcomm_pt = ct.pointer( ct.c_int(fcomm) )
_ = fortlib.sqr_2d_arr(nd, np.ctypeslib.as_ctypes(pyarr),fcomm_pt)

print(pyarr)

Running this script with the following command:

mpiexec -np 4 python main.py

yields this output:

Hello from Fortran MPI! I am process 1 of 4 , comm: 1140850688
Hello from Fortran MPI! I am process 3 of 4 , comm: 1140850688
Hello from Fortran COARRAY! I am image  1  out of 0 images.
Hello from Fortran MPI! I am process 0 of 4 , comm: 1140850688
Hello from Fortran COARRAY! I am image  1  out of 0 images.
Hello from Fortran MPI! I am process 2 of 4 , comm: 1140850688
Hello from Fortran COARRAY! I am image  1  out of 0 images.
Hello from Fortran COARRAY! I am image  1  out of 0 images.
Hello from Python! I'm rank 3 from 4 running in total...
[[  0  25]
 [900 100]]
Hello from Python! I'm rank 0 from 4 running in total...
[[  0  25]
 [900 100]]
Hello from Python! I'm rank 1 from 4 running in total...
[[  0  25]
 [900 100]]
Hello from Python! I'm rank 2 from 4 running in total...
[[  0  25]
 [900 100]]

The computations performed in this set of codes are not important or relevant to the discussion here. However, I cannot understand why the MPI ranks are output properly while the Coarray num_images() is zero for all processes. As a broader question, what is the best strategy to write a Coarray Fortran application that can be called from other languages such as Python?
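As a side check that the DLL call itself behaves as expected, the printed matrix can be reproduced without Fortran. The sketch below is a pure NumPy stand-in for the loop in `sqr_2d_arr`, assuming the array is passed as a C-contiguous buffer as in the script above; the function name `sqr_2d_arr_reference` is made up for illustration. Fortran reads that buffer in column-major order, so its `val(i, j)` aliases NumPy's `pyarr[j, i]`, and the update is done in place, which is why the result is not symmetric.

```python
import numpy as np


def sqr_2d_arr_reference(arr):
    """Hypothetical pure-Python stand-in for the Fortran loop (checking only)."""
    nd = arr.shape[0]
    # Fortran sees the C-ordered buffer in column-major order, so its
    # val(i, j) refers to the same memory as arr[j, i]: a transposed view.
    val = arr.T
    for j in range(nd):
        for i in range(nd):
            # In-place update: later iterations see already-updated entries.
            val[i, j] = (val[i, j] + val[j, i]) ** 2
    return arr


# Build the same N x N input as the script in the question.
N = 2
pyarr = np.arange(0, N, dtype=int) * 5
for _ in range(1, N):
    pyarr = np.c_[pyarr, np.arange(0, N, dtype=int) * 5]

print(sqr_2d_arr_reference(pyarr))
```

With N = 2 this gives the same 2 x 2 matrix that every rank prints in the output above, which suggests the argument passing through ctypes is fine and only the coarray runtime is misbehaving.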

IanH · 02-16-2019

I strongly suspect that a "coarray DLL" is not workable with the Intel implementation of coarrays.

During the startup of a coarray program (i.e. something compiled from source with a PROGRAM statement) various library routines are invoked to set up the environment for the subsequent multi-image execution. That setup won't occur if you are just invoking procedures compiled into a DLL.

Compile your code as a program proper, and have python invoke that program as a separate process.
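A minimal sketch of that approach from the Python side, written for Python 3 (the script in the question targets Python 2). The helper name `run_coarray_program` and the executable name in the usage note are hypothetical. The sketch assumes the executable was built with /Qcoarray and that the Intel runtime takes its image count from the FOR_COARRAY_NUM_IMAGES environment variable; treat that as an assumption to verify against your compiler version.

```python
import os
import subprocess


def run_coarray_program(exe_path, num_images, args=()):
    """Launch a standalone coarray executable and return its standard output.

    Hypothetical helper: the executable is a Fortran PROGRAM, so the coarray
    runtime is initialized by its own startup code, not by Python.
    """
    env = dict(os.environ)
    # Assumption: the Intel coarray runtime reads the number of images
    # from this environment variable.
    env["FOR_COARRAY_NUM_IMAGES"] = str(num_images)
    cmd = [exe_path, *args]
    result = subprocess.run(
        cmd, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout
```

A call such as `run_coarray_program("coarray_app.exe", 4)` would then run the (hypothetical) program as a separate process. Data has to be exchanged through files, pipes, or command-line arguments, so this does not cover the case where Fortran must call back into the running Python interpreter.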

jimdempseyatthecove · 02-16-2019

Try adding once-only code to your DLL that calls the Intel for_rtl_init_ function on the DLL load, and for_rtl_finish_ on the DLL unload.

Note, these calls may be required when the main program is .NOT. a Fortran PROGRAM.

Jim Dempsey

DataScientist · 02-16-2019

Ian, thanks. That would be a viable option. However, my Fortran application has a Python callback. I found out that there is an application, "forpy", that lets you call Python from within Fortran. But that does not work for me because, apparently, when Python is called from Fortran, forpy initializes a new instance of Python, which is likely independent of the original main Python environment. If you know of any Fortran/Python callback method, I'd appreciate you sharing it with me here.
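For reference, one general way to get a callback that runs inside the original interpreter is to pass a ctypes function pointer into a BIND(C) Fortran procedure. The sketch below shows only the Python side. The callback signature, the names `CALLBACK_T` and `py_objective`, and the DLL routine mentioned in the comments are all assumptions for illustration, not part of the code in this thread. It does not address the coarray initialization problem.

```python
import ctypes as ct

# Assumed C signature of the callback: double f(const double *x, int n).
# On the Fortran side this would correspond to an abstract interface with
# BIND(C), taking real(c_double) :: x(*) and integer(c_int), value :: n.
CALLBACK_T = ct.CFUNCTYPE(ct.c_double, ct.POINTER(ct.c_double), ct.c_int)


def py_objective(x_ptr, n):
    """Hypothetical Python callback: sum of squares of the n input values."""
    return sum(x_ptr[i] ** 2 for i in range(n))


# Wrap the Python function as a C function pointer. Keep this object alive
# for as long as Fortran may call it, or the pointer becomes dangling.
c_objective = CALLBACK_T(py_objective)

# With a real DLL one would pass c_objective as an argument, for example
#   fortlib.run_with_callback(c_objective)   # hypothetical exported routine
# Here the pointer is exercised directly from Python instead.
x = (ct.c_double * 3)(1.0, 2.0, 3.0)
print(c_objective(x, 3))
```

Because the callback executes in the process that loaded the DLL, it sees the same modules and global state as the calling script, unlike a freshly started interpreter.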

DataScientist · 02-16-2019

Jim, thanks. Your and Ian's comments are always very helpful on this forum. It seems that "for_rtl_init_" is a function that can be called from a C main program:

Handlers for the Application (Project) Types

Miscellaneous Run-Time Library Routines

But I do not know where I should call this routine: inside the Fortran DLL or outside, in the Python interpreter? Simply calling it from inside the exported subroutine does not work, as the link step gives the following error:

mpiifort /Qcoarray=distributed /Od /debug:full /fpp /dll /libs:dll /threads example_mod.obj example_mod@sub_smod.obj
mpifc.bat for the Intel(R) MPI Library 2019 for Windows*
Copyright 2007-2018 Intel Corporation.

Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.0.0.117 Build 20180804
Copyright (C) 1985-2018 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.15.26732.1
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:example_mod.dll
-debug
-pdb:example_mod.pdb
-dll
-implib:example_mod.lib
"/LIBPATH:C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mpi\intel64\bin\..\..\intel64\lib\debug"
"/LIBPATH:C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mpi\intel64\bin\..\..\intel64\lib"
impi.lib
example_mod.obj
example_mod@sub_smod.obj
   Creating library example_mod.lib and object example_mod.exp
example_mod@sub_smod.obj : error LNK2019: unresolved external symbol FOR_RTL_INIT referenced in function sqr_2d_arr
example_mod.dll : fatal error LNK1120: 1 unresolved externals
ERROR in the compiling/linking [1120]

Do you have any suggestions on how to call it from within the submodule's subroutine?

Steve_Lionel · 02-16-2019

The problem is that coarray program initialization happens within the main program, including the "mpirun". I don't think the implementation is yet up to having coarray code in a DLL (without a Fortran main program.) FOR_RTL_INIT isn't going to help here. At a minimum you'd need to use /Qcoarray:single and use a separate MPI command to start the image (Python program) across the desired number of processes.

I see there is a routine for_rtl_ICAF_COINIT which obviously does some sort of initialization, but it is not documented for users calling it directly.

DataScientist · 02-16-2019

Thanks Steve. The /Qcoarray flag does not seem to have any effect on the output. I tried SINGLE, SHARED and DISTRIBUTED. The Fortran processes are all generated properly, and so long as there is no message passing between the images, it will run fine. But that severely limits the usability of coarrays in mixed-language programming (basically there is no coarray parallelization under this scenario). A large body of the software's users live in other language islands, among them Python. It would be good if the Intel team could come up with a solution or a set of guidelines about coarray mixed-language programming, perhaps as a blog post. In any case, we very much look forward to seeing more comprehensive implementation and support of Coarray Fortran by Intel.

jimdempseyatthecove · 02-17-2019

A King,

Does the Python program (iow the Python code) need to be distributed?

IOW, do the distribution requirements belong only to the Fortran code?

If so, then consider writing a wrapper Fortran DLL that performs a SYSTEM or SYSTEMQQ to run your mpirun of an executable that performs the work currently in your DLL.

Jim Dempsey

Steve_Lionel · 02-17-2019

Use EXECUTE_COMMAND_LINE rather than SYSTEM/SYSTEMQQ.

Arjen_Markus · 02-18-2019

Just a thought: might an alternative be to create a Fortran main program that invokes the Python script via forpy, instead of directly via a Python interpreter program? That way the coarray environment is taken care of and there is only one Python interpreter running.

DataScientist · 02-26-2019

Thank you, Jim, Steve, Arjen. Your work-around solutions theoretically work, but I am afraid such an approach would severely limit the development of programs in Python. In essence, I am looking for a way to bridge Python's MPI4PY package and Intel's coarray Fortran implementation (which is implicit MPI).

I just stumbled upon this ifort option "-nofor-main", and I wonder if this could be of any help at the link time: https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-nofor-main

I could add this flag and rerun my tests to get an answer. But I thought there could be more to this flag that I could learn from you than by simply rerunning my own tests. Does "nofor-main" have any effect on the Coarray functionality and initializations that are required and implicitly occur in the presence of a main Fortran program?

Steve_Lionel · 02-27-2019

-nofor-main has no effect on Windows and will not help you. As I wrote above, coarray code currently cannot be used when the main program is not Fortran. This may be something available in the future when the Teams feature of F2018 is fully supported.