Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Coarray Fortran / Python mixed program

DataScientist
Valued Contributor I
589 Views

I am trying to understand how a Coarray Fortran DLL can be called from Python. Consider the following sample Fortran module file `example_mod.f90`, which is to be called from Python later:

module example_mod
    use iso_c_binding
    implicit none
#ifdef COARRAY_ENABLED
    integer :: co_int
#endif
    interface
    module subroutine sqr_2d_arr(nd, val, comm) BIND(C, NAME='sqr_2d_arr')
        !DEC$ ATTRIBUTES DLLEXPORT :: sqr_2d_arr
        integer, intent(in) :: nd
        integer, intent(inout) :: val(nd, nd), comm
    end subroutine sqr_2d_arr
    end interface
contains
end module example_mod

with the subroutine's implementation given in the submodule file `example_mod@sub_smod.f90`:

    submodule (example_mod) sub_smod
        implicit none
    contains
        module procedure sqr_2d_arr
    
            use mpi
            integer :: rank, size, ierr
    
            integer :: i, j
    
            call MPI_Comm_size(comm, size, ierr)
            call MPI_Comm_rank(comm, rank, ierr)
            write(*,"(*(g0,:,' '))") "Hello from Fortran MPI! I am process", rank, "of", size, ', comm:', comm
    
            write(*,"(*(g0,:,' '))") "Hello from Fortran COARRAY! I am image ", this_image(), " out of", num_images(), "images."
            sync all
    
            do j = 1, nd
                do i = 1, nd
                    val(i, j) = (val(i, j) + val(j, i)) ** 2
                enddo
            enddo
    
        end procedure sqr_2d_arr
    end submodule sub_smod

    The subroutine also contains calls to the MPI library for the sake of comparison with Coarray. I compile this code with the following ifort flags:

    mpiifort /Qcoarray=distributed /Od /debug:full /fpp -c example_mod.f90
    mpiifort /Qcoarray=distributed /Od /debug:full /fpp -c example_mod@sub_smod.f90
    mpiifort /Qcoarray=distributed /Od /debug:full /fpp /dll /libs:dll /threads example_mod.obj example_mod@sub_smod.obj
    

    Now, I have the following Python 2 script, which calls the DLL generated above:

    #!/usr/bin/env python
    
    from __future__ import print_function
    from mpi4py import MPI
    
    
    comm = MPI.COMM_WORLD
    fcomm = MPI.COMM_WORLD.py2f()
    print("Hello from Python! I'm rank %d from %d running in total..." % (comm.rank, comm.size))
    
    comm.Barrier()   # wait for everybody to synchronize _here_
    
    ######################
    
    import ctypes as ct
    import numpy as np
    
    # import the dll
    fortlib = ct.CDLL('example_mod.dll')
    
    # set up the data
    N = 2
    nd = ct.pointer( ct.c_int(N) )          # pointer to the array dimension
    pyarr = np.arange(0, N, dtype=int) * 5  # first column, of length N
    for i in range(1, N):                   # append columns until it is N x N
        pyarr = np.c_[pyarr, np.arange(0, N, dtype=int) * 5]
    
    # call the function by passing the ctypes pointer using the numpy function:
    fcomm_pt = ct.pointer( ct.c_int(fcomm) )
    _ = fortlib.sqr_2d_arr(nd, np.ctypeslib.as_ctypes(pyarr),fcomm_pt)
    
    print(pyarr)

    Running this script with the following command:

    mpiexec -np 4 python main.py

    yields this output:

    Hello from Fortran MPI! I am process 1 of 4 , comm: 1140850688
    Hello from Fortran MPI! I am process 3 of 4 , comm: 1140850688
    Hello from Fortran COARRAY! I am image  1  out of 0 images.
    Hello from Fortran MPI! I am process 0 of 4 , comm: 1140850688
    Hello from Fortran COARRAY! I am image  1  out of 0 images.
    Hello from Fortran MPI! I am process 2 of 4 , comm: 1140850688
    Hello from Fortran COARRAY! I am image  1  out of 0 images.
    Hello from Fortran COARRAY! I am image  1  out of 0 images.
    Hello from Python! I'm rank 3 from 4 running in total...
    [[  0  25]
     [900 100]]
    Hello from Python! I'm rank 0 from 4 running in total...
    [[  0  25]
     [900 100]]
    Hello from Python! I'm rank 1 from 4 running in total...
    [[  0  25]
     [900 100]]
    Hello from Python! I'm rank 2 from 4 running in total...
    [[  0  25]
     [900 100]]

    The computations performed in this set of codes are not important or relevant to the discussion here. However, I cannot understand why the MPI ranks are output properly, while the Coarray num_images() is zero for all processes. As a broader question, what is the best strategy for writing a Coarray Fortran application that can be called from other languages such as Python?
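A side note on the array values printed above: NumPy stores `pyarr` in row-major order, while the Fortran routine indexes the same buffer in column-major order, so Fortran's `val(i, j)` corresponds to NumPy's `pyarr[j-1, i-1]`. The sketch below emulates the Fortran loop in pure NumPy under that assumption. The helper name `sqr_2d_arr_emulated` is made up for illustration, and `np.int32` is assumed to match the default Fortran `integer`. It reproduces the values in the output without loading the DLL.

```python
import numpy as np


def sqr_2d_arr_emulated(arr):
    """Emulate the Fortran loop on a C-contiguous (n, n) buffer, in place."""
    n = arr.shape[0]
    # The transpose is a view on the same memory; val[i, j] here plays the
    # role of Fortran's val(i+1, j+1) on the shared buffer.
    val = arr.T
    for j in range(n):
        for i in range(n):
            val[i, j] = (val[i, j] + val[j, i]) ** 2
    return arr


# Same setup as in the script above, with an explicit 32-bit integer type.
N = 2
pyarr = np.arange(0, N, dtype=np.int32) * 5
for _ in range(1, N):
    pyarr = np.c_[pyarr, np.arange(0, N, dtype=np.int32) * 5]

print(sqr_2d_arr_emulated(pyarr))
```

Because the loop reads elements it has already overwritten, the result depends on the traversal order, which is why the layout matters for matching the printed values.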

    11 Replies
    IanH
    Honored Contributor II

    I strongly suspect that a "coarray DLL" is not workable with the Intel implementation of coarrays.

    During the startup of a coarray program (i.e. something compiled from source with a PROGRAM statement) various library routines are invoked to set up the environment for the subsequent multi-image execution.  That set up won't occur if you are just invoking procedures compiled into a DLL.

    Compile your code as a program proper, and have python invoke that program as a separate process.
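A minimal sketch of the Python side of this approach, assuming the coarray code has been built as a standalone executable. My understanding of the Intel documentation is that the run-time reads the number of images from the `FOR_COARRAY_NUM_IMAGES` environment variable; treat that, and the helper name `run_with_images`, as assumptions to check against your compiler version. A small Python child process stands in for the Fortran executable here so that the sketch runs anywhere.

```python
import os
import subprocess
import sys


def run_with_images(cmd, n_images):
    """Run `cmd` as a separate process with the image count in its environment."""
    env = dict(os.environ)
    # Assumed Intel run-time variable controlling the number of coarray images.
    env["FOR_COARRAY_NUM_IMAGES"] = str(n_images)
    return subprocess.check_output(cmd, env=env, universal_newlines=True)


# Stand-in child process: it only echoes the variable it received.
# In real use `cmd` would be the compiled coarray program, for example
# ["example_main.exe"] (a hypothetical name, not from this thread).
child = [sys.executable, "-c",
         "import os; print(os.environ['FOR_COARRAY_NUM_IMAGES'])"]
output = run_with_images(child, 4)
print(output.strip())
```

The Fortran program then owns the coarray start-up, and Python only sees its standard output and exit status.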

    jimdempseyatthecove
    Honored Contributor III

    Try adding once-only code to your DLL that calls the Intel for_rtl_init_ function on DLL load, and for_rtl_finish_ on DLL unload.

    Note, these calls may be required when the main program is .NOT. a Fortran PROGRAM.

    Jim Dempsey

    DataScientist
    Valued Contributor I

    Ian, thanks. That would be a viable option. However, my Fortran application has a Python callback. I found an application, "forpy", that lets you call Python from within Fortran. But that does not work for me because, apparently, when Python is called from Fortran, forpy initializes a new instance of Python, which is likely independent of the original main Python environment. If you know of any Fortran/Python callback method, I'd appreciate it if you shared it here.

    DataScientist
    Valued Contributor I

    Jim, thanks. Your and Ian's comments are always very helpful on this forum. It seems that "for_rtl_init_" is a function that can be called from a C main file:

    Handlers for the Application (Project) Types

    Miscellaneous Run-Time Library Routines

    But I do not know where I should call this routine: inside the Fortran DLL, or outside in the Python interpreter? Simply calling it from inside the exported subroutine does not work, as the link step gives the following error:

    mpiifort /Qcoarray=distributed /Od /debug:full /fpp /dll /libs:dll /threads example_mod.obj example_mod@sub_smod.obj
    mpifc.bat for the Intel(R) MPI Library 2019 for Windows*
    Copyright 2007-2018 Intel Corporation.
    
    Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.0.0.117 Build 20180804
    Copyright (C) 1985-2018 Intel Corporation.  All rights reserved.
    
    Microsoft (R) Incremental Linker Version 14.15.26732.1
    Copyright (C) Microsoft Corporation.  All rights reserved.
    
    -out:example_mod.dll
    -debug
    -pdb:example_mod.pdb
    -dll
    -implib:example_mod.lib
    "/LIBPATH:C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mpi\intel64\bin\..\..\intel64\lib\debug"
    "/LIBPATH:C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mpi\intel64\bin\..\..\intel64\lib"
    impi.lib
    example_mod.obj
    example_mod@sub_smod.obj
       Creating library example_mod.lib and object example_mod.exp
    example_mod@sub_smod.obj : error LNK2019: unresolved external symbol FOR_RTL_INIT referenced in function sqr_2d_arr
    example_mod.dll : fatal error LNK1120: 1 unresolved externals
    ERROR in the compiling/linking [1120]

    Do you have any suggestions on how to call it from within the submodule's subroutine?

    Steve_Lionel
    Honored Contributor III

    The problem is that coarray program initialization happens within the main program, including the "mpirun". I don't think the implementation is yet up to having coarray code in a DLL (without a Fortran main program.)  FOR_RTL_INIT isn't going to help here. At a minimum you'd need to use /Qcoarray:single and use a separate MPI command to start the image (Python program) across the desired number of processes.

    I see there is a routine for_rtl_ICAF_COINIT which obviously does some sort of initialization, but it is not documented for users calling it directly.

    DataScientist
    Valued Contributor I

    Thanks Steve. The /Qcoarray flag does not seem to have any effect on the output. I tried SINGLE, SHARED and DISTRIBUTED. The Fortran processes are all generated properly, and as long as there is no message passing between the images, it runs fine. But that severely limits the usability of coarrays in mixed-language programming (basically, there is no coarray parallelization in this scenario). A large body of the software's users live on other language islands, among them Python. It would be good if the Intel team could come up with a solution or a set of guidelines for coarray mixed-language programming, perhaps as a blog post. In any case, we very much look forward to seeing a more comprehensive implementation and support of Coarray Fortran by Intel.

     

    jimdempseyatthecove
    Honored Contributor III

    A King,

    Does the Python program (IOW the Python code) need to be distributed?

    IOW, do the distribution requirements belong only to the Fortran code?

    If so, then consider writing a wrapper Fortran DLL that performs a SYSTEM or SYSTEMQQ to run your mpirun of an executable that performs the work currently in your DLL.

    Jim Dempsey

    Steve_Lionel
    Honored Contributor III

    Use EXECUTE_COMMAND_LINE rather than SYSTEM/SYSTEMQQ.
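If the work moves into a separately launched executable, as suggested above, the array data has to cross the process boundary somehow. The sketch below shows one possible way to do that from Python with raw binary files. The protocol (input file path and output file path as the last two command-line arguments, 32-bit integers in both files) and the helper name `run_external_worker` are hypothetical, chosen only for illustration, and a small Python child process stands in for the MPI-launched Fortran program.

```python
import os
import shutil
import subprocess
import sys
import tempfile

import numpy as np


def run_external_worker(cmd_prefix, arr):
    """Round-trip `arr` through an external program via raw int32 files.

    Hypothetical protocol: the worker is run as
    `<cmd_prefix...> <input_file> <output_file>`.
    """
    workdir = tempfile.mkdtemp()
    try:
        in_path = os.path.join(workdir, "input.bin")
        out_path = os.path.join(workdir, "output.bin")
        np.ascontiguousarray(arr, dtype=np.int32).tofile(in_path)
        # Raises CalledProcessError if the worker exits with a nonzero status.
        subprocess.check_call(list(cmd_prefix) + [in_path, out_path])
        return np.fromfile(out_path, dtype=np.int32).reshape(arr.shape)
    finally:
        shutil.rmtree(workdir, ignore_errors=True)


# Stand-in worker that squares every element. In real use the prefix might be
# something like ["mpiexec", "-np", "4", "example_main.exe"], where the
# executable name is hypothetical.
stand_in = [sys.executable, "-c",
            "import sys, numpy as np; "
            "a = np.fromfile(sys.argv[1], dtype=np.int32); "
            "(a * a).tofile(sys.argv[2])"]
result = run_external_worker(stand_in, np.array([[1, 2], [3, 4]]))
print(result)
```

Files are used here only because they need nothing beyond NumPy on the Python side; the Fortran side would have to read and write the same raw layout, for example with stream access.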

    Arjen_Markus
    Honored Contributor I

    Just a thought: might an alternative be to create a Fortran main program that invokes the Python script via forpy, instead of directly via a Python interpreter program? That way the coarray environment is taken care of and there is only one Python interpreter running.

    DataScientist
    Valued Contributor I

    Thank you, Jim, Steve, Arjen. Your workarounds would work in theory, but I am afraid such an approach would severely limit the development of programs in Python. In essence, I am looking for a way to bridge Python's mpi4py package and Intel's Coarray Fortran implementation (which is implicitly MPI).

    I just stumbled upon this ifort option "-nofor-main", and I wonder if this could be of any help at the link time: https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-nofor-main

    I could add this flag and rerun my tests to get an answer. But I thought I could learn more about this flag from you than by simply rerunning my own tests. Does "nofor-main" have any effect on the Coarray functionality and initializations that are required and occur implicitly in the presence of a main Fortran program?

    Steve_Lionel
    Honored Contributor III

    -nofor-main has no effect on Windows and will not help you. As I wrote above, coarray code currently cannot be used when the main program is not Fortran. This may be something available in the future when the Teams feature of F2018 is fully supported.
