Calling Fortran from Fortran from C#

Erik_J_ · 02-05-2016

I have two cases in which a C# program A calls a Fortran subroutine FBn (in a DLL), which in turn calls another Fortran subroutine FCn (in that same DLL).
So: A -> FB1 -> FC1

and the other case:

A -> FB2 -> FC2

FB1 and FB2 each need two DECLs to adjust to the C# calling convention

!DEC$ ATTRIBUTES DLLEXPORT ,STDCALL :: FBn
!DEC$ ATTRIBUTES ALIAS: 'FBn' :: FBn

No problem here: that call works fine. I wrote FBn myself in F95.
But FB1 calls subroutine FC1 (DPBSVX, straight from LAPACK), and FB2 calls FC2 (DSBGVX, also from LAPACK).

Now I find that, in the case of FB1, FC1 also needs the two !DEC$ directives, but FC2 does not.
I would have thought that a call from Fortran to Fortran never needs an adjusted calling convention.
But in one case it does.
Why would that be?
Is there an issue with Fortran versions? Or...

Steven_L_Intel1 · 02-05-2016

If you change the calling convention of a procedure, you need to change it any place it is used, even within the same language.
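A minimal sketch of that rule, assuming Intel Fortran directive syntax and hypothetical routine names (scale_std, caller): once a routine carries STDCALL, a Fortran caller needs an explicit interface that repeats the same attributes, because otherwise it would make the call with the default convention. Other compilers treat the !DEC$ lines as comments, so both sides still agree there.

```fortran
! Hypothetical callee whose calling convention has been changed.
subroutine scale_std(n, x)
  !DEC$ ATTRIBUTES STDCALL, ALIAS:'scale_std' :: scale_std
  implicit none
  integer :: n               ! with STDCALL, Intel Fortran passes scalars by value
  double precision :: x(n)   ! arrays are still passed by reference
  x = 2.0d0 * x
end subroutine scale_std

! A Fortran caller repeats the attributes in an interface block,
! so that caller and callee agree on how the arguments are passed.
subroutine caller(n, x)
  implicit none
  integer :: n
  double precision :: x(n)
  interface
    subroutine scale_std(n, x)
      !DEC$ ATTRIBUTES STDCALL, ALIAS:'scale_std' :: scale_std
      integer :: n
      double precision :: x(n)
    end subroutine scale_std
  end interface
  call scale_std(n, x)
end subroutine caller

program demo
  implicit none
  double precision :: x(3)
  x = (/ 1.0d0, 2.0d0, 3.0d0 /)
  call caller(3, x)          ! caller itself uses the default convention
  print *, x
end program demo
```

If the interface block were dropped, the caller would pass the address of n while the callee expected its value, which is the kind of mismatch that shows up as garbage arguments or an access violation.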

In C# you can specify the C convention - that might be a better approach.

Erik_J_ · 02-05-2016

I see what you mean, but FB1 != FB2 and FC1 != FC2.

The calling convention of FB1 is the same as that of FB2, however, and those calls work fine.

The question is, why would FC1 demand a different calling convention than FC2?

Other "intra"-LAPACK calls work right out of the box, with no !DEC$ directives required.
And the calls from FBn to FCn are just calls "within" that one Fortran DLL.
I'll see what cdecl gives me.

Steven_L_Intel1 · 02-05-2016

On rereading your post, I find I don't fully understand your description.

The ATTRIBUTES STDCALL and ALIAS are needed for the call from C# to Fortran (the FB1 and FB2 routines, I think). The FC1 and FC2 routines should not need anything special if they are being called only from FB1 and FB2. You haven't said what goes wrong.
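The layering described here can be sketched as a DLL source file, assuming Intel Fortran directives. The routine names (sum_entry, sum_kernel) are hypothetical, and the REFERENCE attribute on the output argument is an assumption about how a caller would receive a result back: only the exported entry point carries directives, and the inner routine is plain Fortran.

```fortran
! Inner routine: no directives, default Fortran convention
! (all arguments received by reference).
subroutine sum_kernel(n, x, total)
  implicit none
  integer :: n
  double precision :: x(n), total
  total = sum(x)
end subroutine sum_kernel

! Exported entry point: the only routine the C# side sees.
subroutine sum_entry(n, x, total)
  !DEC$ ATTRIBUTES DLLEXPORT, STDCALL :: sum_entry
  !DEC$ ATTRIBUTES ALIAS:'SUM_ENTRY' :: sum_entry
  !DEC$ ATTRIBUTES REFERENCE :: total
  implicit none
  integer :: n               ! arrives by value because of STDCALL
  double precision :: x(n)   ! array, arrives by reference
  double precision :: total  ! output, so it is forced to by-reference
  external sum_kernel
  ! An ordinary Fortran-to-Fortran call: the compiler passes the
  ! addresses of n, x and total, which is what sum_kernel expects.
  call sum_kernel(n, x, total)
end subroutine sum_entry
```

With this layout the C# declaration would pass n as a plain int and total as a ref double. Nothing about the outer convention leaks into the inner call.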

Erik_J_ · 02-05-2016

You understand correctly.
Indeed, it was (and is) my opinion that FC1 and FC2 should not need anything special.

But one of them does!
Let's see if I can get you some code:

===================================

Here is the C# end:

internal static class NativeMethods
{
    const string dllImport = @"solver\solver.dll";

    [DllImport(dllImport, CallingConvention = CallingConvention.StdCall, EntryPoint = "SOLVER", CharSet = CharSet.Unicode)]
    static public extern void SOLVER(int NEQ, int BANDW, int LOADCASES, double[,] SSS, double[,] LOADS, double[,] displ, int INFO);

    [DllImport(dllImport, CallingConvention = CallingConvention.StdCall, EntryPoint = "EIGENPROBLEM", CharSet = CharSet.Unicode)]
    static public extern void EIGENPROBLEM(int N, int KA, int KB, double[,] AB, double[,] BB, double VL, double VU, double[] W, double[,] Z, int[] IFAIL, int INFO);
}

=========================

Routine SOLVER, called from C#, calls DPBSVX:

      SUBROUTINE SOLVER( NEQ,BANDW,LOADCASES,SSS,LOADS,DISPL,INFO )
      !DEC$ ATTRIBUTES DLLEXPORT , STDCALL :: SOLVER
      !DEC$ ATTRIBUTES ALIAS: 'SOLVER' :: SOLVER
*     ..
      INTEGER NEQ,LOADCASES,BANDW,INFO
*     .. Array Arguments ..
      DOUBLE PRECISION   SSS(BANDW+1,*),LOADS(NEQ,LOADCASES),
     &DISPL(LOADCASES,*),RCOND
      DOUBLE PRECISION , ALLOCATABLE :: FERR(:), BERR(:),S(:),
     &AFB(:,:),WORK(:)
      INTEGER, ALLOCATABLE::IWORK(:)
      INTEGER STAT,ALLOC_ERR
*      EXTERNAL DPBSVX
*
      ALLOCATE (FERR(1:LOADCASES),S(1:NEQ),AFB(1:BANDW+1,1:NEQ),
     &WORK(1:3*NEQ), STAT=ALLOC_ERR)
      ALLOCATE (BERR(1:LOADCASES),IWORK(1:NEQ),STAT=ALLOC_ERR)
      CALL DPBSVX('N', 'L', NEQ, BANDW , LOADCASES, SSS, BANDW+1, AFB,
     & BANDW+1, 'N',S, LOADS, NEQ, DISPL, NEQ, RCOND, FERR, BERR, WORK,
     &IWORK, INFO)
      RETURN
      END

      SUBROUTINE DPBSVX( FACT, UPLO, N, KD, NRHS, AB, LDAB, AFB, LDAFB,
     $                   EQUED, S, B, LDB, X, LDX, RCOND, FERR, BERR,
     $                   WORK, IWORK, INFO )
      !DEC$ ATTRIBUTES DLLEXPORT ,STDCALL :: DPBSVX
      !DEC$ ATTRIBUTES ALIAS: 'DPBSVX' :: DPBSVX

etc.; the rest of the routine is omitted.
Getting at DPBSVX from C# via SOLVER only works if I use the !DEC$ directives. I find that weird.

==========================

Routine EIGENPROBLEM, called from C#, calls DSBGVX:

      SUBROUTINE EIGENPROBLEM( N, KA,KB,AB,BB,VL,VU,W, Z,IFAIL,INFO )
      !DEC$ ATTRIBUTES DLLEXPORT ,STDCALL :: EIGENPROBLEM
      !DEC$ ATTRIBUTES ALIAS: 'EIGENPROBLEM' :: EIGENPROBLEM

      DOUBLE PRECISION Z(N,*),W(*),ABSTOL
      INTEGER ALLOC_ERR,STAT
      INTEGER N,INFO,KA,KB,IFAIL(*),M
      DOUBLE PRECISION VL,VU,AB(N,*),BB(N,*)
      DOUBLE PRECISION DLAMCH

      DOUBLE PRECISION , ALLOCATABLE :: WORK(:),Q(:,:)
      INTEGER, ALLOCATABLE :: IWORK(:)

      EXTERNAL DLAMCH, DSBGVX

      ALLOCATE(WORK(1:7*N),IWORK(1:5*N),Q(1:N,1:N),STAT=ALLOC_ERR)
      ABSTOL = DLAMCH( 'S' )

      CALL DSBGVX( 'V', 'V', 'L', N, KA, KB, AB, KA+1, BB,
     *KB+1, Q, N, VL, VU, 1,2, ABSTOL, M, W, Z,
     * N, WORK, IWORK, IFAIL, INFO )
      RETURN
      END

      SUBROUTINE DSBGVX( JOBZ, RANGE, UPLO, N, KA, KB, AB, LDAB, BB,
     $                   LDBB, Q, LDQ, VL, VU, IL, IU, ABSTOL, M, W, Z,
     $                   LDZ, WORK, IWORK, IFAIL, INFO )

etc. I have omitted the rest of this routine. Getting at DSBGVX from C# works fine this way.

==========================================

The problem lies with DPBSVX: why the attributes?

When I debug and watch the arguments coming into DPBSVX, I see nice integers and characters and some addresses.
As I should.
When I leave the two !DEC$ directives out of DPBSVX, I see only addresses coming into DPBSVX and the program crashes with an illegal memory access.

I do get the program to **work** correctly, but I find this very weird and I want to understand.

mecej4 · 02-05-2016

Erik J. wrote:
FB1 and FB2 each need two DECLs to adjust to the C# calling convention...

C# does not force you to use Stdcall, nor is that the default calling convention -- well, in the pure C# world, there is no "calling convention" in terms of registers, stack, etc.

There are several examples of calling the Lapack and BLAS routines in the MKL distribution, and most (all?) of them use only Cdecl. You can choose to call MKL routines using Stdcall, but only if there is some reason to do so and you have the optional Stdcall-compatible MKL libraries installed. Thus, the only reasons to use Stdcall at all are (i) you wish to call routines in libraries for which you do not have source code, and those routines were compiled with Stdcall, or (ii) there are library routines that take Stdcall callback functions as arguments.

You appear to be using some Lapack routines that you compiled from sources, rather than linking with the same routines in MKL. If so, it may be the case that you compiled those parts of Lapack with the Stdcall convention, so having to use Stdcall is a self-imposed burden.

Erik_J_ · 02-06-2016

Same problem with cdecl.

I compile the LAPACK routines myself, with Intel Fortran. I have a VS2015 DLL project for those routines and my own LAPACK interface stubs (FBn).

No matter what calling convention I use (as long as it is used consistently), the LAPACK routines should all be accessible in the same way.
They are in the same DLL too.
In my opinion there cannot be a reason why DPBSVX would behave differently.
But still....

mecej4 · 02-06-2016

I prepared a set of test source codes to try and break the logjam. The test code is the C#+Fortran equivalent of the dgbsvx() example code at http://www.nag.com/numeric/fl/nagdoc_fl25/examples/source/f07bbfe.f90.html and the accompanying input data, with some modifications: (i) I use the Lapack95 interface and the Lapack routines in MKL, (ii) the matrix data is in the C# caller rather than in a data file, (iii) the Lapack95 routine is called through a Fortran wrapper that converts from row-major to column-major and back, and (iv) the integer arguments that define the matrix sizes are passed from C# to Fortran by value.

You may consider avoiding Fortran altogether and using the LapackE routines in MKL, but then you would lose the benefit of the short argument lists of the Lapack95 interfaces. You would also need to allocate and deallocate the temporary arrays that the LapackE interface to PBSVX would require you to pass.

The source code for the wrapper, f95pbsvx():

subroutine f95pbsvx(A,B,X,n,kd,nrhs) bind(C)
! A is in band format, row major
! B is full, row major
use lapack95, only: pbsvx
implicit none
integer, intent(in), value :: n,kd,nrhs
double precision :: A(n,kd+1),B(nrhs,n)
double precision, intent(out) :: X(nrhs,n)
double precision Aloc(kd+1,n),Bloc(n,nrhs),Xloc(n,nrhs)
!    convert to Column Major, solve the equations AX = B for X
Aloc=transpose(A); Bloc=transpose(B)
!
call pbsvx(Aloc, Bloc, Xloc)     ! <<=== note the short argument list
!
X=transpose(Xloc)
return
end subroutine

The DLL is built with the command

s:\LANG\MKL>ifort /Qmkl /LD f95pbsvx.f90 mkl_lapack95.lib /link /export:f95pbsvx

The C# program:

using System;
using System.Security;
using System.Runtime.InteropServices;
using mkl95;

public class test_pbsvx
{
    private test_pbsvx() {}
    public static void Main(string[] args) {
        /* Data initialization */
        int N=4, KD=1, NRHS=2;
        double[] A = new double[] { 0.0,2.68,-2.39,-2.22,  5.49,5.63,2.60,5.17};
        double[] B = new double[] {22.09,5.10,9.31,30.81,-5.24,-25.82,11.83,22.90};
        double[] X = new double[N*NRHS];
        Console.WriteLine("MKL F95_pbsvx example");
        Console.WriteLine();
        /* Print initial data */
        printMatrix("Matrix A",A,N,KD+1);
        printMatrix("Matrix B",B,N,NRHS);
        /* Computation */
        mklpb.pbsvx(A,B,X,N,KD,NRHS);
        /* Print the result */
        printMatrix("Resulting X",X,N,NRHS);
        Console.WriteLine("TEST PASSED");
        Console.WriteLine();
    }

    /** Print the matrix X assuming row-major order of elements. */
    private static void printMatrix(String prompt, double[] X, int I, int J) {
        Console.WriteLine(prompt);
        for (int i=0; i<I; i++) {
            for (int j=0; j<J; j++)
                Console.Write("\t" + String.Format("{0,10:0.00}",X[i*J+j]));
            Console.WriteLine();
        }
    }
}

namespace mkl95 
{
    /** Lapack95 wrappers */
    public sealed class mklpb {
        private mklpb() {}
        /** F95Lapack wrapper */
        public static void pbsvx(double[] A, double[] B, double[] X,
                       int n, int kd, int nrhs) {
            F95Native.f95pbsvx(A,B,X, n,kd,nrhs);
        }
    }

    /** F95 Lapack native declarations */
    [SuppressUnmanagedCodeSecurity]
    internal sealed class F95Native {
        private F95Native() {}
        [DllImport("f95pbsvx.dll", CallingConvention=CallingConvention.Cdecl,
             ExactSpelling=true, SetLastError=false)]
        internal static extern void f95pbsvx(double[] A, double[] B, double[] X,
             int n, int kd, int nrhs);
    }
}

Building the EXE from the C# file:

s:\lang\mkl>csc xpbsvx.cs /platform:x86

The program output:

MKL F95_pbsvx example

Matrix A
               0.0             2.7
              -2.4            -2.2
               5.5             5.6
               2.6             5.2
Matrix B
              22.1             5.1
               9.3            30.8
              -5.2           -25.8
              11.8            22.9
Resulting X
               5.0            -2.0
              -2.0             6.0
              -3.0            -1.0
               1.0             4.0
TEST PASSED

If you want x64 code instead of x86 code, proceed similarly, but use mkl_lapack95_lp64.lib to build the DLL, and specify /platform:x64 when building the EXE.

Caution: the example code is fine for processing small/medium size problems. For large matrices, the TRANSPOSE invocations in the wrapper and the allocation on the stack of local arrays for the transposes should probably be changed to improve efficiency.
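One possible way to act on that caution is sketched below. It assumes a hypothetical wrapper name (scale_rowmajor) and uses a placeholder scaling step where the real solver call would go: the work array is ALLOCATABLE, so it is normally placed on the heap and an allocation failure can be reported, and the layout conversion uses explicit loops rather than TRANSPOSE, so no hidden temporary is needed.

```fortran
module heap_wrapper
  use iso_c_binding, only: c_int, c_double
  implicit none
contains
  ! Hypothetical wrapper: A is an n-by-m row-major matrix from the caller,
  ! which Fortran sees as an m-by-n column-major array.
  subroutine scale_rowmajor(A, n, m, info) bind(C, name='scale_rowmajor')
    integer(c_int), intent(in), value :: n, m
    real(c_double), intent(inout) :: A(m, n)
    integer(c_int), intent(out) :: info
    real(c_double), allocatable :: Aloc(:,:)   ! allocatable, not automatic
    integer :: i, j
    allocate(Aloc(n, m), stat=info)
    if (info /= 0) return                      ! report allocation failure
    do j = 1, m                                ! row-major -> column-major
      do i = 1, n
        Aloc(i, j) = A(j, i)
      end do
    end do
    Aloc = 2.0_c_double * Aloc                 ! placeholder for the solver call
    do j = 1, m                                ! column-major -> row-major
      do i = 1, n
        A(j, i) = Aloc(i, j)
      end do
    end do
    deallocate(Aloc)
  end subroutine scale_rowmajor
end module heap_wrapper

program demo
  use heap_wrapper
  implicit none
  real(c_double) :: A(2, 3)
  integer(c_int) :: info
  A = 1.0_c_double
  call scale_rowmajor(A, 3, 2, info)
  print *, info, A
end program demo
```

The two copies still cost time proportional to the matrix size; the sketch only moves the storage off the stack and makes the copying explicit.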

TimP · 02-06-2016

Much effort has been devoted to making compilers optimize calls such as

call pbsvx(transpose(A), transpose(B), transpose(X))

by appropriate swaps in array descriptors without allocating new arrays.

Of course, in this case the call is likely to end up with implied temporary arrays (on the heap, if /heap_arrays is set). So mecej4 is right to caution that much time could be spent copying data around.