Intel® Fortran Compiler

Possible race condition when passing an internal procedure in parallel code?

david_g_2
Beginner

 

Hi all,

We have been seeing a strange issue when passing internal procedures to external subroutines in parallel code. Our code is structured as a C# executable that calls into a Fortran DLL. The Fortran calls are run in parallel using the .NET Task Parallel Library. The entry point into the Fortran library is a module containing an exported subroutine (run), which in turn contains various internal procedures that are passed to some legacy code (an ODE solver).

What we have noticed is that, every once in a while, some of the local variables of run() hold incorrect values when the ODE solver invokes the internal procedure.

A simplified example which exhibits similar behavior:

! multitest.f90
module multitest
    implicit none
contains
    subroutine run(k_in)
        !DEC$ attributes dllexport::run
        !DEC$ ATTRIBUTES ALIAS:'run' :: run
        !DEC$ ATTRIBUTES VALUE :: k_in
        integer, intent(in) :: k_in
        integer :: kProc
        kProc = k_in
        ! check the input is correct
        if (kProc > 1000) then
            print *, 'Error: k too big', kProc, k_in
        end if
        ! Normally host association would be used to obtain additional data required by f(x)
        ! Passing kProc for test purposes
        call ode(internal_proc, kProc)
    contains
        subroutine internal_proc(x, kPassed)
            real, intent(in) :: x
            integer, intent(in) :: kPassed
            ! check that the passed parameter is valid
            if (kProc /= kPassed) then
                print *, 'ERROR!', ' kProc = ', kProc, 'kPassed = ', kPassed
            end if
        end subroutine internal_proc
    end subroutine run
end module multitest

! ode.f90
subroutine ode(f, k)
    implicit none
    interface
        subroutine f(x, k)
            real, intent(in) :: x
            integer, intent(in) :: k
        end subroutine
    end interface
    integer :: k
    real :: x
    integer :: i
    x = 0
    ! uncomment this block to slow things down
    !do i = 1, 1000000
    !    x = x*x + i
    !end do
    call f(x, k)
end subroutine

The EXE code is 

using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

class Program
{
    [DllImport("multitest.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern void run(int k);

    static void Main(string[] args)
    {
        var rnd = new Random();
        int nRuns = 1000;
        int maxK = 1000;

        // Call the Fortran entry point from many threads at once
        Parallel.For(0, nRuns, i =>
        {
            int k = rnd.Next(maxK);
            run(k);
        });
    }
}

When I run this on a multicore system, the kProc/kPassed check fails on the order of one time in a thousand. One example of the output from the code above:

 ERROR! kProc =            0 kPassed =          436
 ERROR! kProc =   1936961198 kPassed =          194
 ERROR! kProc =          967 kPassed =          100

Note that the C# code limits the input to < 1000, which is verified in the Fortran code.

If I run this code serially, and/or uncomment the do loop in ode, the code runs without any errors.

I have seen a couple of threads in this forum describing somewhat similar problems, such as https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/623339 and https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/543705 (which I used as the basis for my example). I have also read the Doctor Fortran article 'Think, Thank, Thunk' (https://software.intel.com/en-us/blogs/2009/09/02/doctor-fortran-in-think-thank-thunk), which I think states that the local run() variables are placed on the stack and that the ode call then uses them.

Given the sporadic nature of the problem, it sorta 'smells' like there might be some kind of race condition going on here where the local vars are somehow getting changed before being passed into the ODE solver - however, I am in no way an expert in these matters.  

Does anyone have any idea of what could be causing this issue?  Also, any idea of how to fix (or at least test) this?

Thanks a lot, 

Dave

P.S. I am using Intel Parallel Studio XE 2017 Update 5, Visual Studio 2017, and Windows 7 Pro. The code is compiled as 32-bit; the only compiler option I changed was to enable recursive routines (generate reentrant code = default).

9 Replies
jimdempseyatthecove
Honored Contributor III

It looks as if kProc is behaving as though it had the SAVE attribute.

As silly as this may seem...

Try compiling the Fortran code with -qopenmp (/Qopenmp on Windows), even though your application is not OpenMP.

If this works, then we can try to figure out why.

Jim Dempsey

 

Steve_Lionel
Honored Contributor III

The compiler doesn't know your code is parallel. You should add RECURSIVE before the SUBROUTINE or FUNCTION statement.

jimdempseyatthecove
Honored Contributor III

FWIW

I've had success with C# (multi-threaded) <-> (unmanaged) C++ DLL <-> (unmanaged) Fortran DLL (compiled with -qopenmp) multi-threaded OpenMP

I haven't tried  C# (multi-threaded) <-> (unmanaged) Fortran DLL (compiled with -qopenmp)

BTW, compiling with -qopenmp applies RECURSIVE to all subroutines and functions, as do -recursive and -auto.

Steve, I think the documentation is not correct to state: "It does not affect variables that have the SAVE attribute or ALLOCATABLE attribute..." In the ALLOCATABLE case the option does affect the placement of the array descriptor: -auto places the array descriptor on the stack (assuming SAVE is not also used), while -noauto places it in the SAVE area (assuming no -qopenmp, no -recursive, and no RECURSIVE declaration on the SUBROUTINE).

David, this is a heads up

When using multiple threads in C# and calling a Fortran (or C++) DLL that also uses OpenMP threading (-qopenmp without !$OMP directives does not use OpenMP), each C# thread (unique thread ID) will instantiate its own OpenMP thread pool, resulting in oversubscription. C# has the "nasty" characteristic, with respect to OpenMP, that it does not reuse C# threads once spawned (it spawns new ones as needed). This shows up as an apparent memory leak, because the OpenMP runtime keeps each calling thread's thread pool available for reuse (which may never happen).

Jim Dempsey

Steve_Lionel
Honored Contributor III

Jim, I agree with you on the documentation wording.

david_g_2
Beginner

Hi Steve and Jim,

Thank you very much for your suggestions, but unfortunately I am still getting the same results after trying them. I should have emphasized that I enabled recursive functions (/recursive) in the project properties, which I thought made all functions recursive. In any case, I explicitly added the RECURSIVE keyword to all functions and subroutines, and also enabled OpenMP as Jim suggested, but I'm still getting the same behavior in both debug and release builds.

One other thing - the 'bad' parameter test seems to fail about 1 time in a thousand - e.g. about 1 or 2 failures using the code above.  About 1 time in 100 or so, I will get an access violation:

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

at driver.Program.run(Int32 k)

 

Any other suggestions?  This has been causing us a ton of misery...

 

Thanks again,

Dave

 

 

Steve_Lionel
Honored Contributor III

I recall, a few years ago, sending a report to the developers that the "thunk" for an internal procedure call was always being placed in static storage, even when the procedure was recursive. I am pretty sure that got fixed a while ago. I examined the assembly code from your first source - I see a call to __intel_alloc_bpv, (BPV = Bound Procedure Value, what I called a thunk above), with the resulting address saved on the stack. What this ought to do is provide a different BPV for each instance of the calling procedure. There could still be a bug here. I'd suggest submitting a reproducer to Intel support.
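
As a starting point for such a reproducer, here is a minimal Fortran-only sketch that replaces the C# driver with an OpenMP loop. It is an assumption-laden illustration, not the code from the original project: the module name repro_sketch, the extra nbad counter argument, and the choice to make ode a module procedure are all hypothetical, and whether this variant triggers the same failure has not been verified. A correct build is expected to report zero mismatches.

```fortran
! Hypothetical reproducer sketch; names other than run, ode, internal_proc,
! kProc and kPassed are not from the original post.
module repro_sketch
    implicit none
contains
    recursive subroutine run(k_in, nbad)
        integer, intent(in)    :: k_in
        integer, intent(inout) :: nbad     ! counts mismatches instead of printing
        integer :: kProc                   ! local, read by the internal procedure
        kProc = k_in
        call ode(internal_proc, kProc)
    contains
        recursive subroutine internal_proc(x, kPassed)
            real, intent(in)    :: x
            integer, intent(in) :: kPassed
            ! host-associated kProc must equal the value passed through ode
            if (kProc /= kPassed) nbad = nbad + 1
        end subroutine internal_proc
    end subroutine run

    recursive subroutine ode(f, k)
        interface
            subroutine f(x, k)
                real, intent(in)    :: x
                integer, intent(in) :: k
            end subroutine f
        end interface
        integer, intent(in) :: k
        real :: x
        x = 0.0
        call f(x, k)
    end subroutine ode
end module repro_sketch

program repro
    use repro_sketch
    implicit none
    integer :: i, nbad
    nbad = 0
    ! Each iteration stands in for one Parallel.For task in the C# driver
    !$omp parallel do reduction(+:nbad)
    do i = 1, 1000
        call run(mod(i*37, 1000), nbad)
    end do
    !$omp end parallel do
    print *, 'mismatches:', nbad
end program repro
```

The loop only runs in parallel when OpenMP is enabled at compile time (for example /Qopenmp with Intel Fortran on Windows, or -fopenmp with gfortran); otherwise the directives are treated as comments and the loop runs serially.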

jimdempseyatthecove
Honored Contributor III

Try:

...
module multitest
    implicit none
contains
    subroutine run(k_in) BIND(C, NAME='run')
        !DEC$ attributes dllexport::run
        !DEC$ ATTRIBUTES VALUE :: k_in
        integer, intent(in) :: k_in
        integer :: kProc
...

Jim Dempsey

david_g_2
Beginner

Steve,

I noticed something similar when I ran the code through the Intel Inspector memory analysis: it reports a 'memory not deallocated' error for run(), and highlights the alloc_bpv call you mentioned.

I'll try contacting support - thanks again

jimdempseyatthecove
Honored Contributor III

David,

RE: Steve's>>You should add RECURSIVE before the SUBROUTINE or FUNCTION statement.

This applies to all Fortran compiled object files, not just the entry point subroutine/function. The interoperability issue (multi-threaded Fortran code) lies in the default behavior for procedure-scoped variables. From the IVF documentation:

By default, the compiler allocates local scalar variables on the stack. Other non-allocatable variables of non-recursive subprograms are allocated in static storage by default. This default can be changed through compiler options. Appropriate use of the SAVE attribute may be required if your program assumes that local variables retain their definition across subprogram calls.

In particular, arrays and array descriptors (e.g. for allocatable arrays) default to SAVE.
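
To illustrate why static storage matters here, the following sketch contrasts a local array that is explicitly SAVEd (one copy shared by every caller, and so by every thread) with a local array in a RECURSIVE routine (a fresh copy per call). The module and procedure names are hypothetical, and the explicit SAVE stands in for the default static placement described above so that the behavior does not depend on compiler options.

```fortran
! Hypothetical illustration; none of these names come from the original code.
module storage_sketch
    implicit none
contains
    ! buf has the SAVE attribute: a single copy shared by all calls
    subroutine shared_buffer(k, total)
        integer, intent(in)  :: k
        integer, intent(out) :: total
        integer, save :: buf(4) = 0
        buf = buf + k            ! state accumulates across calls
        total = sum(buf)
    end subroutine shared_buffer

    ! RECURSIVE routine: buf is a per-call (automatic) local
    recursive subroutine private_buffer(k, total)
        integer, intent(in)  :: k
        integer, intent(out) :: total
        integer :: buf(4)
        buf = 0
        buf = buf + k            ! nothing carries over between calls
        total = sum(buf)
    end subroutine private_buffer
end module storage_sketch

program demo
    use storage_sketch
    implicit none
    integer :: t
    call shared_buffer(1, t)
    print *, t                   ! 4
    call shared_buffer(1, t)
    print *, t                   ! 8: the second call saw the first call's data
    call private_buffer(1, t)
    print *, t                   ! 4
    call private_buffer(1, t)
    print *, t                   ! 4: each call started from a clean buffer
end program demo
```

Run serially, the shared version merely accumulates; called from several threads at once, the same shared copy would be read and written concurrently, which is the kind of interference the default static placement can cause.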

Jim Dempsey
