Community
cancel
Showing results for 
Search instead for 
Did you mean: 
Highlighted
Beginner
83 Views

unlimited polymorphic variables and multi-threading results in crash

Hi,

the following code has a (very trivial) implementation of an auto-growing array (called "list"), where each instance of it can hold a different type. It works well, as long you don't use it multi-threaded:

If you compile the code with /Qopenmp it will simply crash!

The reason is (and I couldn't believe when I saw it in the assembler code) that inside of the append-subroutine at run-time the currently used type is switched and written to GLOBAL STATIC memory. This is simply insane, and I cannot find any reason, why this information is not written onto the stack or at least in thread-local storage when /threads is specified.

Tobias

    module test_mod
    implicit none
    
    type list
        class(*), allocatable :: c(:)
        integer :: num_nodes
    contains
    procedure :: append
    end type 

    contains
    
    subroutine append(this,element)
    class(list) :: this
    class(*), intent(in) :: element
    class(*), allocatable :: tmp(:)
    integer :: l

    if (.not.allocated(this%c)) then
        allocate(this%c(1),source=element)
        this%c(1) = element
        this%num_nodes = 1
        return
    else
        l = size(this%c,1)
        allocate(tmp(l+1),source=element)
        tmp(1:l) = this%c(1:l)
        tmp(l+1) = element
        deallocate(this%c)
        call move_alloc(tmp,this%c)
        this%num_nodes = this%num_nodes + 1
    endif
    end subroutine

    
    
    
    subroutine test()
    type(list) :: lines
    type(list) :: ilines
    character(256):: line
    integer:: err,cnt,i,file_unit

    !Init:
    cnt = 1
 
    DO i=1,100
        line = 'hello'
            call append(lines,line)
            call append(ilines,i)

    end do

    end subroutine
    end module
    
    program Console1
    use test_mod
    implicit none
    integer loop_t, nr_temps

    nr_temps = 20000
    
    !$omp parallel do
    do loop_t = 1,nr_temps
        call test()
    end do
    !$omp end parallel do


    end program Console1

 

0 Kudos
21 Replies
Highlighted
Black Belt Retired Employee
73 Views

From the OpenMP spec:

This OpenMP API specification refers to ISO/IEC 1539-1:2010 as Fortran 2008. While future versions of the OpenMP specification are expected to address the following features, currently their use may result in unspecified behavior.

Submodules

Coarrays

DO CONCURRENT

Allocatable components of recursive type

Pointer initialization

Value attribute is permitted for any nonallocatable nonpointer nonarray

Simply contiguous arrays rank remapping to rank>1 target

Polymorphic assignment

Accessing real and imaginary parts

Pointer function reference is a variable

Recursive I/O

The BLOCK construct

EXIT statement (to terminate a non-DO construct)

ERROR STOP

Internal procedure as an actual argument

Generic resolution by procedureness

Generic resolution by pointer vs. allocatable

Impure elemental procedures

0 Kudos
Highlighted
Black Belt Retired Employee
73 Views

That said, I think what you have found would likely be considered a bug and I recommend that you report this to Intel using the Online Service Center.

0 Kudos
Highlighted
Beginner
73 Views

Hello Steve,

thanks for your reply, but this is not an OpenMP problem, I just used OpenMP to show the problem.

The problem is that the subroutine "append" is not thread-safe, even though /threads is specified. So, this code is invalid in any multi-threaded context (e.g. when it is inside a dll that is used from some multi-threaded C++ code). Further, I couldn't find any comment in the intel-fortran-docs about unlimited polymorphic variables may resulting in non thread-safe code.

0 Kudos
Highlighted
Black Belt Retired Employee
73 Views

Can you highlight for me the assembly that you think is at issue? I'm having difficulty identifying it.

In the absence of /Qopenmp, the routines are not RECURSIVE and therefore not thread-safe. (That will change with Fortran 2018, which makes routines recursive by default.)  /threads is an obsolete option that selects between thread-safe and non-thread-safe run-time libraries; the latter no longer exists so /threads is a no-op. You probably want /recursive.

0 Kudos
Highlighted
Beginner
73 Views

This is from the assembler x64/Release. The problematic lines are all the 

mov qword ptr [__scrt_ucrt_dll_is_in_use+228h (07FF7583F9768h)]...

which write to the global rw-memory of the process, as you can see from address of the exe's image:

Console1.exe    Console1.exe    C:\Users\k8705963\source\repos\Console1\x64\Release\Console1.exe    N/A    N/A    Symbols loaded.    C:\Users\k8705963\source\repos\Console1\x64\Release\Console1.pdb    1        2/9/2020 6:07 PM    00007FF7583F0000-00007FF7583FF000    [49976] Console1.exe        
 

the code results from line 28 in Console1.f90

tmp(l+1) = element

(I've attached the project). (In x86 builds the problem also exists, but assembler doesn't show it so nice.)

Further down in the assembly, there are again writes to global memory.

Tobias

 

00007FF7583F16FA  mov         qword ptr [r10+20h],0  
00007FF7583F1702  call        for_alloc_assign_v2 (07FF7583F45A4h)  
        tmp(l+1) = element
00007FF7583F1707  mov         rbx,qword ptr [rsp+168h]  
00007FF7583F170F  lea         rcx,[__scrt_ucrt_dll_is_in_use+220h (07FF7583F9760h)]  
00007FF7583F1716  imul        r12,rbx  
00007FF7583F171A  mov         qword ptr [r13+18h],rsi  
00007FF7583F171E  mov         rsi,qword ptr [rsp+1A0h]  
00007FF7583F1726  imul        rsi,rbx  
00007FF7583F172A  mov         rdx,qword ptr [rsp+160h]  
00007FF7583F1732  add         rdx,r12  
00007FF7583F1735  add         rdx,rbx  
00007FF7583F1738  mov         rax,qword ptr [rsp+1A8h]  
00007FF7583F1740  sub         rdx,rsi  
00007FF7583F1743  mov         rbp,qword ptr [rsp+1B0h]  
00007FF7583F174B  mov         qword ptr [__scrt_ucrt_dll_is_in_use+228h (07FF7583F9768h)],rbx  
00007FF7583F1752  mov         qword ptr [__scrt_ucrt_dll_is_in_use+220h (07FF7583F9760h)],r14  
00007FF7583F1759  mov         qword ptr [__scrt_ucrt_dll_is_in_use+250h (07FF7583F9790h)],rax  
00007FF7583F1760  mov         qword ptr [__scrt_ucrt_dll_is_in_use+258h (07FF7583F9798h)],rbp  
00007FF7583F1767  mov         qword ptr [__scrt_ucrt_dll_is_in_use+270h (07FF7583F97B0h)],r14  
00007FF7583F176E  mov         qword ptr [__scrt_ucrt_dll_is_in_use+280h (07FF7583F97C0h)],r14  
00007FF7583F1775  mov         qword ptr [__scrt_ucrt_dll_is_in_use+278h (07FF7583F97B8h)],r14  
00007FF7583F177C  mov         qword ptr [__scrt_ucrt_dll_is_in_use+268h (07FF7583F97A8h)],r14  
00007FF7583F1783  mov         qword ptr [__scrt_ucrt_dll_is_in_use+288h (07FF7583F97C8h)],r14  
00007FF7583F178A  mov         qword ptr [__scrt_ucrt_dll_is_in_use+290h (07FF7583F97D0h)],r14  
00007FF7583F1791  mov         qword ptr [__scrt_ucrt_dll_is_in_use+238h (07FF7583F9778h)],443h  
00007FF7583F179C  mov         qword ptr [__scrt_ucrt_dll_is_in_use+240h (07FF7583F9780h)],r14  
00007FF7583F17A3  mov         qword ptr [__scrt_ucrt_dll_is_in_use+230h (07FF7583F9770h)],r14  
00007FF7583F17AA  call        for_finalize (07FF7583F459Eh)  
00007FF7583F17AF  mov         r9,qword ptr [rsp+160h]  
00007FF7583F17B7  mov         rax,rsp  
00007FF7583F17BA  add         r9,r12  
00007FF7583F17BD  mov         rcx,r13  
00007FF7583F17C0  mov         r12,qword ptr [rsp+1A0h]  
00007FF7583F17C8  add         r9,rbx  
00007FF7583F17CB  imul        r12,rbx  
00007FF7583F17CF  sub         r9,r12  
00007FF7583F17D2  mov         rdx,rdi  
00007FF7583F17D5  lea         r8,[__scrt_ucrt_dll_is_in_use+220h (07FF7583F9760h)]  
00007FF7583F17DC  mov         qword ptr [rax+20h],0  
00007FF7583F17E4  call        for_alloc_assign_v2 (07FF7583F45A4h)  
        deallocate(this%c)
00007FF7583F17E9  mov         rcx,qword ptr [r15]  
00007FF7583F17EC  mov         rdx,qword ptr [rcx]  
00007FF7583F17EF  or          qword ptr [rcx+18h],400h  

 

0 Kudos
Highlighted
Beginner
73 Views

I forgot to mention, the version I'm using is

Intel® Parallel Studio XE 2019 Update 5 Composer Edition for Fortran Windows* Integration for Microsoft Visual Studio* 2019, Version 19.0.0052.16

 

0 Kudos
Highlighted
Black Belt Retired Employee
73 Views

Ok, I see it now. I am not sure what that code is doing - looks to be setting up some sort of data structure, perhaps for later use by a call to for_finalize, but I am not sure. Regardless, I agree it should not be using static storage for that. Please submit a report through the OSC.

0 Kudos
Highlighted
73 Views

Hmm, quite interesting bug.

The call to for_finalize is shown to be using static storage for context information, yet during testing at Intel (IMHO) this bug did not exhibit an issue due to the fact (presumption on my part) that the test procedures were more (or less) well constructed parallel programming examples. By this, I mean that your allocate subroutine itself is not thread-safe from the source code perspective... however, in your test program usage, the specific list objects are not used by multiple threads (created, used, and destroyed within each thread context). In testing at Intel, it was likely that similar typed objects which call for_finalize were located in the master thread context.

Good catch on presenting a simple test program that illustrates the bug. You will need to combine your report in #6 with #1. Otherwise the software support person may assume the issue is related to append not being coded in a thread-safe manner. Then you will have to explain that each instance of object list is used thread-independently.

Jim Dempsey

0 Kudos
Highlighted
73 Views

Tobias,

Out of curiosity, and perhaps as an acceptable work-around for others, what happens if you add to your list type a FINAL subroutine? It doesn't have to do much, in this case to delete C when C is allocated. IIF this omits the CALL to for_finalize, then the problem can be avoided with minimal code changes.

Jim Dempsey

0 Kudos
Highlighted
Beginner
73 Views

Jim,

thanks for your comments. I've tried your suggestion with the FINAL subroutine, but the program still crashes when run multi-threaded.

Tobias

0 Kudos
Highlighted
73 Views

FWIW, this would be the fix:

subroutine append_single(this,element)
  class(list) :: this
  class(*), intent(in) :: element
  class(*), allocatable :: tmp(:)
  integer :: l

  if (.not.allocated(this%c)) then
    allocate(this%c(1),source=element)
    this%c(1) = element
    this%num_nodes = 1
    return
  else
    l = size(this%c,1)
    allocate(tmp(l+1),source=element)
    tmp(1:l) = this%c(1:l)
    tmp(l+1) = element
    deallocate(this%c)
    call move_alloc(tmp,this%c)
    this%num_nodes = this%num_nodes + 1
  endif
end subroutine append_single

subroutine append(this,element)
  class(list) :: this
  class(*), intent(in) :: element
  !$omp critical(for_finalize_critical) ! same name to be used on all procedures with finalize
  call append_single(this,element)
  !$omp end critical(for_finalize_critical)
end subroutine append

Jim Dempsey

0 Kudos
Highlighted
Beginner
73 Views

Jim,

thanks for the code. This will fix the problem for OpenMP-multithreading, but unfortunately not for general multi-threading. The library, where this error occurs, is used from all kind of top-level languages, e.g. C, C++, Python or Excel-VBA, but AFAIK Fortran doesn't have any general (not OpenMP) multi-threading primitives.

For now, that code will be replaced by a specific list-implementation for each type, which won't need a critical-section, as this would be a huge performance-bottleneck. As a side-effect, that is also type-safer, since you cannot call the append of an integer-list implementation with a character(256). (In the "unlimited polymorphic" above code that would compile without error but result in a run-time crash.)

0 Kudos
Highlighted
73 Views

Out of curiosity, in looking at your sample code, and ignore the fact that the following might not be how you intend to use the code, consider:

        call append(lines,line)
        call append(lines,i)     ! same list object, different type


What do you expect append to do???

Jim Dempsey

0 Kudos
Highlighted
Valued Contributor III
73 Views

Readers will find it useful if OP can illustrate the steps taken and make an attempt to show what is meant by a crash - is it some exception from the run-time library such as "forrtl: severe (189): LHS and RHS of an assignment statement have incompatible types".

If it's the severe exception 189, it points either to faulty code or as indicated in Quote #2 some "unspecified behavior" that is forewarned in OpenMP documentation. 

Any discussion appears meaningless until OP can show the details.

C:\Temp>type test_mod.f90
module test_mod

   implicit none

   type list
      class(*), allocatable :: c(:)
      integer :: num_nodes
   contains
      procedure :: append
   end type

contains

   subroutine append(this,element)
      class(list) :: this
      class(*), intent(in) :: element
      class(*), allocatable :: tmp(:)
      integer :: l

      if (.not.allocated(this%c)) then
         allocate(this%c(1),source=element)
         this%c(1) = element
         this%num_nodes = 1
         return
      else
         l = size(this%c,1)
         allocate(tmp(l+1),source=element)
         tmp(1:l) = this%c(1:l)
         tmp(l+1) = element
         deallocate(this%c)
         call move_alloc(tmp,this%c)
         this%num_nodes = this%num_nodes + 1
      endif
   end subroutine

   subroutine test()
   !DIR$ ATTRIBUTES DLLEXPORT :: test

      type(list) :: lines
      type(list) :: ilines
      character(256):: line
      integer:: err,cnt,i,file_unit

      !Init:
      cnt = 1

      DO i=1,100
         line = 'hello'
         call append(lines,line)
         call append(ilines,i)
      end do

   end subroutine

end module

C:\Temp>ifort /dll /standard-semantics /warn:all /stand:f18 /recursive /reentrancy:threaded /libs:static /threads test_mod.f90
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.0.166 Build 20191121
Copyright (C) 1985-2019 Intel Corporation.  All rights reserved.

test_mod.f90(37): warning #7025: This directive is not standard F2018.
   !DIR$ ATTRIBUTES DLLEXPORT :: test
---------^
test_mod.f90(42): remark #7712: This variable has not been used.   [ERR]
      integer:: err,cnt,i,file_unit
----------------^
test_mod.f90(42): remark #7712: This variable has not been used.   [FILE_UNIT]
      integer:: err,cnt,i,file_unit
--------------------------^
Microsoft (R) Incremental Linker Version 14.24.28316.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:test_mod.dll
-dll
-implib:test_mod.lib
test_mod.obj
   Creating library test_mod.lib and object test_mod.exp

C:\Temp>type Console1.f90
program Console1

   use test_mod
   implicit none
   integer loop_t, nr_temps

   nr_temps = 20000

   !$omp parallel do
   do loop_t = 1,nr_temps
      call test()
   end do
   !$omp end parallel do

end program Console1


C:\Temp>ifort /c /Qopenmp /standard-semantics Console1.f90
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.0.166 Build 20191121
Copyright (C) 1985-2019 Intel Corporation.  All rights reserved.


C:\Temp>link Console1.obj test_mod.lib /subsystem:console /out:Console1.exe
Microsoft (R) Incremental Linker Version 14.24.28316.0
Copyright (C) Microsoft Corporation.  All rights reserved.


C:\Temp>Console1.exe
forrtl: severe (189): LHS and RHS of an assignment statement have incompatible types
forrtl: severe (189): LHS and RHS of an assignment statement have incompatible types
forrtl: severe (189): LHS and RHS of an assignment statement have incompatible types
forrtl: severe (189): LHS and RHS of an assignment statement have incompatible types

C:\Temp>

 

0 Kudos
Highlighted
73 Views

FF,

Do you think the error message is due to deallocate(this%c), and thus making the class(*) C(:) undefined. IOW when used as rhs it now has no class type, and thus differs from lhs.

In your test setup, what happens when you comment out the deallocate(this%c)? IOW letting move_alloc perform the deallocation (possibly after the compiler type checking code completes).

Jim Dempsey

0 Kudos
Highlighted
Beginner
73 Views

Please let me make this clear:

The original code runs absolutely fine and is bug-free as long as it runs single-threaded.

As pointed out in #6, the problem is that prior to the call of for_finalize the global (!) state of the module is changed, which yields code that is inherently not thread-safe.

This is unexpected, since the /threads (multithreaded-runtime) and /Qauto (implicitly by /Qopenmp) are specified, and none of the used language features in the append-subroutine is documented to be not thread-save. (All variables are visible to just one thread.)

IMHO the compiler produces invalid code, otherwise the option /threads would be meaningless (at least when unlimited polymorphism is used).

Tobias

0 Kudos
Highlighted
Valued Contributor III
73 Views

Tobias Loew wrote:

Please let me make this clear:

The original code runs absolutely fine and is bug-free as long as it runs single-threaded. ..

Can you not follow the go-by in Quote #15 to better illustrate the problem so other readers who may be similarly affected can consider the workarounds/fixes suggested by Black Belts such as Jim?

Aspects such as "the compiler produces invalid code" can help Intel Fortran team with whenever they implement a resolution but they hardly suggest to other readers what to do if they face issues with polymorphic objects and their assignments etc. in an OpenMP context.  

0 Kudos
Highlighted
Valued Contributor III
73 Views

jimdempseyatthecove (Blackbelt) wrote:

FF,

Do you think the error message is due to deallocate(this%c), and thus making the class(*) C(:) undefined. IOW when used as rhs it now has no class type, and thus differs from lhs. ..

Possibly.  To me, the code in subprogram 'append' makes little sense given the use of unlimited polymorphic objects.  So I'm not going to look too much further.  It'll be useful if OP can follow up better.

0 Kudos
Highlighted
Beginner
73 Views

Hello FortranFan,

I've uploaded the project in #6, so anyone interested can download, compile and run it. Furthermore, you have shown the errors in #15. The code is also not faulty: if you run it single-threaded it works, also if you comment out line 49 or line 50 from the code in #1 it also works multi-threaded (OpenMP or not doesn't matter).

I've analyzed the assembly code and convinced Steve Lionel (a black-belt, in case that matters) that this is definitely not an OpenMP related problem but a problem in the compiler: dynamically switching the information of the runtime-type at fixed addresses in static global memory is simply a bad decision.

Tobias

 

 

0 Kudos