- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
the following code has a (very trivial) implementation of an auto-growing array (called "list"), where each instance of it can hold a different type. It works well, as long you don't use it multi-threaded:
If you compile the code with /Qopenmp it will simply crash!
The reason is (and I couldn't believe when I saw it in the assembler code) that inside of the append-subroutine at run-time the currently used type is switched and written to GLOBAL STATIC memory. This is simply insane, and I cannot find any reason, why this information is not written onto the stack or at least in thread-local storage when /threads is specified.
Tobias
module test_mod implicit none type list class(*), allocatable :: c(:) integer :: num_nodes contains procedure :: append end type contains subroutine append(this,element) class(list) :: this class(*), intent(in) :: element class(*), allocatable :: tmp(:) integer :: l if (.not.allocated(this%c)) then allocate(this%c(1),source=element) this%c(1) = element this%num_nodes = 1 return else l = size(this%c,1) allocate(tmp(l+1),source=element) tmp(1:l) = this%c(1:l) tmp(l+1) = element deallocate(this%c) call move_alloc(tmp,this%c) this%num_nodes = this%num_nodes + 1 endif end subroutine subroutine test() type(list) :: lines type(list) :: ilines character(256):: line integer:: err,cnt,i,file_unit !Init: cnt = 1 DO i=1,100 line = 'hello' call append(lines,line) call append(ilines,i) end do end subroutine end module program Console1 use test_mod implicit none integer loop_t, nr_temps nr_temps = 20000 !$omp parallel do do loop_t = 1,nr_temps call test() end do !$omp end parallel do end program Console1
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
From the OpenMP spec:
This OpenMP API specification refers to ISO/IEC 1539-1:2010 as Fortran 2008. While future versions of the OpenMP specification are expected to address the following features, currently their use may result in unspecified behavior.
Submodules
Coarrays
DO CONCURRENT
Allocatable components of recursive type
Pointer initialization
Value attribute is permitted for any nonallocatable nonpointer nonarray
Simply contiguous arrays rank remapping to rank>1 target
Polymorphic assignment
Accessing real and imaginary parts
Pointer function reference is a variable
Recursive I/O
The BLOCK construct
EXIT statement (to terminate a non-DO construct)
ERROR STOP
Internal procedure as an actual argument
Generic resolution by procedureness
Generic resolution by pointer vs. allocatable
Impure elemental procedures
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
That said, I think what you have found would likely be considered a bug and I recommend that you report this to Intel using the Online Service Center.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello Steve,
thanks for your reply, but this is not an OpenMP problem, I just used OpenMP to show the problem.
The problem is that the subroutine "append" is not thread-safe, even though /threads is specified. So, this code is invalid in any multi-threaded context (e.g. when it is inside a dll that is used from some multi-threaded C++ code). Further, I couldn't find any comment in the intel-fortran-docs about unlimited polymorphic variables may resulting in non thread-safe code.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Can you highlight for me the assembly that you think is at issue? I'm having difficulty identifying it.
In the absence of /Qopenmp, the routines are not RECURSIVE and therefore not thread-safe. (That will change with Fortran 2018, which makes routines recursive by default.) /threads is an obsolete option that selects between thread-safe and non-thread-safe run-time libraries; the latter no longer exists so /threads is a no-op. You probably want /recursive.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This is from the assembler x64/Release. The problematic lines are all the
mov qword ptr [__scrt_ucrt_dll_is_in_use+228h (07FF7583F9768h)]...
which write to the global rw-memory of the process, as you can see from address of the exe's image:
Console1.exe Console1.exe C:\Users\k8705963\source\repos\Console1\x64\Release\Console1.exe N/A N/A Symbols loaded. C:\Users\k8705963\source\repos\Console1\x64\Release\Console1.pdb 1 2/9/2020 6:07 PM 00007FF7583F0000-00007FF7583FF000 [49976] Console1.exe
the code results from line 28 in Console1.f90
tmp(l+1) = element
(I've attached the project). (In x86 builds the problem also exists, but assembler doesn't show it so nice.)
Further down in the assembly, there are again writes to global memory.
Tobias
00007FF7583F16FA mov qword ptr [r10+20h],0 00007FF7583F1702 call for_alloc_assign_v2 (07FF7583F45A4h) tmp(l+1) = element 00007FF7583F1707 mov rbx,qword ptr [rsp+168h] 00007FF7583F170F lea rcx,[__scrt_ucrt_dll_is_in_use+220h (07FF7583F9760h)] 00007FF7583F1716 imul r12,rbx 00007FF7583F171A mov qword ptr [r13+18h],rsi 00007FF7583F171E mov rsi,qword ptr [rsp+1A0h] 00007FF7583F1726 imul rsi,rbx 00007FF7583F172A mov rdx,qword ptr [rsp+160h] 00007FF7583F1732 add rdx,r12 00007FF7583F1735 add rdx,rbx 00007FF7583F1738 mov rax,qword ptr [rsp+1A8h] 00007FF7583F1740 sub rdx,rsi 00007FF7583F1743 mov rbp,qword ptr [rsp+1B0h] 00007FF7583F174B mov qword ptr [__scrt_ucrt_dll_is_in_use+228h (07FF7583F9768h)],rbx 00007FF7583F1752 mov qword ptr [__scrt_ucrt_dll_is_in_use+220h (07FF7583F9760h)],r14 00007FF7583F1759 mov qword ptr [__scrt_ucrt_dll_is_in_use+250h (07FF7583F9790h)],rax 00007FF7583F1760 mov qword ptr [__scrt_ucrt_dll_is_in_use+258h (07FF7583F9798h)],rbp 00007FF7583F1767 mov qword ptr [__scrt_ucrt_dll_is_in_use+270h (07FF7583F97B0h)],r14 00007FF7583F176E mov qword ptr [__scrt_ucrt_dll_is_in_use+280h (07FF7583F97C0h)],r14 00007FF7583F1775 mov qword ptr [__scrt_ucrt_dll_is_in_use+278h (07FF7583F97B8h)],r14 00007FF7583F177C mov qword ptr [__scrt_ucrt_dll_is_in_use+268h (07FF7583F97A8h)],r14 00007FF7583F1783 mov qword ptr [__scrt_ucrt_dll_is_in_use+288h (07FF7583F97C8h)],r14 00007FF7583F178A mov qword ptr [__scrt_ucrt_dll_is_in_use+290h (07FF7583F97D0h)],r14 00007FF7583F1791 mov qword ptr [__scrt_ucrt_dll_is_in_use+238h (07FF7583F9778h)],443h 00007FF7583F179C mov qword ptr [__scrt_ucrt_dll_is_in_use+240h (07FF7583F9780h)],r14 00007FF7583F17A3 mov qword ptr [__scrt_ucrt_dll_is_in_use+230h (07FF7583F9770h)],r14 00007FF7583F17AA call for_finalize (07FF7583F459Eh) 00007FF7583F17AF mov r9,qword ptr [rsp+160h] 00007FF7583F17B7 mov rax,rsp 00007FF7583F17BA add r9,r12 00007FF7583F17BD mov rcx,r13 00007FF7583F17C0 mov r12,qword ptr [rsp+1A0h] 00007FF7583F17C8 add r9,rbx 00007FF7583F17CB imul r12,rbx 00007FF7583F17CF sub r9,r12 00007FF7583F17D2 mov rdx,rdi 00007FF7583F17D5 lea r8,[__scrt_ucrt_dll_is_in_use+220h (07FF7583F9760h)] 00007FF7583F17DC mov qword ptr [rax+20h],0 00007FF7583F17E4 call for_alloc_assign_v2 (07FF7583F45A4h) deallocate(this%c) 00007FF7583F17E9 mov rcx,qword ptr [r15] 00007FF7583F17EC mov rdx,qword ptr [rcx] 00007FF7583F17EF or qword ptr [rcx+18h],400h
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I forgot to mention, the version I'm using is
Intel® Parallel Studio XE 2019 Update 5 Composer Edition for Fortran Windows* Integration for Microsoft Visual Studio* 2019, Version 19.0.0052.16
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ok, I see it now. I am not sure what that code is doing - looks to be setting up some sort of data structure, perhaps for later use by a call to for_finalize, but I am not sure. Regardless, I agree it should not be using static storage for that. Please submit a report through the OSC.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hmm, quite interesting bug.
The call to for_finalize is shown to be using static storage for context information, yet during testing at Intel (IMHO) this bug did not exhibit an issue due to the fact (presumption on my part) that the test procedures were more (or less) well constructed parallel programming examples. By this, I mean that your allocate subroutine itself is not thread-safe from the source code perspective... however, in your test program usage, the specific list objects are not used by multiple threads (created, used, and destroyed within each thread context). In testing at Intel, it was likely that similar typed objects which call for_finalize were located in the master thread context.
Good catch on presenting a simple test program that illustrates the bug. You will need to combine your report in #6 with #1. Otherwise the software support person may assume the issue is related to append not being coded in a thread-safe manner. Then you will have to explain that each instance of object list is used thread-independently.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Tobias,
Out of curiosity, and perhaps as an acceptable work-around for others, what happens if you add to your list type a FINAL subroutine? It doesn't have to do much, in this case to delete C when C is allocated. IIF this omits the CALL to for_finalize, then the problem can be avoided with minimal code changes.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Jim,
thanks for your comments. I've tried your suggestion with the FINAL subroutine, but the program still crashes when run multi-threaded.
Tobias
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
FWIW, this would be the fix:
subroutine append_single(this,element) class(list) :: this class(*), intent(in) :: element class(*), allocatable :: tmp(:) integer :: l if (.not.allocated(this%c)) then allocate(this%c(1),source=element) this%c(1) = element this%num_nodes = 1 return else l = size(this%c,1) allocate(tmp(l+1),source=element) tmp(1:l) = this%c(1:l) tmp(l+1) = element deallocate(this%c) call move_alloc(tmp,this%c) this%num_nodes = this%num_nodes + 1 endif end subroutine append_single subroutine append(this,element) class(list) :: this class(*), intent(in) :: element !$omp critical(for_finalize_critical) ! same name to be used on all procedures with finalize call append_single(this,element) !$omp end critical(for_finalize_critical) end subroutine append
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Jim,
thanks for the code. This will fix the problem for OpenMP-multithreading, but unfortunately not for general multi-threading. The library, where this error occurs, is used from all kind of top-level languages, e.g. C, C++, Python or Excel-VBA, but AFAIK Fortran doesn't have any general (not OpenMP) multi-threading primitives.
For now, that code will be replaced by a specific list-implementation for each type, which won't need a critical-section, as this would be a huge performance-bottleneck. As a side-effect, that is also type-safer, since you cannot call the append of an integer-list implementation with a character(256). (In the "unlimited polymorphic" above code that would compile without error but result in a run-time crash.)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Out of curiosity, in looking at your sample code, and ignore the fact that the following might not be how you intend to use the code, consider:
call append(lines,line) call append(lines,i) ! same list object, different type
What do you expect append to do???
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Readers will find it useful if OP can illustrate the steps taken and make an attempt to show what is meant by a crash - is it some exception from the run-time library such as "forrtl: severe (189): LHS and RHS of an assignment statement have incompatible types".
If it's the severe exception 189, it points either to faulty code or as indicated in Quote #2 some "unspecified behavior" that is forewarned in OpenMP documentation.
Any discussion appears meaningless until OP can show the details.
C:\Temp>type test_mod.f90 module test_mod implicit none type list class(*), allocatable :: c(:) integer :: num_nodes contains procedure :: append end type contains subroutine append(this,element) class(list) :: this class(*), intent(in) :: element class(*), allocatable :: tmp(:) integer :: l if (.not.allocated(this%c)) then allocate(this%c(1),source=element) this%c(1) = element this%num_nodes = 1 return else l = size(this%c,1) allocate(tmp(l+1),source=element) tmp(1:l) = this%c(1:l) tmp(l+1) = element deallocate(this%c) call move_alloc(tmp,this%c) this%num_nodes = this%num_nodes + 1 endif end subroutine subroutine test() !DIR$ ATTRIBUTES DLLEXPORT :: test type(list) :: lines type(list) :: ilines character(256):: line integer:: err,cnt,i,file_unit !Init: cnt = 1 DO i=1,100 line = 'hello' call append(lines,line) call append(ilines,i) end do end subroutine end module C:\Temp>ifort /dll /standard-semantics /warn:all /stand:f18 /recursive /reentrancy:threaded /libs:static /threads test_mod.f90 Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.0.166 Build 20191121 Copyright (C) 1985-2019 Intel Corporation. All rights reserved. test_mod.f90(37): warning #7025: This directive is not standard F2018. !DIR$ ATTRIBUTES DLLEXPORT :: test ---------^ test_mod.f90(42): remark #7712: This variable has not been used. [ERR] integer:: err,cnt,i,file_unit ----------------^ test_mod.f90(42): remark #7712: This variable has not been used. [FILE_UNIT] integer:: err,cnt,i,file_unit --------------------------^ Microsoft (R) Incremental Linker Version 14.24.28316.0 Copyright (C) Microsoft Corporation. All rights reserved. -out:test_mod.dll -dll -implib:test_mod.lib test_mod.obj Creating library test_mod.lib and object test_mod.exp C:\Temp>type Console1.f90 program Console1 use test_mod implicit none integer loop_t, nr_temps nr_temps = 20000 !$omp parallel do do loop_t = 1,nr_temps call test() end do !$omp end parallel do end program Console1 C:\Temp>ifort /c /Qopenmp /standard-semantics Console1.f90 Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.0.166 Build 20191121 Copyright (C) 1985-2019 Intel Corporation. All rights reserved. C:\Temp>link Console1.obj test_mod.lib /subsystem:console /out:Console1.exe Microsoft (R) Incremental Linker Version 14.24.28316.0 Copyright (C) Microsoft Corporation. All rights reserved. C:\Temp>Console1.exe forrtl: severe (189): LHS and RHS of an assignment statement have incompatible types forrtl: severe (189): LHS and RHS of an assignment statement have incompatible types forrtl: severe (189): LHS and RHS of an assignment statement have incompatible types forrtl: severe (189): LHS and RHS of an assignment statement have incompatible types C:\Temp>
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
FF,
Do you think the error message is due to deallocate(this%c), and thus making the class(*) C(:) undefined. IOW when used as rhs it now has no class type, and thus differs from lhs.
In your test setup, what happens when you comment out the deallocate(this%c)? IOW letting move_alloc perform the deallocation (possibly after the compiler type checking code completes).
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Please let me make this clear:
The original code runs absolutely fine and is bug-free as long as it runs single-threaded.
As pointed out in #6, the problem is that prior to the call of for_finalize the global (!) state of the module is changed, which yields code that is inherently not thread-safe.
This is unexpected, since the /threads (multithreaded-runtime) and /Qauto (implicitly by /Qopenmp) are specified, and none of the used language features in the append-subroutine is documented to be not thread-save. (All variables are visible to just one thread.)
IMHO the compiler produces invalid code, otherwise the option /threads would be meaningless (at least when unlimited polymorphism is used).
Tobias
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Tobias Loew wrote:Please let me make this clear:
The original code runs absolutely fine and is bug-free as long as it runs single-threaded. ..
Can you not follow the go-by in Quote #15 to better illustrate the problem so other readers who may be similarly affected can consider the workarounds/fixes suggested by Black Belts such as Jim?
Aspects such as "the compiler produces invalid code" can help Intel Fortran team with whenever they implement a resolution but they hardly suggest to other readers what to do if they face issues with polymorphic objects and their assignments etc. in an OpenMP context.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
jimdempseyatthecove (Blackbelt) wrote:FF,
Do you think the error message is due to deallocate(this%c), and thus making the class(*) C(:) undefined. IOW when used as rhs it now has no class type, and thus differs from lhs. ..
Possibly. To me, the code in subprogram 'append' makes little sense given the use of unlimited polymorphic objects. So I'm not going to look too much further. It'll be useful if OP can follow up better.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello FortranFan,
I've uploaded the project in #6, so anyone interested can download, compile and run it. Furthermore, you have shown the errors in #15. The code is also not faulty: if you run it single-threaded it works, also if you comment out line 49 or line 50 from the code in #1 it also works multi-threaded (OpenMP or not doesn't matter).
I've analyzed the assembly code and convinced Steve Lionel (a black-belt, in case that matters) that this is definitely not an OpenMP related problem but a problem in the compiler: dynamically switching the information of the runtime-type at fixed addresses in static global memory is simply a bad decision.
Tobias
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Once you showed the disassembly code in #6 it was clear that this was a compiler error. Everything since then by other posters were constructive posts as to how to work around the problem. These instructive postings were not only directed to you, but to any of the other readers of this forum that may have come across this issue. Waiting on software support to fix the problem in the compiler is not a choice that many can afford to take.
Also, neither /auto nor /Qopenmp makes your code definitively thread-safe. Your append code, as an example, even with a compiler fix to not use static storage as the bug does now, would not be thread-safe with concurrent calls on the same list object. Nearly all of the execution statements would require being in a critical section.
Jim Dempsey
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page