- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
As mentioned in the Release Notes with compiler 17 (Intel Parallel Studio XE 2017),
A significant change in this release is that the default for intrinsic assignment to allocatable arrays is now to do the automatic (re)allocation if necessary, as specified by the standard. In past releases you needed to specify the assume realloc_lhs option to get this behavior. In some applications the additional checking may affect performance - you can revert to the previous behavior by specifying norealloc_lhs or the new option nostandard-realloc-lhs.
I had assumed the above will have no impact on code compiled with the /standard-semantics compiler option. However, I'm encountering a bunch of strange issues involving intrinsic assignment of allocatable arrays even when the assignment does not involve any reallocation i.e., the LHS and RHS arrays indeed have the same shape, only the values are different. This is happening in code that has worked reliably with no change since compiler 14. Unfortunately I have no liberty whatsoever to share the actual code. So I have spent considerable time since last week to put together a generic reproducer but without any success. Hence I was wondering if anyone else has been experiencing any issues.
The problems I face include the following, note the compiler options for the codes are /standard-semantics /libs:static /threads /warn:all /stand /On where n may be d or 1,2, or 3 depending on the tests.
1) Run-time failures with either "insufficient virtual memory" or "access violation" errors.
2) Same exact code with same exact compiler and optimization options will complete execution with results as expected with Win32 (IA32) platform option, but experience runtime errors with x64 (Intel 64) platform option.
3) Some of the codes execute successfully with /Od and /O1 optimization options, but face above-mentioned run-time exceptions with /O2 or higher options.
4) However some of the codes referred to in 3) above can break with an unnamed exception in the debugger; the debugger then breaks into free.c procedure in Microsoft's C run-time library with an "error reading register" on line 50 (retval = HeapFree(_crtheap, 0, pBlock); instruction); the call stack shows Fortran code with instruction on a line where an intrinsic assignment to an allocatable array is taking place, even though no reallocation is necessary in any of the cases. Note for_realloc_lhs is present on the call stack.
5) But now if one steps through the code in the debugger line by line, the above break is sometimes avoided.
6) If compiler version is reverted back to compiler 16 with NO change to the code, all the programs referred to in the above failures execute as expected.
Hence it appears to be some sort of regression with /standard-semantics option. As I mentioned above, I'm struggling to put together a reproducer and it's taking up too much time. Any feedback is appreciated. It'll be great if Intel can report what kind of regression testing was performed and perhaps post a case or two that was used in the testing. I can then try the same and see if I notice expected behavior.
Thanks,
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Please provide test cases. In the process of enabling this as a default we found and fixed many bugs that had not been exercised before, but it's possible we broke something. We ran our entire, VERY large test suite with this change and were in beta test with it for months.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Steve Lionel (Intel) wrote:
Please provide test cases. In the process of enabling this as a default we found and fixed many bugs that had not been exercised before, but it's possible we broke something. We ran our entire, VERY large test suite with this change and were in beta test with it for months.
Well, I'm raking my brains to create a reproducer.
The actual codes that experience these run-time failures do so with allocatable array components of REAL type in derived types which get defined (and redefined) with instructions involving intrinsic assignment in type-bound procedures, meaning typical OO code with methods working on passed objects and their data. One of my issues I have tons of very, very simple, generic OO type of code and I don't notice problems with these trivial cases. Then we have these extremely heavy-duty code libraries with deep OO hierachies which fail now but for which my hands are totally tied.
I just realized on one of the other build environments that the problems seem to have been introduced with update 1 of compiler 17 beta, meaning the codes all work with the initial compiler 17 beta as well as compiler 16, update 3. I wonder if this helps in anyway in figuring out my problems might be, say in connection with the changes that went into update 1 of compiler 17 beta.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It doesn't help - sorry. Only a reproducing test case helps.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Steve Lionel (Intel) wrote:
It doesn't help - sorry. Only a reproducing test case helps.
How about this?
module k_m use, intrinsic :: iso_fortran_env, only : WP => real64 implicit none private public :: WP end module k_m module b_m implicit none private type, abstract, public :: b_t private end type b_t end module b_m module d_m use k_m, only : WP use b_m, only : b_t implicit none private type, extends(b_t), public :: d_t private real(WP), allocatable :: m_d(:) real(WP), allocatable :: m_norm_d(:) contains private procedure, pass(this), public :: norm_d => get_norm_d procedure, pass(this), public :: set => set_d end type d_t contains pure subroutine set_d( this, d ) class(d_t), intent(inout) :: this real(WP), intent(in) :: d(:) this%m_d = d this%m_norm_d = this%m_d /sum( this%m_d ) return end subroutine set_d pure function get_norm_d(this) result( norm_d ) !.. Argument list class(d_t), intent(in) :: this !.. Function result real(WP), allocatable :: norm_d(:) norm_d = this%m_norm_d return end function get_norm_d end module d_m module a_m use k_m, only : WP use b_m, only : b_t use d_m, only : d_t implicit none private type, extends(b_t), abstract, public :: a_t private type(d_t), allocatable :: m_d contains private procedure, pass(this), public :: norm_d => get_norm_d procedure, pass(this), public :: set => set_a_t end type a_t contains pure subroutine set_a_t( this, d ) class(a_t), intent(inout) :: this real(WP), intent(in) :: d(:) if ( .not. allocated(this%m_d) ) then allocate( this%m_d ) end if call this%m_d%set( d ) return end subroutine set_a_t pure function get_norm_d(this) result( norm_d ) !.. Argument list class(a_t), intent(in) :: this !.. Function result real(WP), allocatable :: norm_d(:) norm_d = this%m_d%norm_d() return end function get_norm_d end module a_m module c_m use a_m, only : a_t implicit none private type, extends(a_t), public :: c_t private end type c_t end module c_m module f_m use k_m, only : WP use a_m, only : a_t use b_m, only : b_t use c_m, only : c_t type, extends(b_t), public :: f_t private class(a_t), allocatable :: m_a(:) contains private procedure, pass(this) :: load_f_t procedure, pass(this), public :: norm_d => get_norm_d generic, public :: load => load_f_t end type f_t contains pure subroutine load_f_t( this, d ) class(f_t), intent(inout) :: this real(WP), intent(in) :: d(:) if ( .not. allocated(this%m_a) ) then allocate( c_t :: this%m_a(1) ) end if call this%m_a(1)%set( d ) return end subroutine load_f_t pure function get_norm_d(this) result( norm_d ) !.. Argument list class(f_t), intent(in) :: this !.. Function result real(WP), allocatable :: norm_d(:) norm_d = this%m_a(1)%norm_d() return end function get_norm_d end module f_m module xf_m use k_m, only : WP use b_m, only : b_t use f_m, only : f_t type, extends(b_t), public :: xf_t private logical :: m_calc = .false. contains private procedure, pass(this) :: calc_xf_t generic, public :: calc => calc_xf_t end type xf_t contains pure subroutine calc_xf_t( this, f, d ) class(xf_t), intent(inout) :: this type(f_t), intent(inout) :: f real(WP), intent(in) :: d(:) call f%load( d ) this%m_calc = .true. return end subroutine calc_xf_t end module xf_m module ef_m use k_m, only : WP use b_m, only : b_t use f_m, only : f_t use xf_m, only : xf_t type, extends(b_t), public :: ef_t private logical :: m_calc = .false. type(xf_t), allocatable :: m_xf type(f_t), allocatable :: m_f contains private procedure, pass(this) :: calc_ef_t generic, public :: calc => calc_ef_t end type ef_t contains pure subroutine calc_ef_t( this, d ) class(ef_t), intent(inout) :: this real(WP), intent(in) :: d(:) if ( .not. allocated(this%m_f) ) then allocate( this%m_f ) end if call this%m_f%load( d ) if ( .not. allocated(this%m_xf) ) then allocate( this%m_xf ) end if call this%m_xf%calc( this%m_f, d=this%m_f%norm_d() ) this%m_calc = .true. return end subroutine calc_ef_t end module ef_m module ec_m use k_m, only : WP use b_m, only : b_t use ef_m, only : ef_t type, extends(b_t), public :: ec_t private logical :: m_calc = .false. type(ef_t), allocatable :: m_ef contains private procedure, pass(this) :: calc_ec_t generic, public :: calc => calc_ec_t end type ec_t contains pure subroutine calc_ec_t( this, d ) class(ec_t), intent(inout) :: this real(WP), intent(in) :: d(:) if ( .not. allocated(this%m_ef) ) then allocate( this%m_ef ) end if call this%m_ef%calc( d ) this%m_calc = .true. return end subroutine calc_ec_t end module ec_m
program p use k_m, only : WP use ec_m, only : ec_t implicit none type(ec_t) :: foo real(WP) :: d(2) integer :: I do I = 1, 2 call random_number( d ) call foo%calc( d ) end do stop end program p
C:\..>ifort /c /standard-semantics /O2 /warn:all /stand /libs:static /threads m.f90 Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R ) 64, Version 17.0.0.109 Build 20160721 Copyright (C) 1985-2016 Intel Corporation. All rights reserved. C:\..>ifort /c /standard-semantics /O2 /warn:all /stand /libs:static /threads p.f90 Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R ) 64, Version 17.0.0.109 Build 20160721 Copyright (C) 1985-2016 Intel Corporation. All rights reserved. C:\..>link /out:p.exe /subsystem:console p.obj m.obj Microsoft (R) Incremental Linker Version 12.00.40629.0 Copyright (C) Microsoft Corporation. All rights reserved. C:\..>p.exe
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks - I can reproduce the error, but it's complicated. I will comment that the actual implementation of realloc_lhs changed significantly in this release to be better for optimization (much smaller code size), but so far I don't see that itself being the culprit. Looks like a stack corruption issue somewhere. Still working on it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Steve Lionel (Intel) wrote:
Thanks - I can reproduce the error, but it's complicated. I will comment that the actual implementation of realloc_lhs changed significantly in this release to be better for optimization (much smaller code size), but so far I don't see that itself being the culprit. Looks like a stack corruption issue somewhere. Still working on it.
Thanks, Steve.
Re: your comments, "it's complicated" and "Looks like a stack corruption issue somewhere", are you thinking the problem(s) lie(s) with the code as shown above? And that the code will need to be fixed to work with compiler 17? Fyi, I tried the above reproducer with gfortran and I also tested one of the actual code libraries that runs into issues with compiler 17; just as with Intel Fortran compiler 16, gfortran didn't raise any issues. So I'll be curious to know if there is something wrong with the code.
Thanks much,
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have no evidence your source code is at fault, but this sort of error is exceedingly difficult to chase down, because any minor change causes the problem to move or hide. When I have more info I will let you know, and also if I can come up with a workaround. I did determine that it isn't optimization-related.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I identified a compiler bug that is causing the failure. Right now the only workaround I can offer is to write the code assuming automatic reallocation doesn't occur (and add /assume:norealloc_lhs or /nostd-realloc-lhs). In particular the assignment at line 51 is not being handled correctly.
I have escalated this as issue DPD200414403 and asked that it be given priority as I think this problem will affect many users.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Steve Lionel (Intel) wrote:
I identified a compiler bug that is causing the failure. Right now the only workaround I can offer is to write the code assuming automatic reallocation doesn't occur (and add /assume:norealloc_lhs or /nostd-realloc-lhs). In particular the assignment at line 51 is not being handled correctly.
I have escalated this as issue DPD200414403 and asked that it be given priority as I think this problem will affect many users.
Steve,
Thanks very, very much for your most prompt attention to this issue and your follow-up with Intel Fortran development team; appreciate it greatly.
Please note I didn't want to prejudice your investigation in any way, especially because you had commented previously on the possible complications and issues such as with stack corruption. But it appeared to me there is perhaps a problem in compiler 17 with plausibly an array temporary associated with the function result used as actual argument in line 252 and what might happen consequently on the stack during the instruction (I didn't do any assembler level checks though, so I might be wrong about this). Given this line of thinking, the workaround I came up with is shown below. It involves making use of a 'work area', a private component of the derived type, and employing it as the actual argument. As you would have realized, the reproducer in Message #5 is a severely emaciated body extracted from one of the actual codes; in this stripped down silly-looking example, the component m_d of derived type d_t represents 'raw data' whereas component norm_d represents some transformed/normalized data, the instruction on line 52 being just a nutty prototype of the actual transformations. Now, in the actual code libraries, the 'classes' i.e., the derived types represented by ef_t in the example above already include private 'scratch space' type of components with the same shape as the input 'data'. Hence the methods of subobjects of instances of ef_t (i.e., those equivalent to m_xf%calc on line 252 in the above example), which need to 'operate' on the data, NOW have these scratch vectors as actual arguments in our 'worked around' code as opposed to function invocations. The tests all seem to run well with such a workaround, so we'll forge ahead with it!
module ef_m use k_m, only : WP use b_m, only : b_t use f_m, only : f_t use xf_m, only : xf_t type, extends(b_t), public :: ef_t private logical :: m_calc = .false. type(xf_t), allocatable :: m_xf type(f_t), allocatable :: m_f real(WP), allocatable :: m_norm_d(:) !** add a 'work area' component here; it's already present in actual code contains private procedure, pass(this) :: calc_ef_t generic, public :: calc => calc_ef_t end type ef_t contains pure subroutine calc_ef_t( this, d ) class(ef_t), intent(inout) :: this real(WP), intent(in) :: d(:) if ( .not. allocated(this%m_f) ) then allocate( this%m_f ) end if call this%m_f%load( d ) this%m_norm_d = this%m_f%norm_d() !** load 'transformed' data into 'work area' if ( .not. allocated(this%m_xf) ) then allocate( this%m_xf ) end if call this%m_xf%calc( this%m_f, d=this%m_norm_d ) !** use 'work area' as actual argument to subobject this%m_calc = .true. return end subroutine calc_ef_t end module ef_m
A question for you is whether the above workaround provides you and/or the development team with any further hints on the underlying issue(s) with compiler 17 and the steps toward a proper resolution.
By the way, please note the actual codes are totally reliant on the standard feature of automatic reallocation, hence options such as /assume:norealloc_lhs are a no-go for us.
Thanks,
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
FortranFan,
Great job on the work around.
Why not place the work area array m_norm_d inside the calc_ef_t subroutine? I suppose it may reduce the number of allocations to have additional copies of the array (assuming they are not overly large and not in abundance).
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
jimdempseyatthecove wrote:
FortranFan,
Great job on the work around.
Why not place the work area array m_norm_d inside the calc_ef_t subroutine? I suppose it may reduce the number of allocations to have additional copies of the array (assuming they are not overly large and not in abundance).
Jim Dempsey
Jim,
Good point, a local array should also work and in some situations, it may be the better option. In the case of actual code libraries, as I mention above, the work areas of appropriate shape already existed and therefore the workaround was simply to make use of them. By the way, the arrays in most of the use cases of these codes are not large, though they do vary quite a bit from simulation to simulation anywhere from 2**8 to 2**16 elements.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Do you have evidence that there is a problem with the function value as argument? That would be entirely unrelated to the reallocation issue.
The problem I found is that on a second call to the routine with the assignment, the information passed to the "does this need reallocation?" check contains uninitialized fields, leading to unpredictable behavior. Sometimes it seems to work, sometimes it corrupts heap storage. (I didn't see evidence that it actually corrupts the stack.)
I do want to put in a plug for Intel Inspector XE's "memory inspection" which was a big help in locating the problem for me, but then I had to go digging into assembly to figure out exactly what went wrong.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Steve Lionel (Intel) wrote:
Do you have evidence that there is a problem with the function value as argument? That would be entirely unrelated to the reallocation issue.
The problem I found is that on a second call to the routine with the assignment, the information passed to the "does this need reallocation?" check contains uninitialized fields, leading to unpredictable behavior. Sometimes it seems to work, sometimes it corrupts heap storage. (I didn't see evidence that it actually corrupts the stack.)
I do want to put in a plug for Intel Inspector XE's "memory inspection" which was a big help in locating the problem for me, but then I had to go digging into assembly to figure out exactly what went wrong.
"Do you have evidence that there is a problem with the function value as argument?" Well, as you can see, it's anecdotal. Outside of the reproducer code in Message #5, I've not been able to create an issue with automatic reallocation; but this very same reproducer seems to work if the function in the actual argument in line 252 is replaced with an array, as shown with the workaround in Message #10.
Re: stack corruption, I was only going by what you said in Message #6.
Thanks,
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>> a local array should also work and in some situations
If you know for a fact that threaded and/or recursive use will not be used, the work area can be attributed as SAVE and you could explicitly use array slices for size require, enlarging the array only when necessary. This would reduce the number of allocate/deallocate (assuming that is a performance issue). I noticed in V17 that some of the OpenMP TASK is using the TBB scalable allocator. I haven't looked closer to see if the ALLOCATE and DEALLOCAT statements are now using the TBB scalable allocator. If so, then, depending on the sizes of the allocations the time "saved" on reducing the allocate/deallocate might not be so great.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I was just speculating regarding stack corruption until I found the real culprit.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The bug reported here will be fixed in Update 2.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page