w_fcompxe_2011.10.35
Windows 7 x64
VS 2010
(comments below jpg)
Note: the debugger shows the second subscript of T with a lower bound of 0.
This bound is correct (all bounds are correct).
The generated bounds-checking code is in error (see black screen at bottom).
I believe this, or a similar problem, ... The code below should be a simple reproducer.
[fortran]
module MOD_AVX
  ! SSE two-up double vector
  type TypeXMM
    SEQUENCE
    real(8) :: v(0:1)
  end type TypeXMM

  ! SSE two-up double vector triplet
  type TypeXMMxyz
    SEQUENCE
    real(8) :: vX(0:1)
    real(8) :: vY(0:1)
    real(8) :: vZ(0:1)
  end type TypeXMMxyz

  ! AVX four-up double vector
  type TypeYMM
    SEQUENCE
    real(8) :: v(0:3)
  end type TypeYMM

  ! AVX four-up double vector triplet
  type TypeYMMxyz
    SEQUENCE
    real(8) :: vX(0:3)
    real(8) :: vY(0:3)
    real(8) :: vZ(0:3)
  end type TypeYMMxyz
end module MOD_AVX
...
subroutine CopyToYMM_2D(f, t, s)
  use MOD_AVX
  real, pointer :: f(:,:)
  type(TypeYMM), target :: t(:,:)
  integer :: s
  integer :: i,j
  real, pointer :: slice(:,:)

  do j=LBOUND(f, DIM=2),UBOUND(f, DIM=2)
    do i=LBOUND(f, DIM=1),UBOUND(f, DIM=1)
      t(i,j).v(s) = f(i,j)
    end do
  end do
  slice(LBOUND(f, DIM=1):UBOUND(f, DIM=1), LBOUND(f, DIM=2):UBOUND(f, DIM=2)) => t(LBOUND(f, DIM=1), LBOUND(f, DIM=2))%v(s::4)
  deallocate(f)
  f => slice
end subroutine CopyToYMM_2D
[/fortran]
Call it with arrays (via pointers) allocated as follows:
real(8), pointer :: F(:,:)
type(TypeYMM), pointer :: T(:,:)
...
allocate(F(1:3,0:10+1))
allocate(T(1:3,0:10+1))
call CopyToYMM_2D(F, T, 0)
You will have to add an interface for CopyToYMM_2D
Jim Dempsey
In a Debug build, with bounds checking disabled for the above file, the deallocate(f) would (after some deallocations) report a corrupted heap.
Whether this problem is related to the bounds-check error, I cannot say.
Note, by NOT deallocating the memory, the application runs correctly.
An explanation of what is going on with the code may be in order.
The application is somewhat like a finite element simulation program. The simulation is of tethers and objects. Tethers can be viewed as a 2D object in 3D space and represented as a collection of segments connecting beads. Each segment has many properties and states as do the beads.
There are ~40 arrays, some rank-2 and some rank-1, with a mix of XYZ vectors and scalars.
The current code is multi-threaded (OpenMP) and distributes the workload on a tether-by-tether basis. Load distribution is relatively good.
At issue is that many of the loops will vectorize to some extent. However, in the case of the XYZ vectors (or, properly said, vector of XYZ vectors) only part of the calculation is vectorizable; the remainder is scalar and cross-lane with respect to SSE/AVX. My system has AVX.
To improve (maximize) vectorization on AVX, I am reallocating the ~40 arrays, mapped in a manner such that tethers are now 4-up (filling an AVX small vector)...
!!! yet, because Fortran has pointers with LowerBound:UpperBound:Stride I can remap the former allocations to the newer allocation format.
*** with all the old code remaining untouched ***
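To illustrate the idea of viewing one lane of the packed data through an ordinary rank-2 pointer, here is a minimal sketch. It is not the statement used in CopyToYMM_2D: it uses the standard-conforming form of pointing at a component of every array element (t(:,:)%v(s)), which yields a strided view. The names Packed4, lane, and strided_view are made up for this example, and the array sizes are arbitrary.

```fortran
module demo_types
  implicit none
  ! Four-wide packed type, modelled on TypeYMM from the first post (hypothetical name)
  type :: Packed4
    sequence
    real(8) :: v(0:3)
  end type Packed4
end module demo_types

program strided_view
  use demo_types
  implicit none
  type(Packed4), target :: t(1:3, 0:4)
  real(8), pointer :: lane(:,:)
  integer :: i, j, s

  ! Fill every lane with a recognisable value: 100*i + 10*j + s
  do j = lbound(t, 2), ubound(t, 2)
    do i = lbound(t, 1), ubound(t, 1)
      do s = 0, 3
        t(i,j)%v(s) = 100*i + 10*j + s
      end do
    end do
  end do

  ! View lane 2 of every element as an ordinary rank-2 real(8) array.
  ! The bounds specification keeps the original lower bounds (1 and 0);
  ! without it the pointer would get lower bounds of 1 in both dimensions.
  s = 2
  lane(lbound(t,1):, lbound(t,2):) => t(:,:)%v(s)

  print *, 'bounds dim 1:', lbound(lane,1), ubound(lane,1)
  print *, 'bounds dim 2:', lbound(lane,2), ubound(lane,2)
  print *, 'lane(3,4)   =', lane(3,4)      ! same storage as t(3,4)%v(2)

  lane(1,0) = -1.0d0                       ! writes through to t(1,0)%v(2)
  print *, 't(1,0)%v(2) =', t(1,0)%v(2)
end program strided_view
```

Old code that only sees lane(:,:) keeps indexing it as a plain real(8) array, while consecutive elements are actually four reals apart in memory.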
The solution has over 700 files (~700,000 lines of code).
Now then, the critical compute loops can have each thread working on 4 tethers at a time with a significantly higher degree of vectorization.
My conversion is not complete, so I do not have performance information. I hope to have the conversion done in a few weeks and then do a write-up.
Jim Dempsey
Further, LBOUND(t, DIM=2) returns the correct (external) lower bound of 0.
And the debugger is able to obtain the correct bounds.
Jim
If you are seeing an lbound of zero inside the procedure (after you remove code that's potentially trampling outside the array bounds - that could be obliterating part of the array descriptor for the dummy argument and confusing the issue) then that's a compiler bug.
If the debugger is not complaining about a lower bound of zero, then that's a debugger bug (potentially exacerbated by t appearing in multiple scopes). The debugger has led me up the garden path a few times previously in similar situations, to the extent that it is no longer on my Christmas card list.
>>Deferred shape requires the pointer or allocatable attribute
Thank you, this was my error.
>>If the debugger is not complaining about a lower bound of zero, then that's a debugger bug (potentially exacerbated by t appearing in multiple scopes).
Inserting:
write(*,*) LBOUND(f, DIM=2), LBOUND(t, DIM=2)
shows:
0 1
Clearly the compiler is generating, and the runtime is seeing, the 0-based bound converted to 1-based.
The pointer f(:,:) is receiving the pointer to the 0-based (DIM=2) array descriptor.
The dummy (target) t(:,:) is receiving a new, 1-based (DIM=2) array descriptor.
*** The debugger is showing the 0-based array descriptor for t(:,:) ???
By changing:
type(TypeYMM), target :: t(:,:)
to
type(TypeYMM), pointer :: t(:,:)
Now I get the deferred-shape (0-based DIM=2) bounds, and t is consistent with the debugger
*** and more importantly, I am not addressing outside of bounds
(writing to t(1,0) before apparently overwrote "something", and occasionally that something was an array descriptor that some time later showed up as a corrupted heap).
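For anyone following along, the bounds behaviour can be reproduced without the rest of the application. The sketch below is a stand-alone test with made-up names (bounds_demo, show_assumed, show_declared, show_pointer); it uses a plain real(8) array rather than TypeYMM, which I assume does not matter for the bounds rules.

```fortran
module bounds_demo
  implicit none
contains
  ! Assumed-shape dummy: lower bounds default to 1, whatever the actual argument has.
  subroutine show_assumed(a)
    real(8), intent(in) :: a(:,:)
    print *, 'assumed-shape a(:,:)    lbound(DIM=2):', lbound(a, dim=2)
  end subroutine show_assumed

  ! Assumed-shape dummy with the lower bounds declared explicitly.
  subroutine show_declared(a)
    real(8), intent(in) :: a(1:, 0:)
    print *, 'assumed-shape a(1:,0:)  lbound(DIM=2):', lbound(a, dim=2)
  end subroutine show_declared

  ! Pointer dummy: the bounds come with the pointer association.
  subroutine show_pointer(p)
    real(8), pointer :: p(:,:)
    print *, 'pointer p(:,:)          lbound(DIM=2):', lbound(p, dim=2)
  end subroutine show_pointer
end module bounds_demo

program test_bounds
  use bounds_demo
  implicit none
  real(8), pointer :: f(:,:)

  allocate(f(1:3, 0:10+1))   ! same shape as in the first post
  f = 0.0d0

  call show_assumed(f)       ! prints 1
  call show_declared(f)      ! prints 0
  call show_pointer(f)       ! prints 0

  deallocate(f)
end program test_bounds
```

Declaring the lower bounds on the assumed-shape dummy, as in show_declared, is an alternative to switching the dummy to a pointer when the callee only needs the caller's indexing.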
Now IanH, I have a question.
You will note at the end of the subroutine I have
slice(LBOUND(f, DIM=1):UBOUND(f, DIM=1), LBOUND(f, DIM=2):UBOUND(f, DIM=2)) => t(LBOUND(f, DIM=1), LBOUND(f, DIM=2))%v(s::4)
This is where the magic comes in: remapping the old array pointer, with stride 1, to the newly remapped format requiring a stride of 4 on DIM=2. Also note that the origin pointer indexes by slice number, v(s::4).
Now then to the question I have.
As long as the old code uses the new array descriptor (as pointed to by the converted pointer) the code works fine. Also, in the few places where f(:,n) is used I see in Debug build "Array temporary created". This is good, as it makes the code work and indicates locations where I have yet to convert the code. The question now becomes:
In a new subroutine, if I use something equivalent to
real(8) :: foo(:,:)
And the caller passes in something like the f(:,:)
IOW the base is converted from 0-base to 1-base, what happens to the stride?
Will this require an array temporary and conversion to stride-1?
I suppose I could mock-up a test. I am not concerned about new code I write, it is the old code that concerns me.
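One possible mock-up of that test is sketched below. It assumes a compiler that supports the Fortran 2008 IS_CONTIGUOUS intrinsic (the 2011 compiler in this thread may not). All names (stride_probe, probe_assumed, probe_explicit, store) are invented for the example, and the strided pointer is built from a plain real(8) array rather than from TypeYMM.

```fortran
module stride_probe
  implicit none
contains
  ! Assumed-shape dummy, as in the question: real(8) :: foo(:,:)
  subroutine probe_assumed(foo)
    real(8), intent(in) :: foo(:,:)
    print *, 'assumed-shape:  lbound =', lbound(foo), ' contiguous =', is_contiguous(foo)
  end subroutine probe_assumed

  ! Explicit-shape dummy for comparison: always contiguous inside the callee,
  ! so a strided actual argument has to be packed into a temporary.
  subroutine probe_explicit(bar, n1, n2)
    integer, intent(in) :: n1, n2
    real(8), intent(in) :: bar(n1, n2)
    print *, 'explicit-shape: lbound =', lbound(bar), ' contiguous =', is_contiguous(bar)
  end subroutine probe_explicit
end module stride_probe

program test_stride
  use stride_probe
  implicit none
  ! Lane index first, mimicking v(0:3) stored inside t(1:3,0:4)
  real(8), target  :: store(0:3, 1:3, 0:4)
  real(8), pointer :: f(:,:)

  store = 0.0d0
  f(1:, 0:) => store(2, :, :)   ! lane 2: consecutive elements are 4 reals apart

  print *, 'caller:         lbound =', lbound(f), ' contiguous =', is_contiguous(f)
  call probe_assumed(f)
  call probe_explicit(f, size(f, 1), size(f, 2))
end program test_stride
```

Inside probe_assumed the lower bounds are 1 and 1 by the language rules. What the contiguous flag reports there is the point of the test: .false. would indicate that the strided descriptor was passed through as-is, .true. that the compiler made a packed temporary. My expectation is that an assumed-shape dummy takes the stride without a copy, but that is something to confirm on the compiler in use (the "Array temporary created" runtime message in a Debug build is another indicator).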
BTW, thanks for standing up to me to show me the error of my ways. I am wrong from time to time.
Prior to this, I have hardly used the stride feature of Fortran. This feature is quite neat, and in particular for me for this conversion, it eliminates having to alter tens to hundreds of source files due to the re-arrangement of the data. I will have to add a few tens of same-functionality routines that use the new AVX (or SSE) packed data.
I think you might call this polymorphic data placement (or some better phraseology).
Jim Dempsey