Hi,
I would like to report a problem with the unrolling of a very simple loop:
      DO 200 I = 1, NDIM
         VEC(I) = A(I*(I+1)/2)
  200 CONTINUE
Something goes wrong when the 'A' and 'VEC' arrays/pointers refer to the same memory location and that address is not aligned on a 16-byte boundary. When this piece of code is compiled with '-O2', we always get wrong results for elements 2 through k of VEC, where k depends on the value of 'NDIM'. However, we always get correct results if unrolling is disabled completely, either by passing the '-unroll=0' option to the compiler or by using the DEC directive:
cDEC$ NOUNROLL
      DO 200 I = 1, NDIM
         VEC(I) = A(I*(I+1)/2)
  200 CONTINUE
To illustrate the problem, here is output from our real test:
#1
COPDIA NDIM: 67
ADRESSES :38897040 38897040
ADDRESSES ARE NOT ALIGNED ON 8-BYTE BOUNDARY
2.258586282289821E-004 2.258586282289821E-004 T
1.129034497633417E-003 1.790368778501193E-003 F
1.790368778501193E-003 2.945284711568669E-003 F
1.873467396476006E-003 2.331244172747492E-002 F
2.618478092336402E-003 4.152520189742853E-002 F
2.945284711568669E-003 6.745867362432861E-002 F
1.335999185554419E-002 0.114438101565396 F
1.345512252979789E-002 0.294939942339324 F
1.676828907801860E-002 0.644754785440931 F
2.331244172747492E-002 1.15038004433901 F
3.037574582317423E-002 1.84400534002079 F
3.208721233146836E-002 3.208721233146836E-002 T
....
#2
COPDIA NDIM: 27
ADRESSES :37716080 37716080
ADDRESSES ARE NOT ALIGNED ON 8-BYTE BOUNDARY
3.769405624512559E-002 3.769405624512559E-002 T
8.894699416983025E-002 9.866743377462182E-002 F
9.866743377462182E-002 0.292273738386099 F
0.231468604273682 0.384206422411758 F
0.247144738198662 0.566242246806149 F
0.292273738386099 1.42326482050238 F
0.303953189347391 0.303953189347391 T
....
Here we compare the 'VEC' arrays produced by the optimized (unrolled) and modified (non-unrolled) versions of the loop; the trailing T/F flag marks whether the two values agree. ADRESSES gives the memory locations of the 'A' and 'VEC' arrays.
Enclosed please find the code snippet used for our testing as well as the output from a real test. Since our project is quite a big one, we cannot provide the whole source code.
With best regards,
Victor.
As your source code violates the Fortran standard, you must set -assume dummy_aliases if you hope to have this work in any reasonable way. As your attached source code doesn't compile, and you didn't mention this point, I can't verify whether that makes the difference.
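For readers unfamiliar with the rule being referred to, here is a minimal sketch of the aliasing pattern, assuming the loop lives in a subroutine that receives 'A' and 'VEC' as separate dummy arguments. The thread does not show the real interface of COPDIA, so the argument list, the program name and the test data below are all hypothetical.

```fortran
! Sketch only: the real interface of COPDIA is not shown in the thread,
! so this argument list is an assumption made for illustration.
program alias_demo
  implicit none
  integer, parameter :: ndim = 4
  double precision :: a(ndim*(ndim+1)/2)
  integer :: i

  ! Fill A with recognizable values: A(k) = k
  do i = 1, size(a)
     a(i) = dble(i)
  end do

  ! Non-conforming call: the same array is associated with two dummy
  ! arguments and one of them is written to inside the callee.
  ! With Intel Fortran, compile e.g. as
  !   ifort -O2 -assume dummy_aliases alias_demo.f90
  ! so the optimizer does not assume the dummies are distinct.
  call copdia(a, a, ndim)

  ! Intended result (loop executed strictly in source order): 1 3 6 10
  print *, a(1:ndim)

contains

  subroutine copdia(a, vec, ndim)
    integer, intent(in) :: ndim
    double precision :: a(*), vec(*)
    integer :: i
    do i = 1, ndim
       vec(i) = a(i*(i+1)/2)
    end do
  end subroutine copdia

end program alias_demo
```

Because the call itself is non-conforming, a compiler is free to optimize the loop as if 'A' and 'VEC' never overlap, which is why the result may differ between compilers, versions and optimization levels.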
Hi Tim,
Thank you very much for your expertise and the hint! Indeed, passing the '-assume dummy_aliases' option to the compiler solves the problem. Allocating and using a temporary array also works. I will just mention that Intel v13, as well as other Fortran compilers, produces correct results regardless of argument aliasing.
With best regards,
Victor.
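The temporary-array workaround mentioned above could look roughly like the following. This is a sketch under assumptions, not the code from the project: the COPDIA interface, the name 'tmp' and the test data are hypothetical, and the thread does not say exactly where the temporary was introduced.

```fortran
! Sketch only: COPDIA's interface and the name TMP are assumptions.
program copy_diag_tmp
  implicit none
  integer, parameter :: ndim = 5
  double precision :: a(ndim*(ndim+1)/2)
  double precision, allocatable :: tmp(:)
  integer :: i

  ! Fill A with recognizable values: A(k) = k
  do i = 1, size(a)
     a(i) = dble(i)
  end do

  ! Pass a distinct temporary as VEC, so the dummy arguments never alias
  allocate(tmp(ndim))
  call copdia(a, tmp, ndim)

  ! Copy the result back over the start of A in a separate step
  a(1:ndim) = tmp
  deallocate(tmp)

  ! Self-check: element I must now hold the original A(I*(I+1)/2)
  do i = 1, ndim
     if (a(i) /= dble(i*(i+1)/2)) stop 1
  end do
  print *, 'copied in place via temporary:', a(1:ndim)

contains

  subroutine copdia(a, vec, ndim)
    integer, intent(in) :: ndim
    double precision, intent(in) :: a(*)
    double precision, intent(out) :: vec(*)
    integer :: i
    do i = 1, ndim
       vec(i) = a(i*(i+1)/2)
    end do
  end subroutine copdia

end program copy_diag_tmp
```

This version costs one extra array of length NDIM, but the call is standard-conforming, so it does not depend on '-assume dummy_aliases' or on how a particular compiler unrolls the loop.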