Hello,
we are frustrated by a weird problem caused by the latest Intel compiler suite (ifort/icc v14.0.2.144, Linux x86_64/EM64T). Depending on the optimization level, the following code generates different results:
CSTART
      subroutine MakeWwdHlp2 (Ww,W1,dima,dimbe,dimga)
      implicit none
      integer dima,dimb,dimab,dimbe,dimga,key
      real*8 Ww(1:dima,1:dimbe,1:dimga)
      real*8 W1(1:dima,1:dimbe,1:dima,1:dimga)
      integer a,be,ga
      do ga=1,dimga
        do be=1,dimbe
          do a=1,dima
            Ww(a,be,ga)=W1(a,be,a,ga)
          end do
        end do
      end do
C     Uncomment to get correct results/loop's counters
c     print *,ga,be,a
      return
      end
CEND
The '-O0' and '-O1' optimization levels give correct results, while '-O2' does not. If the print line is uncommented, the results are always correct, regardless of the optimization level used. The 'dimga', 'dima', and 'dimbe' dimensions are in the range [10,12]. Enclosed please find the assembler listings generated with '-O0', '-O1', and '-O2'.
It would be great to identify the compiler option that causes the problem.
Thanks in advance!
Victor.
Sorry, there's nothing we can do for you here without sources we can use to build and run the program. The .s files are of no use in diagnostics.
Hi Steve,
our project is quite a big one. We will try to find a workaround, either by experimenting with compiler options or by reworking this subroutine. Could you please give us a hint on how to list the compiler options that are enabled by default for a given optimization level, i.e. something similar to:
gfortran -v -Q -O2 -c ...
options enabled: -falign-labels -fasynchronous-unwind-tables -fauto-inc-dec -fbranch-count-reg -fcaller-saves ....
Thank you in advance!
Victor.
You can try the ifort -list option which produces a listing file (.lst) containing a section COMPILER OPTIONS BEING USED. I do not know how fruitful that will be.
Does the entire program need to be compiled at -O1 or -O0 to produce the correct results, or just the single subroutine shown?
Could you perhaps isolate/create a reproducer by dumping the arrays to a file prior to calling the suspect routine, and then create a driver that reads in the data and calls only the suspect routine, to see whether that reproduces the incorrect results at -O2 and the correct results at -O1/-O0?
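The dump-and-reload step suggested above could be sketched roughly as follows. This is only an illustration: the file name `w1_dump.bin`, the unit number, the fixed dimensions, and the random stand-in data are all assumptions, and the sketch is written in free-form source rather than the fixed form used by the posted routine. In the real application, the write half would go just before the call to the suspect routine, and the read half into the stand-alone driver.

```fortran
! Hypothetical dump-and-reload sketch (free-form source).
! File name, unit number, and dimensions are assumptions for illustration.
program dump_reload_sketch
  implicit none
  integer, parameter :: dima = 10, dimbe = 11, dimga = 12
  integer, parameter :: u = 10
  real*8, allocatable :: W1(:,:,:,:), W1in(:,:,:,:)
  integer :: na, nbe, nga

  allocate(W1(dima,dimbe,dima,dimga))
  call random_number(W1)          ! stand-in for the application's real data

  ! Dump: write the dimensions first, then the whole array.
  open(unit=u, file='w1_dump.bin', form='unformatted', status='replace')
  write(u) dima, dimbe, dimga
  write(u) W1
  close(u)

  ! Reload: read the dimensions, allocate, then read the array back.
  open(unit=u, file='w1_dump.bin', form='unformatted', status='old')
  read(u) na, nbe, nga
  allocate(W1in(na,nbe,na,nga))
  read(u) W1in
  close(u)

  ! Unformatted I/O should reproduce the values bit for bit.
  print *, 'round trip identical: ', all(W1 == W1in)
end program dump_reload_sketch
```

Unformatted output is used here so that the reloaded values match the originals exactly; formatted output could round them and hide or alter the discrepancy.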
Hi Kevin,
thank you for your reply.
The whole project is compiled by using the '-O2' optimization level. However, in order to get correct results we have to recompile only this single subroutine with '-O0' and relink binaries. In this way we always get correct results. Finally, I have identified the compiler options that cause an optimization problem:
"-vec -simd"
If I keep the '-O2' level and disable vectorization of this subroutine via the '-no-vec -no-simd' compiler options, everything works properly.
> Could you perhaps isolate/create a reproducer by dumping the arrays
No problem, I will do it. It just takes some time.
With best regards,
Victor.
Ok, thank you for the additional clues Victor. I'm wondering if perhaps a driver that simply initializes the arrays with dummy values will show the bad results too. I can try that now.
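A driver with dummy values of the kind described here might look like the sketch below. It is not the test program actually used in this thread: the dimensions (chosen from the reported [10,12] range), the index-encoding fill pattern, and the mismatch counter are assumptions made for illustration, and the sketch uses free-form source. The posted routine is transcribed into the same file only so the example builds on its own; to compile it with different flags than the driver, it would have to live in its own source file.

```fortran
! Hypothetical driver sketch (free-form source, e.g. driver.f90).
! Fill pattern and dimensions are assumptions for illustration.
program drive_makewwdhlp2
  implicit none
  integer, parameter :: dima = 10, dimbe = 11, dimga = 12
  real*8 :: Ww(dima,dimbe,dimga)
  real*8 :: W1(dima,dimbe,dima,dimga)
  integer :: a, be, c, ga, nbad

  ! Fill W1 so that every element encodes its own four indices.
  do ga = 1, dimga
    do c = 1, dima
      do be = 1, dimbe
        do a = 1, dima
          W1(a,be,c,ga) = dble(a + 100*be + 10000*c + 1000000*ga)
        end do
      end do
    end do
  end do
  Ww = -1.0d0

  call MakeWwdHlp2(Ww, W1, dima, dimbe, dimga)

  ! Check against the closed form of W1(a,be,a,ga) instead of re-reading W1.
  nbad = 0
  do ga = 1, dimga
    do be = 1, dimbe
      do a = 1, dima
        if (Ww(a,be,ga) /= dble(a + 100*be + 10000*a + 1000000*ga)) then
          nbad = nbad + 1
        end if
      end do
    end do
  end do
  print *, 'mismatches: ', nbad
end program drive_makewwdhlp2

! Free-form transcription of the posted routine (unused declarations omitted).
subroutine MakeWwdHlp2(Ww, W1, dima, dimbe, dimga)
  implicit none
  integer dima, dimbe, dimga
  real*8 Ww(1:dima,1:dimbe,1:dimga)
  real*8 W1(1:dima,1:dimbe,1:dima,1:dimga)
  integer a, be, ga
  do ga = 1, dimga
    do be = 1, dimbe
      do a = 1, dima
        Ww(a,be,ga) = W1(a,be,a,ga)
      end do
    end do
  end do
end subroutine MakeWwdHlp2
```

A nonzero mismatch count would point to a miscompiled copy loop. A zero count does not prove the routine is compiled correctly inside the full application, where the surrounding code and calling context differ.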
I'm having no success producing incorrect results with a simple driver for the subroutine provided earlier when varying optimization levels, so I hope you will be able to isolate something that can help us reproduce this.
Hi Kevin,
Kevin Davis (Intel) wrote:
> I'm having no success producing incorrect results with a simple driver for the earlier provided subroutine when varying optimizations so I hope you will be able to isolate something that can help us reproduce this.
Interestingly, interchanging the loops along with providing a hint to the compiler solves the problem:
      subroutine MakeWwdHlp2 (Ww,W1,dima,dimbe,dimga)
      implicit none
      integer dima,dimb,dimab,dimbe,dimga,key
      real*8 Ww(1:dima,1:dimbe,1:dimga)
      real*8 W1(1:dima,1:dimbe,1:dima,1:dimga)
      integer a,be,ga
      do a=1,dima
        do ga=1,dimga
cDEC$ VECTOR UNALIGNED
          do be=1,dimbe
            Ww(a,be,ga)=W1(a,be,a,ga)
          end do
        end do
      end do
C     Uncomment to get correct results/loop's counters
C     print *,ga,be,a
      return
      end
Could the problem be that the working arrays are not aligned on a 16-byte boundary?
With best regards,
Victor.
Fortunately, the small size of the routine enabled our developer to spot a defect in loop collapsing. I opened a defect report (internal tracking id noted below) and will keep you updated on its status as I learn more. Of the workarounds you found, you can choose whichever best fits your application.
The developer also commented on the other items you noted:
"-no-vec -no-simd" helps because it shuts off the transformations that enable more or better vectorization.
"Reordering loops" alone doesn't help, since the compiler itself reorders loops for better memory locality. Adding the directive disables that reordering, which in turn affects the loop-collapsing decision (collapsing then does not kick in).
Thanks for reporting this issue.
(Internal tracking id: DPD200253575)
(Resolution Update on 09/11/2014): This defect is fixed in the Intel® Composer XE 2013 SP1 Update 4 release (2013.1.4.211 - Linux) -AND- the Intel® Parallel Studio XE 2015 Initial Release (2015.0.090 - Linux).
Hi Kevin,
thank you very much for your assistance and expertise!
I will keep an eye on it.
With best regards,
Victor.
Development indicates the fix for the earlier identified loop collapse defect is available in the latest Intel® Composer XE 2013 SP1 Update 4 release (2013.1.4.211 - Linux). It is also available in the newest Intel Parallel Studio XE 2015 release for Linux (Version 15.0.0.090 Build 20140723) should you be interested in upgrading to that new release.