We recently upgraded our 64-bit Fortran compilers on Windows, OSX and Linux to 15.0.1.148 Build 20141023. One of our nightly verification cases failed when using the Windows release compilation (/O2 /Qopenmp). The failure was substantial: a totally spurious result. All other builds worked (Linux, OSX, Windows debug).
This is a very large computational fluids code broken up into about 30 source files. The code is fully compliant with Fortran 2003 and we run the thread checker on hundreds of test cases each day. The thread checker did not detect a problem, and we have run these test cases with previous compiler versions from roughly 9 through 13.
I tracked down the problem to one source file, which contained a single OpenMP construct. When I removed the OpenMP directives (comments), but still compiled with /Qopenmp, the case still failed. When I compiled this one source file without /Qopenmp, the case worked. I even got the case to work with OpenMP when I added a write statement to one of the loops, a loop that was not parallelized with OpenMP. This leads me to suspect that there is a problem with the /O2 optimization (/O1 works fine) combined with /Qopenmp.
It would be difficult to submit the entire code and test case, but can you think of something that might have changed in version 15 with respect to OpenMP and optimization?
Your question is too open-ended to have a useful answer. Please construct a test case and submit it to us, either here in the forum or through Intel Premier Support.
You might look for an inadvertent dependence on SAVE behavior, particularly for local arrays. Problems in this area aren't guaranteed to surface but could easily do so with a minor change in compiler optimizations.
A missing (or unintended) SAVE could be exposed by a function called in a parallel region, regardless of whether the function itself has any OpenMP directives. Needless to say, if every thread is expected to share a result calculated at some prior execution, but the variable is not properly declared, it's a problem waiting to surface.
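To make the SAVE point concrete, here is a minimal sketch, not taken from the poster's code. The module, routine, and variable names (`running_sum_mod`, `accumulate`, `history`) are hypothetical. It shows a routine whose local array must persist between calls and declares that explicitly, so the behavior does not depend on whether the compiler places local arrays in static storage or on the stack.

```fortran
module running_sum_mod
  implicit none
contains
  subroutine accumulate(x, total_out)
    real, intent(in)  :: x
    real, intent(out) :: total_out
    ! Explicit SAVE: these locals must keep their values between calls.
    ! Without SAVE, persistence would depend on compiler options
    ! (static versus stack allocation of locals).
    real, save    :: history(4) = 0.0
    integer, save :: n = 0

    n = n + 1
    history(mod(n - 1, 4) + 1) = x   ! overwrite the oldest slot
    total_out = sum(history)
  end subroutine accumulate
end module running_sum_mod

program demo_save
  use running_sum_mod
  implicit none
  real    :: t
  integer :: i

  do i = 1, 3
    call accumulate(real(i), t)
  end do
  print *, 'running total after 3 calls:', t
  ! 1 + 2 + 3 with one unused slot still at 0
  if (abs(t - 6.0) > 1.0e-6) error stop 'unexpected total'
end program demo_save
```

Note that a SAVEd variable is shared by all threads, so a routine like this would not be safe to call from inside a parallel region without further protection.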
I do not have any explicit SAVE statements or attributes in the source file that appears to be the problem. In general, I don't expect locally defined variables to be saved when leaving the subroutine and I don't use SAVE or the compiler equivalent.
I am using the options /O2 /traceback /Qopenmp
Do you know, in addition to the SAVE behavior, how /Qopenmp might affect the default compiler options that I am not explicitly setting? Also, do you know off-hand how I might get verbose information on the optimization of the loop that changes behavior when I add the WRITE statement? Maybe the structure of the loop is unnecessarily complicated. In the past, I have fixed problems like these by just writing cleaner code. I cannot construct a simple test case without bundling the entire code and makefile into some portable form.
>> When I removed the OpenMP comments, but still compiled with /Qopenmp, the case still failed. When I compiled this one source file without the /Qopenmp, the case worked.
When you compile with /Qopenmp, local arrays are stack based.
When you compile without /Qopenmp, then by default local arrays are implicitly SAVE (RECURSIVE and the options that assert automatic storage override this).
Look in your code for declarations such as
    real :: SomeLocalArray(1234)
or arrays of other types and ranks.
My guess is that some piece of the code relies on one or more of the locally defined arrays being SAVE.
(Usually it is the other way around: the arrays must not be SAVE for the code to be thread safe.)
Jim Dempsey
Within the routine that I believe is the source of the problem, I have a loop that looks like this
DO K=...
   DO J=...
      DO I=...
         IF ( ) THEN
            SELECT CASE( )
               CASE(1)
                  A = ...
                  B = ...
It is somewhat inelegant code, but not something that really needs to be optimized. When I put a WRITE statement after B=, the problem goes away. That is, even with /O2 and /Qopenmp, the code works properly. I did an optimization report, and without the WRITE statement, I get a summary of how the loops and the SELECT statements are not being optimized. With the WRITE statement, the SELECT construct is no longer mentioned in the optimization report. I infer from this that the WRITE statement takes that out of consideration.
ifort does appear to stumble over its efforts to optimize select case in loops, particularly in comparison with the relatively simple methods of gfortran. I haven't seen a better suggestion than yours about stopping this.
I experienced optimization problems like that too. What I end up doing is something like:
DO K=...
   DO J=...
      DO I=...
         IF ( ) THEN
            SELECT CASE( )
               CASE(1)
                  A = ...
                  IF(ISNAN(A)) PRINT *, "Foo.f90 dummy statement to fix compiler bug"
                  B = ...
Jim Dempsey
Yes, it seems that any write statement will do it. In fact, I just rearranged my loop structure:
IF ( ) THEN
   SELECT CASE( )
      CASE(1)
         DO K=...
            DO J=...
               DO I=...
                  A=...
and the problem goes away. It's a bit more code, but I'd rather be safe than sorry. I cannot recreate this bug with a simpler case, and it would be a lot of work to submit the entire code, makefile, and test cases to the support desk, but I can say that this was not a problem in Fortran compiler versions prior to 15. And it is definitely tied to /Qopenmp. When I compile this routine without it, things work no matter what. I am not even using OpenMP constructs in the troublesome subroutine, so it must be related to the change in default behavior when /Qopenmp is invoked.
Thanks for pointing out the problem with the loop structure.
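The rearrangement described above can be sketched as follows. This is only an illustration under a stated assumption: the IF condition and the CASE selector do not depend on I, J, or K (the original post elides both, so that may not hold for the real code). The names `active`, `mode`, `a1`, and `a2` are hypothetical.

```fortran
program demo_hoist_select
  implicit none
  integer, parameter :: n = 4
  real    :: a1(n,n,n), a2(n,n,n)
  integer :: i, j, k, mode
  logical :: active

  mode   = 1         ! hypothetical loop-invariant selector
  active = .true.    ! hypothetical loop-invariant condition
  a1 = 0.0
  a2 = 0.0

  ! Original shape: IF and SELECT CASE are evaluated at every grid point.
  do k = 1, n
    do j = 1, n
      do i = 1, n
        if (active) then
          select case (mode)
          case (1)
            a1(i,j,k) = real(i + j + k)
          case default
            a1(i,j,k) = -1.0
          end select
        end if
      end do
    end do
  end do

  ! Rearranged shape: branch once, then run a plain loop nest per case.
  if (active) then
    select case (mode)
    case (1)
      do k = 1, n
        do j = 1, n
          do i = 1, n
            a2(i,j,k) = real(i + j + k)
          end do
        end do
      end do
    case default
      a2 = -1.0
    end select
  end if

  if (any(a1 /= a2)) error stop 'loop shapes disagree'
  print *, 'both loop shapes give the same result'
end program demo_hoist_select
```

The second form duplicates the loop nest per case, but each nest has a simple body, which is what made the problem disappear in this thread.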
I ran into the same problem.
Changing the CPU level from SSE3 to SSE4.1 solved it for me. (/arch:SSE4.1)
I also tried AVX but then the problem reappeared.
The problem seems to be in the vectorization optimization.
Try setting /Qopt-report:5 and look in the reports for the vectorization results.
I have not yet had time to make a test case, but I will try to make one.
Could anyone provide a chunk of the top level code that shows this behavior? Just the loop nest and maybe declarations for a few of the important variables. That should be a lot easier than creating an executable reproducer. Along with the command line and the description above, that may be enough to work with.
We tried to make a test case, but it just works, so the failure apparently only happens when the loop is deep in the code.
The failing loop is a simple loop that copies data.
This is the output from the optimization report:
LOOP BEGIN at c:\local\V151\sources\plib_web\crweb.f(910,7)
remark #25084: Preprocess Loopnests: Moving Out Store [ c:\local\V151\sources\plib_web\crweb.f(913,7) ]
remark #15389: vectorization support: reference 1 has unaligned access [ c:\local\V151\sources\plib_web\crweb.f(911,8) ]
remark #15389: vectorization support: reference 3 has unaligned access [ c:\local\V151\sources\plib_web\crweb.f(911,8) ]
remark #15389: vectorization support: reference 1 has unaligned access [ c:\local\V151\sources\plib_web\crweb.f(912,8) ]
remark #15389: vectorization support: reference 2 has unaligned access [ c:\local\V151\sources\plib_web\crweb.f(912,8) ]
remark #15381: vectorization support: unaligned access used inside loop body
remark #15427: loop was completely unrolled
remark #15300: LOOP WAS VECTORIZED
remark #15450: unmasked unaligned unit stride loads: 2
remark #15451: unmasked unaligned unit stride stores: 2
remark #15475: --- begin vector loop cost summary ---
remark #15476: scalar loop cost: 9
remark #15477: vector loop cost: 4.000
remark #15478: estimated potential speedup: 1.440
remark #15479: lightweight vector operations: 9
remark #15480: medium-overhead vector operations: 1
remark #15488: --- end vector loop cost summary ---
LOOP END
It looks normal.
When the program runs, it crashes on the 2nd pass, and it looks as though the data is invalid.
It crashes with an access violation.
When we recompile with /O1, or put the directive !DIR$ OPTIMIZE:1 in the source file with the problem, it works as expected.
So something goes wrong with the vectorization optimization when the loop is deep in the code.
We are now using /O1 in our production build until this problem is solved.
I cannot give all the source to Intel to build a test case; the only option is to let someone remotely look at a production PC to test.
Thanks for the illustrative code. I haven't yet been able to reproduce a problem with that structure.
Please could you try disabling inlining and interprocedural optimizations with /O2 /Qopenmp /Qip- /Ob0 and see if that makes a difference? (disable inline function expansion and interprocedural optimization in the IDE).
Is this function, Compute_Velocity_Error, being called from within an OpenMP parallel region? The only default that is changed explicitly by /Qopenmp is that /autoscalar is converted to /auto, as Jim has already explained. If there was a thread safety problem you'd expect the version without /Qopenmp to fail, not the other way round. Apart from this, it's surprising that /Qopenmp has much effect unless the routine is inlined into a routine that has OpenMP constructs.
Have you tried /check? It's conceivable that an uninitialized variable could cause different results with different options or compilers.
Your loop has too many "CASE"s to be vectorizable:
"remark #15534: loop was not vectorized: loop contains arithmetic if or computed goto. Consider using if-then-else statement. [ test_select.f90(116,28) ]"
Question to Michel: are you saying that a loop with SELECT/CASE is being vectorized? If so, how many CASEs are there?
When I removed the ipo option, the problem persisted. When I used /O1 instead of /O2, the problem was resolved. When I removed the /Qopenmp, the problem was resolved. I concluded that both /O2 and /Qopenmp were needed to make the problem occur. The function is called from a non-OpenMP region of the code. There are actually no OpenMP constructs within the subroutine, and only one in that particular source file. Removing that one OpenMP construct did not solve the problem. Removing the /Qopenmp flag from that file's compilation did solve the problem. My assumption, based on the discussion in this thread, is that the /Qopenmp option has other subtle effects on compilation defaults. In debug mode, we turn on all options like /check to look for uninitialized variables. None were found -- as I mentioned, this test case that failed has been in our daily verification suite for years and has worked with past versions of the compiler.
I did notice that the loop structure was too complicated for vectorization. I was surprised that any kind of optimization was being attempted for this somewhat bad bit of programming. This subroutine is not really a CPU hog, so I have not tried to make it faster.
Just a comment.
I found this kind of "error" a few times on our code.
Same behaviour: /O2 + /Qopenmp (sometimes inlining as well) caused failures, usually in parts of the code that had no OpenMP-related code and were not called from parallelized regions.
Usually a write in the loop, or even a small change in statement order, is enough to "solve" the problem.
But in fact, in two of the cases where I found this behaviour, the real problem was uninitialized variables in other parts of the code, which happened to modify chunks of memory that were allocated to other program variables/arrays.
A complete run time check can help find this kind of problem (but not always).
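As a complement to the compiler's run-time checks, one portable way to hunt for values that were never set is to pre-fill work arrays with NaN and test for leftovers afterwards. The sketch below is an illustration only and is not from anyone's code in this thread; the array `work` and the deliberately incomplete fill are hypothetical. It assumes the processor supports IEEE NaN for default reals.

```fortran
program demo_nan_sentinel
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, &
                                           ieee_is_nan
  implicit none
  real    :: work(6)
  integer :: i, n_unset

  ! Pre-fill with quiet NaN so that unset elements are recognizable.
  work = ieee_value(0.0, ieee_quiet_nan)

  ! Hypothetical bug: only the first 4 of 6 elements are assigned.
  do i = 1, 4
    work(i) = real(i)
  end do

  ! Any NaN left over marks an element that was never written.
  n_unset = count(ieee_is_nan(work))
  print *, 'elements never assigned:', n_unset
  if (n_unset /= 2) error stop 'unexpected number of unset elements'
end program demo_nan_sentinel
```

Aggressive floating-point options that assume NaN never occurs can defeat this kind of check, so it is best used in a build with strict floating-point semantics.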
Thanks for the extr
I see the same loop optimization messages, compiling with or without -qopenmp. Line numbers are different, of course, and I'm compiling a single, standalone subroutine with no inlining.
Non-optimizable loops:
LOOP BEGIN at test_select.f90(25,1) ( this is CHECK_WALL_LOOP)
remark #15534: loop was not vectorized: loop contains arithmetic if or computed goto. Consider using if-then-else statement. [ test_select.f90(62,15) ]
LOOP BEGIN at test_select.f90(145,4)
remark #15536: loop was not vectorized: inner loop throttling prevents vectorization of this outer loop. Refer to inner loop message for more details.
LOOP BEGIN at test_select.f90(144,7)
remark #15536: loop was not vectorized: inner loop throttling prevents vectorization of this outer loop. Refer to inner loop message for more details.
LOOP BEGIN at test_select.f90(143,10)
remark #15534: loop was not vectorized: loop contains arithmetic if or computed goto. Consider using if-then-else statement. [ test_select.f90(116,28) ]
LOOP END
LOOP END
LOOP END
LOOP END
Is that what you see? Adding a print statement inside a CASE does not change them. A print statement is a function call, so it could possibly have an impact on inlining, as well as on general loop optimization. /Qipo enables inlining across source files; you need the other options to disable inlining within a source file. There's no particular reason to think this is related to inlining, but it's hard to think what else might be responsible.
Dear Martyn,
This is the source file where things go wrong.
The loop at line 29 goes wrong.
Either adding a write statement, changing /O2 to /O1, adding !DIR$ OPTIMIZE:1 to this source file or changing /arch:SSE3 to /arch:SSE4.1 makes the problem disappear.
There might be a memory leak problem but all checking is enabled in the debug build environment.
      subroutine prms2d3l(fnc,prms,bp,ep,num,tmat,vw,d3l,ap)
      implicit none
      include '../com/panel3d.i'
      integer*2 fnc(*),num,ap,i,j,acp
      real*4 prms(3,*),bp(2,*),ep(2,*),vw,d3l(3,*),cp(2,360)
      real*8 tmat(3,3),bp8(2),ep8(2),mp8(2),r8
      cirseg=5.0
      ap=0
      do i=1,num
        ap=ap+1
        d3l(1,ap)=bp(1,i)*tmat(1,1)+vw*tmat(2,1)+bp(2,i)*tmat(3,1)
        d3l(2,ap)=bp(1,i)*tmat(1,2)+vw*tmat(2,2)+bp(2,i)*tmat(3,2)
        d3l(3,ap)=bp(1,i)*tmat(1,3)+vw*tmat(2,3)+bp(2,i)*tmat(3,3)
        if (fnc(i).eq.1.and.abs(prms(3,i)).gt.1e-6) then
          bp8(1)=bp(1,i)
          bp8(2)=bp(2,i)
          ep8(1)=ep(1,i)
          ep8(2)=ep(2,i)
          mp8(1)=prms(1,i)
          mp8(2)=prms(2,i)
          r8=prms(3,i)
          call prms2cp4(bp8,ep8,mp8,r8,acp,cp)
          do j=1,acp
            ap=ap+1
            d3l(1,ap)=cp(1,j)*tmat(1,1)+vw*tmat(2,1)+cp(2,j)*tmat(3,1)
            d3l(2,ap)=cp(1,j)*tmat(1,2)+vw*tmat(2,2)+cp(2,j)*tmat(3,2)
            d3l(3,ap)=cp(1,j)*tmat(1,3)+vw*tmat(2,3)+cp(2,j)*tmat(3,3)
          enddo
        endif
      enddo
      ap=ap+1
      d3l(1,ap)=ep(1,num)*tmat(1,1)+vw*tmat(2,1)+ep(2,num)*tmat(3,1)
      d3l(2,ap)=ep(1,num)*tmat(1,2)+vw*tmat(2,2)+ep(2,num)*tmat(3,2)
      d3l(3,ap)=ep(1,num)*tmat(1,3)+vw*tmat(2,3)+ep(2,num)*tmat(3,3)
      return
      end
Hi Michel,
Thanks for the source code. With a few additions, such as prms2cp4, that allowed me to construct a reproducer.
The problem is indeed related to optimization of the loop at line 27 above, and appears to be a regression in the version 15 compiler. It can be worked around by preventing vectorization of this loop in any of the ways that you suggested. (PRINT is effectively a function call, and so effectively disables vectorization).
It does not look to me to be related to SELECT/CASE or /Qopenmp as discussed in the original posting, so we might want to create a separate thread for your issue. I am escalating it to the compiler developers for further investigation. We'll let you know when there's a fix.
Thanks for the report and source code.
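One narrower way to prevent vectorization of a single loop, instead of lowering the optimization level for the whole file, is the Intel `!DIR$ NOVECTOR` directive placed immediately before the DO statement. The sketch below is a simplified, hypothetical stand-in for the inner loop discussed above, not the reproducer itself: the data values and the identity `tmat` are invented so the result is easy to check. Compilers that do not recognize the directive treat it as a comment.

```fortran
program demo_novector
  implicit none
  integer, parameter :: npts = 5
  real             :: cp(2,npts), d3l(3,npts), vw
  double precision :: tmat(3,3)
  integer          :: j, r

  ! Hypothetical test data: identity transform, simple point list.
  tmat = 0.0d0
  do r = 1, 3
    tmat(r,r) = 1.0d0
  end do
  vw = 2.0
  do j = 1, npts
    cp(1,j) = real(j)
    cp(2,j) = real(10*j)
  end do

  ! Ask the Intel compiler not to vectorize the loop that follows.
  !DIR$ NOVECTOR
  do j = 1, npts
    d3l(1,j) = cp(1,j)*tmat(1,1) + vw*tmat(2,1) + cp(2,j)*tmat(3,1)
    d3l(2,j) = cp(1,j)*tmat(1,2) + vw*tmat(2,2) + cp(2,j)*tmat(3,2)
    d3l(3,j) = cp(1,j)*tmat(1,3) + vw*tmat(2,3) + cp(2,j)*tmat(3,3)
  end do

  ! With an identity tmat the result is (cp(1,j), vw, cp(2,j)).
  if (abs(d3l(1,npts) - 5.0)  > 1.0e-6) error stop 'bad x'
  if (abs(d3l(2,npts) - 2.0)  > 1.0e-6) error stop 'bad y'
  if (abs(d3l(3,npts) - 50.0) > 1.0e-6) error stop 'bad z'
  print *, 'transform results are as expected'
end program demo_novector
```

Whether this directive avoids the specific regression reported here has not been confirmed in the thread; it is simply another form of the "prevent vectorization" workaround.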
Michel, as an end user, I wish to thank you for your efforts in providing a reproducer to aid Intel in correcting this problem.
Jim Dempsey
I have been running through the debugger all day. I got to the point where I could compile with both debugging and /O2 enabled, and found that in the 2nd call to a function, a small array had the correct contents at the call site but was corrupted when received as the parameter inside the function where the program crashed. It looked more like a pointer than an array.
The array is correctly passed to the subroutine when compiling with /O1.
I am really glad Intel can reproduce the problem, and hopefully they will fix it soon.
I should also report some internal compiler errors that occur when I compile with /Qtrapuv, /fpe:0, or /fpe-all:0,
but I have run out of time to make test cases for them soon.
-------------
04010002_1856
rerelhole.f(521): catastrophic error: **Internal compiler error: internal abort** Please report this
error along with the circumstances in which it occurred in a Software Problem Report. Note: File and line given may not be explicit cause of this error.
compilation aborted for rerelhole.f (code 1)
------------------
berhoek_jig.f
/Qtrapuv /fpe:0
EXPR_ABS.MS128(t288_2059)
FATAL ERROR : Compiler Internal Error
ifort: error #10273: Fatal error in K:\Fortran\if_15.1.148\bin\ia32\fortcom, terminated by 0x2
