I have a particle-in-cell simulation code in which ifort optimization (both v. 17 and v. 18) exposes a bug, and I'm not sure where to turn for help. I have simplified the code as much as possible, so only a few lines of meaningful code remain. The bug depends on optimization level, so it's frustrating but not surprising that its appearance depends sensitively on a number of seemingly irrelevant statements, which presumably affect how the code is transformed during optimization.
My knowledge of Fortran is admittedly not very thorough or systematic; is there something subtle (or obvious!) that I'm doing wrong that encourages the compiler to optimize in a way that contradicts my intent?
The simplified code performs only addition/subtraction starting with double precision values of 0 or 1, so I would think that would tend to rule out finite precision problems. More important, the code functions properly with low optimization and with runtime bounds checking -- so I think this should rule out really obvious bugs (like accessing invalid array values).
The bug occurs with "-O3 -ipo" or "-O3 -ipo-separate" but not with "-O3" or "-O2 -ipo". Using "-fltconsistency" fixes the bug. The following compilation options seem to have no effect (on whether the bug occurs): -fno-inline -no-vec -no-simd -no-scalar-rep -qno-opt-assume-safe-padding -falias -ffnalias -fprotect-parens -ip-no-inlining -ansi-alias -unroll=0.
One of the oddest things is that the bug occurs only if I link in 3 empty modules (with just 2 empty modules, it runs fine).
The code has main.F90, which simply calls the only function in mod_initial.f90. I've attached the entire code and Makefile, but here's the function (I've tried to simplify further, but every additional simplification I make--removing irrelevant statements, removing the double loop, reducing the loop iterations--fixes the bug):
SUBROUTINE INIT_DRIFT_MAXWELLIAN()
  IMPLICIT NONE
  ! Input parameter
  INTEGER, PARAMETER :: ND = 3
  DOUBLE PRECISION, DIMENSION(1:ND) :: resRa, unifRa,irrelRa1,irrelRa2
  INTEGER :: i,j
  !unifRa=-135220.172189807d0
  unifRa=1.d0
  resRa=0.0d0
  DO i=1,ND
    irrelRa2=0.d0 ! moving this outside loop fixes bug
    irrelRa1=0.d0 ! moving this outside loop fixes bug
    DO j=1,ND-1
      ! Following should be equivalent to resRa(i)=resRa(i)
      ! (up to possible finite precision problems).
      resRa(i)=resRa(i)+unifRa(j+1)-unifRa(j)
    ENDDO
  ENDDO
  ! Have to use results somehow or compiler will just optimize everything away.
  IF (0==0 .AND. resRa(1) /= 0.d0) THEN
    PRINT *,'Halting.'
    PRINT *, "resRa ="
    PRINT *, resRa
    PRINT *, "unifRa="
    PRINT *, unifRa
    PRINT *, "ND=", ND
    PRINT *, "unifRa(1)=", unifRa(1)
    PRINT *, "irrelRa2(1)=", irrelRa2(1) ! removing this fixes bug
    PRINT *, "irrelRa1(1)=", irrelRa1(1) ! removing this fixes bug
    PRINT *, "resRa(1)=", resRa(1)
    ERROR STOP 125
  ENDIF
END SUBROUTINE INIT_DRIFT_MAXWELLIAN
The array "resRa" should remain entirely zero (and nothing should be printed out): however, when the bug occurs, it yields the following output (where resRa = -1):
Halting.
resRa =
-1.00000000000000 -1.00000000000000 -1.00000000000000
unifRa=
1.00000000000000 1.00000000000000 1.00000000000000
ND= 3
unifRa(1)= 1.00000000000000
irrelRa2(1)= 0.000000000000000E+000
irrelRa1(1)= 0.000000000000000E+000
resRa(1)= -1.00000000000000
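To spell out why zero is the expected result: the inner loop is a telescoping sum, so for each i it adds unifRa(ND) - unifRa(1) to resRa(i), which is exactly 0 when every element of unifRa is 1. The stand-alone program below is only a sketch of that arithmetic (the names telescoping_check, loop_sum, and closed_form are made up for illustration and are not in the attached code); it is not expected to reproduce the bug, since that depends on the surrounding modules and compile options.

```fortran
program telescoping_check
  implicit none
  integer, parameter :: nd = 3
  double precision :: unif(nd), loop_sum, closed_form
  integer :: j

  unif = 1.d0

  ! Same update as the inner loop of the reported subroutine, for one i.
  loop_sum = 0.d0
  do j = 1, nd - 1
    loop_sum = loop_sum + unif(j+1) - unif(j)
  end do

  ! The intermediate terms cancel, leaving only the two end points.
  closed_form = unif(nd) - unif(1)

  if (loop_sum /= closed_form) then
    print *, 'mismatch: loop_sum =', loop_sum, ' closed_form =', closed_form
    stop 1
  end if
  print *, 'loop_sum and closed_form agree:', loop_sum
end program telescoping_check
```

With values of 0 or 1 every intermediate result is exactly representable, so the comparison above uses exact equality; with general values a tolerance would be needed.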
This was compiled with ifort 18.0.2 on stampede2 (at TACC) using (for example) options
-g -O3 -ipo-separate -fno-inline -no-vec -no-simd -no-scalar-rep -qno-opt-assume-safe-padding -falias -ffnalias -fprotect-parens -ip-no-inlining -ansi-alias -unroll=0
and was run on a KNL node. The same bug occurs on skylake nodes of stampede2.
Thanks for any help,
Greg.
I cannot reproduce the problem on different flavors of Linux (RedHat 6, RedHat 7, Ubuntu 16). The binary does not print out anything.
Thanks for trying.
I've reproduced the problem on a system with Intel Xeon E5-2680, RHEL 7.4, ifort 17.0.4, using the same compile options as in the previous attachment, as well as this shorter list of options:
ifort -g -O3 -ipo-separate -c mod_consts.F90
ifort -g -O3 -ipo-separate -o mod_in_recon.o -c mod_in_recon.f90
ifort -g -O3 -ipo-separate -c mod_enum.f90
ifort -g -O3 -ipo-separate -c mod_initial.f90
ifort -g -O3 -ipo-separate -c main.F90
ifort -g -O3 -ipo-separate -o a.out mod_consts.o mod_in_recon.o mod_enum.o mod_initial.o main.o
~/debugIpo$ ./a.out
Halting.
resRa =
-1.00000000000000 -1.00000000000000 -1.00000000000000
unifRa=
1.00000000000000 1.00000000000000 1.00000000000000
ND= 3
unifRa(1)= 1.00000000000000
irrelRa2(1)= 0.000000000000000E+000
irrelRa1(1)= 0.000000000000000E+000
resRa(1)= -1.00000000000000
125
The staff at NASA's Pleiades have reproduced this bug (on SLES 12) using ifort versions 15, 16, and 18.0.0.128 and 18.0.3.222. However, they also tried using ifort 19.0.3.199 (which I think isn't yet officially supported on Pleiades), and the bug does not appear for ifort 19. [ifort 19 is not available on most systems I use.] The question is now: does the program work with ifort 19 because there was a compiler bug up through v. 18 that was fixed in v. 19, or did v. 19 introduce some serendipitous change in optimization that prevents this bug from being exposed (i.e., some harmless code changes on my part could result in the re-appearance of the bug)?
For example, in the full simulation code, the bug does not appear with ifort 17 (which is why we noticed it only when stampede2 upgraded to ifort 18); however, in the simplified code, the bug does appear in ifort 17.
[Pleiades staff also suggested I compile with all warnings enabled to see if there could be some subtle issue in my code. Except for complaining about my "ERROR STOP 125" statement, which I've now replaced with "STOP 125", ifort and gfortran issue no warnings.]
Indeed, I tested only with ifort 19.0.3.199. With both ifort 18.0.5.274 and ifort 17.0.8.262 I can reproduce the behavior. With the nagfor compiler and all checks switched on, there is no output, no errors, and no warnings, which is the expected behavior.
For a work-around, try using
!DIR$ NOVECTOR
on the outer loop first; if that doesn't work, then on the inner loop as well.
There appears to be no option to disable loop collapse for non-OpenMP loops (there could be an undocumented one).
Your sample code showed a small value for ND. If the actual code uses a relatively small ND, the lack of vectorization might not make a difference.
I do not have a system that reproduces the problem here.
Jim Dempsey
Thanks! Placing
!DIR$ NOVECTOR
before the inner loop fixes the problem for ifort 18.0.3.222 (whether there's a NOVECTOR directive before the outer loop has no effect on the bug).
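For anyone skimming the thread, here is a minimal sketch of the placement that worked, written as a stand-alone program rather than the original subroutine (the program name novector_demo and the shortened variable names are made up for illustration). The directive sits immediately before the inner DO statement. This reduced program is not expected to reproduce the bug by itself, since the failure also depended on the extra modules and on -O3 with -ipo; it only shows where the directive goes.

```fortran
program novector_demo
  implicit none
  integer, parameter :: nd = 3
  double precision :: res(nd), unif(nd)
  integer :: i, j

  unif = 1.d0
  res = 0.d0
  do i = 1, nd
    ! Intel-specific directive; other compilers treat it as a plain comment.
    ! It applies to the DO statement that immediately follows it.
    !DIR$ NOVECTOR
    do j = 1, nd - 1
      res(i) = res(i) + unif(j+1) - unif(j)
    end do
  end do

  ! The differences cancel, so every element should still be zero.
  if (any(res /= 0.d0)) then
    print *, 'unexpected nonzero result:', res
    stop 1
  end if
  print *, 'res is zero as expected'
end program novector_demo
```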
[In the full code, ND tends to be around 800.] Of course, the full code contains multiple loops pretty similar to this (though for whatever reason, just the one appears troublesome -- as far as I know).
Why did you suspect this directive would fix the problem? And do you know why the compiler options "-no-vec -no-simd" didn't have the same effect?
I'm not sure if this adds useful information, but with -qopt-report=5, ifort 18 and 19 yield identical results except for the inner loop (though all say "loop was not vectorized"):
(ifort 18 - bug)
LOOP BEGIN at mod_initial.f90(29,3)
remark #25045: Fused Loops: ( 29 30 32 )
remark #15344: loop was not vectorized: vector dependence prevents vectorization
remark #15346: vector dependence: assumed OUTPUT dependence between irrelra1(:) (30:3) and irrelra1(:) (30:3)
remark #15346: vector dependence: assumed OUTPUT dependence between irrelra1(:) (30:3) and irrelra1(:) (30:3)
LOOP END
(ifort 19 - no bug: note change from irrelra1 to irrelra2 in remark #15346)
LOOP BEGIN at mod_initial.f90(29,3)
remark #25045: Fused Loops: ( 29 30 32 )
remark #15344: loop was not vectorized: vector dependence prevents vectorization
remark #15346: vector dependence: assumed OUTPUT dependence between irrelra2(:) (29:3) and irrelra2(:) (29:3)
remark #15346: vector dependence: assumed OUTPUT dependence between irrelra2(:) (29:3) and irrelra2(:) (29:3)
LOOP END
(ifort 18 with !DIR$ NOVECTOR for inner loop - no bug)
LOOP BEGIN at mod_initial.f90(30,3)
remark #25045: Fused Loops: ( 30 32 )
remark #15319: loop was not vectorized: novector directive used
LOOP END
LOOP BEGIN at mod_initial.f90(32,3)
remark #25046: Loop lost in Fusion
LOOP END
>>Why did you suspect this directive would fix the problem?
This was more of a hunch than anything else. Over 50 years of programming gives me plenty of hunches. The structure was a nested loop (DO I...DO J...) that the compiler could potentially collapse into a single loop. In earlier versions of the compiler, this has proven problematic for vectorization, especially when the loop index is modified by a + or - offset.
Note the opt-report states "Fused Loops" which is (grammatically) incorrect. Either this is a mis-statement, or the compiler is in the wrong section of code.
>> do you know why the compiler options "-no-vec -no-simd" didn't have the same effect?
I do not think it specifically has to do with vectorization, but rather with the loop collapsing. The compiler does not have a NOCOLLAPSE directive, and the nearest thing to accomplish this was to insert the NOVECTOR.
BTW in cases like this, especially where the inner loop count is ~800, a different workaround strategy is to move the code into a callable subroutine and, if necessary, not IPO it.
Jim Dempsey
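A rough sketch of that second workaround, under the assumption that the inner loop is moved into an external subroutine of its own (the names accumulate_diffs and caller_demo are hypothetical, not from the attached code). Both units are shown in one listing so it compiles as is; in practice the subroutine would live in its own source file so that it can be compiled without -ipo while the rest of the build keeps it.

```fortran
! Hypothetical helper: would be placed in its own file and compiled without -ipo.
subroutine accumulate_diffs(n, u, acc)
  implicit none
  integer, intent(in) :: n
  double precision, intent(in) :: u(n)
  double precision, intent(inout) :: acc
  integer :: j

  ! Former inner loop: adds u(j+1) - u(j) for j = 1 .. n-1 to acc.
  do j = 1, n - 1
    acc = acc + u(j+1) - u(j)
  end do
end subroutine accumulate_diffs

program caller_demo
  implicit none
  integer, parameter :: nd = 3
  double precision :: res(nd), unif(nd)
  integer :: i
  external :: accumulate_diffs

  unif = 1.d0
  res = 0.d0
  do i = 1, nd
    ! The outer loop now contains only a call, so there is no nest to collapse here.
    call accumulate_diffs(nd, unif, res(i))
  end do

  if (any(res /= 0.d0)) then
    print *, 'unexpected nonzero result:', res
    stop 1
  end if
  print *, 'res is zero as expected'
end program caller_demo
```

The call overhead is paid once per outer iteration, which should be small compared with an inner loop of around 800 iterations.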
