Here is a code snippet I compiled with Intel Fortran 12.0.4.196, banner is:
Intel Visual Fortran Intel 64 Compiler XE for applications running on Intel 64, Version 12.0.4.196 Build 20110427
Copyright (C) 1985-2011 Intel Corporation. All rights reserved.
and compilation line is:
ifort /nologo /f77rtl /Qsave /Qzero /O2 /assume:nodummy_aliases /MT
The code snippet is (good old F77...):
      IF( MODE .EQ. 2 )THEN
C        Face on element
         DO 100 L=1,3
            A(L) = RFNODE(L,IFELEM(5,FACE))
            B(L) = RFNODE(L,IFELEM(6,FACE))
            C(L) = RFNODE(L,IFELEM(7,FACE))
            IF( IFELEM(2,FACE) .EQ. 115 .OR.
     +          IFELEM(2,FACE) .EQ. 116 .OR.
     +          IFELEM(2,FACE) .EQ. 119 .OR.
     +          IFELEM(2,FACE) .EQ. 120 )THEN
               POINT(L) = RFNODE(L,IFELEM(9,FACE))
            ELSE
               POINT(L) = RFNODE(L,IFELEM(8,FACE))
            ENDIF
  100    CONTINUE
      ELSE
         INDX = 9
         IF( IFACE(9,FACE).EQ.0 ) INDX = 8
         DO 110 L=1,3
            A(L) = RFNODE( L, IFACE(6,FACE) )
            B(L) = RFNODE( L, IFACE(7,FACE) )
            C(L) = RFNODE( L,IFACE(INDX,FACE) )
  110    CONTINUE
      ENDIF
In the case above, MODE is 2; MODE and FACE are received as arguments. Obviously, the IF inside loop 100 can be hoisted out of the loop. Alas, this code crashes when compiled with optimization. Here is the generated assembly, which I believe is around the test for MODE .EQ. 2 (the jne instruction comes just after a comparison with 2):
000000013F6C1FE8 jne 000000013F6C21D7
000000013F6C1FEE movdqa xmm3,xmmword ptr [13FA47E30h]
000000013F6C1FF6 lea r10,[rax*8]
000000013F6C1FFE sub r10,rax
000000013F6C2001 shl r10,4
000000013F6C2005 movdqa xmm2,xmmword ptr [13FA47E40h]
000000013F6C200D movsxd rdx,dword ptr [r10+r9-58h]
000000013F6C2012 imul rdx,rdx,9Ch
000000013F6C2019 movsxd rcx,dword ptr [r10+r9-5Ch]
000000013F6C201E imul rcx,rcx,9Ch
000000013F6C2025 cvtps2pd xmm4,mmword ptr [rdx+r14-9Ch]
000000013F6C202E cvtps2pd xmm15,mmword ptr [rcx+r14-9Ch]
000000013F6C2037 movsxd rbx,dword ptr [r10+r9-60h]
000000013F6C203C imul rbx,rbx,9Ch
000000013F6C2043 movsxd rax,dword ptr [r10+r9-50h]
000000013F6C2048 imul rax,rax,9Ch
000000013F6C204F cvtps2pd xmm1,mmword ptr [rbx+r14-9Ch]
000000013F6C2058 mov r8d,dword ptr [r10+r9-6Ch]
000000013F6C205D movaps xmmword ptr [13FB29530h],xmm4
000000013F6C2064 movdqa xmm4,xmmword ptr [13FA47E20h]
000000013F6C206C movd xmm0,r8d
000000013F6C2071 pshufd xmm5,xmm0,0
000000013F6C2076 movdqa xmm0,xmmword ptr [13FA47E50h]
000000013F6C207E pcmpeqd xmm4,xmm5
000000013F6C2082 pcmpeqd xmm3,xmm5
000000013F6C2086 pcmpeqd xmm2,xmm5
000000013F6C208A pcmpeqd xmm5,xmm0
000000013F6C208E movdqa xmm0,xmm4
000000013F6C2092 movaps xmmword ptr [13FB29510h],xmm15
000000013F6C209A movdqa xmm15,xmm3
000000013F6C209F punpckldq xmm0,xmm4
000000013F6C20A3 orps xmm4,xmm3
000000013F6C20A6 movaps xmmword ptr [13FB294F0h],xmm1
000000013F6C20AD orps xmm4,xmm2
000000013F6C20B0 punpckldq xmm15,xmm3
000000013F6C20B5 orps xmm4,xmm5
000000013F6C20B8 cvtps2pd xmm1,mmword ptr [rax+r14-9Ch]
It crashes on the last line with an access violation: rax is zero, and it should not be.
If I change the code to this:
      IF( MODE .EQ. 2 )THEN
C        Face on element
         IF( IFELEM(2,FACE) .EQ. 115 .OR.
     +       IFELEM(2,FACE) .EQ. 116 .OR.
     +       IFELEM(2,FACE) .EQ. 119 .OR.
     +       IFELEM(2,FACE) .EQ. 120 )THEN
            INDX = 9
         ELSE
            INDX = 8
         ENDIF
         DO 100 L=1,3
            A(L) = RFNODE(L,IFELEM(5,FACE))
            B(L) = RFNODE(L,IFELEM(6,FACE))
            C(L) = RFNODE(L,IFELEM(7,FACE))
            POINT(L) = RFNODE(L,IFELEM(INDX,FACE))
  100    CONTINUE
      ELSE
         INDX = 9
         IF( IFACE(9,FACE).EQ.0 ) INDX = 8
         DO 110 L=1,3
            A(L) = RFNODE( L, IFACE(6,FACE) )
            B(L) = RFNODE( L, IFACE(7,FACE) )
            C(L) = RFNODE( L,IFACE(INDX,FACE) )
  110    CONTINUE
      ENDIF
Everything works fine.
I am really concerned about this, since there are many places in our code where this optimization can take place, and we have millions of lines of code. And (obviously) this only happens in optimized builds, so it is a pain to debug. Should we upgrade to a more recent version of the compiler? Has this problem been addressed? Is there any more information I can provide?
This code used to work when compiled with version 9.1 and /optimize:2. It also works with optimization level /O1, which is equivalent to /optimize:2 according to the deprecation help of version 12.0. So for now, I will stick to /O1...
Thanks for any hint,
Etienne Monette
Please attach a complete, compilable source that demonstrates the problem. We cannot investigate based on a snippet.
Ok, here we go. The zip file contains everything needed to compile. To compile:
ifort /nologo /f77rtl /Qsave /Qzero /O2 /assume:nodummy_aliases /MT /c /Fosplit9.obj split9.f
link /nologo /MANIFEST /OUT:optbug.exe z8d.lib tmg.lib apcfg.lib CharFortranC.lib wizintf.lib z8d.lib esc.lib nx2tmg.lib emc.lib octree2.lib xmlwrapper.lib expat.lib version.lib MayaSecurityValidator.lib tomcrypt.lib nx2tmg_main.obj split9.obj
Execute this line to obtain the crash:
optbug.exe -s NXTMG68c-Solution_1.xml
Sorry about the many libraries; I tried to isolate the code, but then it would not crash. The program should run until you see the line about performing the mesh check, and then crash.
By the way, this is on 64-bit Windows 7.
Etienne
Sorry, no crash! Last lines of output:
[bash]
 Sub-domain     Velocity           Length           Reynolds      Mach
----------------------------------------------------------------------
 1- AIR         0.0000E+00 mm/s    9.9477E+01 mm    0.0000E+00    0.0000E+00
----------------------------------------------------------------------
Writing flow model files... ...done.
Writing thermal model files... ...done.
_#I NX2TMG 2 87E294278CCC534A9F621143DC626BC025A5BC6D 7B457DABB47D9AEC1546B776292BD977B10E08CE$
_#I NX2TMG 2 E3683141BBBB2FBEFC84170AF3A438FFBE0FBE43 924B20CFEF0E13ACC505D521B3D3FDCBA5D2CE74$
_#I NX2TMG 2 45C7F4A71927CA734757638CC2B7886BEECF7AA6 1214CF71FDCFDFC3C174FE8E441866751F94D7B2$
_#I NX2TMG 2 BA945355A611941156BEBE98A8F1B7B7E6C317A1 CDAEBA6676B6A23E3B87D716916A8CCC5582AD46$
_#I NX2TMG 2 54C2885CA023CA307A57E8F0FCAB32367382CEAC CD80DE2F684F767874FBAF11C5DD2E9C413F7C5F$
_#I NX2TMG 2 B97329E1436D76025DF26BB4BF9FE3C0FE36851D 2923DBB2CB707BFEA8B61C241262EED30014C6E6$
[/bash]
Indeed, it didn't crash on your computer, so I tried it on several machines here. It crashes on my computer (Windows 7) and on an XP 64 computer, but it passes on another Windows 7 computer. So we are even at 2 crashes and 2 passes.
Can you try it on more than one computer? In order to crash, the program has to access an out-of-bounds memory address, and this appears to be machine-specific. When it crashes on my computer, it does so on this instruction:
000000013F6C20B8 cvtps2pd xmm1,mmword ptr [rax+r14-9Ch]
From what I understand (my x64 assembly is not that good!) rax represents an offset in the RFNODE array and so does rbx:
000000013F6C2037 movsxd rbx,dword ptr [r10+r9-60h]
000000013F6C203C imul rbx,rbx,9Ch
000000013F6C2043 movsxd rax,dword ptr [r10+r9-50h]
000000013F6C2048 imul rax,rax,9Ch
When it crashes, rax is 0 and rbx is 156, which corresponds to index 1 in the RFNODE array, whose columns are 39*4=156 bytes wide.
I added print statements giving, respectively, MODE, IFELEM(2,FACE), and IFELEM(5..9,FACE):
Mode: 2
Type: 111
Node 1: 1
Node 2: 2
Node 3: 3
Node 4: 4
Node 5: 0
As you can see, only IFELEM(9,FACE) is zero, but since the type is 111, it is IFELEM(8,FACE) that should be used. The instruction where it crashes dereferences a Fortran array at index 0, and this does not always crash.
It looks to me like the optimizer tried to hoist the IF statement out of loop 100, but did not do it correctly...
Etienne
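The displacements in the listing from the first post can be checked with a little arithmetic. The sketch below assumes (this is inferred from the assembly, not stated anywhere in the thread) that r9 holds the base address of IFELEM, that IFELEM is a 4-byte integer array with 28 rows, so that one column spans 112 bytes (r10 = (8*FACE - FACE)*16), and that r14 holds the base address of RFNODE with 156-byte columns. The program name OFFSET is made up for the illustration.

```fortran
      PROGRAM OFFSET
      IMPLICIT NONE
      INTEGER K, IDX
C     Assumption: IFELEM is INTEGER*4 with 28 rows, so one column
C     spans 28*4 = 112 bytes and IFELEM(K,FACE) sits at
C     displacement 4*(K-1) - 112 from r10+r9.
      DO 10 K = 2, 9
         PRINT *, 'IFELEM row', K, ' displacement', 4*(K-1) - 112
   10 CONTINUE
C     Assumption: one RFNODE column spans 9Ch = 156 bytes, so
C     column IDX starts 156*IDX - 156 bytes from the base in r14.
      DO 20 IDX = 0, 1
         PRINT *, 'RFNODE column', IDX, ' byte offset', 156*IDX - 156
   20 CONTINUE
      END
```

Under these assumptions, rows 2, 5, 6, 7 and 9 give -108, -96, -92, -88 and -80, that is -6Ch, -60h, -5Ch, -58h and -50h, which are exactly the displacements that appear in the listing. The load into rax at displacement -50h would then be IFELEM(9,FACE), fetched before the comparisons against 115, 116, 119 and 120 have been used, and a column index of 0 lands 156 bytes before the start of RFNODE.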
I tried it on Windows 7 x64. No errors with either 12.0.4 (which you used) or 12.1.0 (current version). I even took the EXE you had prebuilt in the ZIP and ran it as-is - no crash.
I tried the prebuilt executable in the zip file on 3 computers here, and it crashes on 2 out of 3...
As I explained, it dereferences an array at index 0, which means it will only crash if the array sits at the edge of the application's valid address space.
I already have a workaround, which is to use the /O1 optimization level, but I feel uneasy about it since /O2 is considered the default optimization level.
I really hope this is not a compiler problem, but from what I saw in the assembly and from the tests I did, everything points in that direction. I am aware that 99.9% of bugs are programmer errors. But this happens in such a simple routine that the assembly code is quite easy to follow, and what I see in the assembly is wrong.
I am willing to work out something to enable a remote connection to my computer if this might help.
Etienne
I enabled bounds checking and it saw no issues. You may have some background program on your computers that is corrupting things.
I should have thought of this before. I replaced the value 0 with a negative value close to the minimum 32-bit integer, so now it is way out of bounds.
See the attached file, which includes aggressive_optbug.exe and the corresponding nx2tmg.lib files.
This one did crash on the computer where the earlier build did not. It should fail on any computer now.
And again, everything is fine with /O1 optimization, the problem only occurs with /O2.
Etienne
I tried something different. On a W7X64 system, on which your EXE (from post #2) ran to completion without errors, I ran your EXE under the VS2010 debugger. This time, the program crashed as you reported, with %rax = 0. This suggests either (i) an uninitialized variable or, less likely, (ii) an optimizer bug.
With the aggressive version, you will get %rax to be very negative... What I did was set IFELEM(9,1) to -2000000000 just before calling the SPLIT9 sub and set it back to 0 just after. In the case we are interested in, IFELEM(9,1) should not be used, but it is.
Etienne
Here is a suggestion to probe your suspicions about the optimizer.
The number of arguments to the subroutine is fairly small, and the sizes of the arguments are quite modest. Make up a small test program which sets all the subroutine arguments to what they should be (using initialization statements, or reading from a text file) and calls the subroutine.
The test program can be built without using any of your libraries. This test problem can then be run with various compiler options and tested.
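A minimal sketch of such a test program follows. Everything in it is hypothetical: the names HARNES and PICKPT, the array sizes, and the node values are invented for the illustration, and PICKPT only mimics the loop 100 logic quoted in the first post rather than reproducing the real subroutine. The entry that must not be read is set to the same far-out-of-range value mentioned earlier in the thread, so that a stray load would be much more likely to fault.

```fortran
      PROGRAM HARNES
      IMPLICIT NONE
C     Hypothetical sizes; the real arrays are declared elsewhere.
      INTEGER IFELEM(9,1), J, L
      REAL RFNODE(3,4)
      DOUBLE PRECISION POINT(3)
C     Give every node coordinate a recognizable value.
      DO 20 J = 1, 4
         DO 10 L = 1, 3
            RFNODE(L,J) = 10.0*J + L
   10    CONTINUE
   20 CONTINUE
      DO 30 J = 1, 9
         IFELEM(J,1) = 0
   30 CONTINUE
C     Type 111 is not in the 115/116/119/120 list, so only
C     IFELEM(8,1) may legally be used as a node index.
      IFELEM(2,1) = 111
      IFELEM(8,1) = 4
C     Poison the entry that must not be read.
      IFELEM(9,1) = -2000000000
      CALL PICKPT(1, IFELEM, RFNODE, POINT)
      PRINT *, 'POINT =', POINT
      IF( POINT(1) .EQ. 41.0D0 .AND. POINT(2) .EQ. 42.0D0 .AND.
     +    POINT(3) .EQ. 43.0D0 )THEN
         PRINT *, 'ok'
      ELSE
         PRINT *, 'unexpected result'
      ENDIF
      END
C
C     Hypothetical stand-in for the loop 100 logic of SPLIT9.
      SUBROUTINE PICKPT(FACE, IFELEM, RFNODE, POINT)
      IMPLICIT NONE
      INTEGER FACE, L
      INTEGER IFELEM(9,*)
      REAL RFNODE(3,*)
      DOUBLE PRECISION POINT(3)
      DO 100 L=1,3
         IF( IFELEM(2,FACE) .EQ. 115 .OR.
     +       IFELEM(2,FACE) .EQ. 116 .OR.
     +       IFELEM(2,FACE) .EQ. 119 .OR.
     +       IFELEM(2,FACE) .EQ. 120 )THEN
            POINT(L) = RFNODE(L,IFELEM(9,FACE))
         ELSE
            POINT(L) = RFNODE(L,IFELEM(8,FACE))
         ENDIF
  100 CONTINUE
      RETURN
      END
```

The block keeps both units in one file only so that it is self-contained. To match the way the thread compiles things, the subroutine would probably go into its own source file, so the compiler cannot see the caller's values while optimizing it.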
I already tried that without success, which is why I sent out all the libs. But I will try this out again with the aggressive approach.
Finally nailed it. Attached are two files, cannot get simpler than this. Here is how I compiled:
ifort /nologo /f77rtl /Qsave /Qzero /O2 /assume:nodummy_aliases /MT /c /Fooptbug.obj optbug.f
ifort /nologo /f77rtl /Qsave /Qzero /O2 /assume:nodummy_aliases /MT /c /Fosplit9.obj split9.f
link /nologo /MANIFEST /OUT:optbug.exe optbug.obj split9.obj
Plain fortran, plain simple.
Again, no problems with /O1.
Etienne
P.S.
Went a step further:
ifort /nologo /O2 /c /Fooptbug.obj optbug.f
ifort /nologo /O2 /c /Fosplit9.obj split9.f
link /nologo /MANIFEST /OUT:optbug.exe optbug.obj split9.obj
Will crash as soon as optimization level is 2 or 3.
[Also removed dependencies to .inc files in split9.f]
I am not sure what this cut-down example accomplishes.
It shows that if a subscript used is out of its bounds, the program may or may not crash depending on the optimization level and on the contents of uninitialized memory.
I don't think that this establishes anything regarding optimizer bugs.
I still suspect subscript bounds errors and uninitialized variables in your large application.
The cut down example accomplishes this:
Since IFELEM(2,1) equals 111, the IF statement in LOOP 100 of split9 is always false.
Then, POINT(L) is always equal to RFNODE(L,IFELEM(8,FACE)).
But because of optimization, the value RFNODE(L,IFELEM(9,FACE)) is fetched ahead of the test, and I made it out of bounds on purpose. I cannot guarantee that IFELEM(9,FACE) is a valid index for a Fortran array at all times. Nobody can. Even if it were initialized to zero, which is good practice, this code could fail, since zero is not a valid index here. And this is exactly what is happening in the bigger application (it crashes on my computer, but not on yours).
The cut down example clearly shows an illegal (or dangerous) optimization.
If you debug the application, you will see that line 31 of split.f is never executed. But the optimized code executes part of it. I do not know exactly why, but I suspect it prefetches a value to optimize a branch.
If optimization level /O2 means that any value in the code may be prefetched, then our code must certainly avoid it at all costs. But the ifort help lists /O2 as the default optimization level, and if this means that both sides of every branch must only reference fetchable values... Well, that is too big a requirement for me. I will stick with level /O1.
Etienne
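The difference between the code as written and the suspected transformation can be illustrated without touching invalid memory. The sketch below is a hypothetical illustration only, not the compiler's actual output: FETCH is an invented checked accessor that counts invalid indices instead of reading outside its table, and the program name SPECLD, the common block TRACE, and the table contents are all made up.

```fortran
      PROGRAM SPECLD
      IMPLICIT NONE
C     Hypothetical illustration, not the compiler's actual output.
      INTEGER NBAD
      COMMON /TRACE/ NBAD
      INTEGER I8, I9, ITYPE
      REAL FETCH, P8, P9, PT
      ITYPE = 111
      I8 = 4
      I9 = 0
C     Source semantics: only the selected index is dereferenced.
      NBAD = 0
      IF( ITYPE .EQ. 115 .OR. ITYPE .EQ. 116 )THEN
         PT = FETCH(I9)
      ELSE
         PT = FETCH(I8)
      ENDIF
      PRINT *, 'as written:    value', PT, ' invalid reads', NBAD
C     Both loads issued first, then one result is selected.
      NBAD = 0
      P9 = FETCH(I9)
      P8 = FETCH(I8)
      IF( ITYPE .EQ. 115 .OR. ITYPE .EQ. 116 )THEN
         PT = P9
      ELSE
         PT = P8
      ENDIF
      PRINT *, 'loads hoisted: value', PT, ' invalid reads', NBAD
      END
C
C     Checked read of a 4-entry table; counts invalid indices
C     instead of touching memory outside the table.
      REAL FUNCTION FETCH(IDX)
      IMPLICIT NONE
      INTEGER IDX, NBAD
      COMMON /TRACE/ NBAD
      REAL TABLE(4)
      DATA TABLE /10.0, 20.0, 30.0, 40.0/
      IF( IDX .LT. 1 .OR. IDX .GT. 4 )THEN
         NBAD = NBAD + 1
         FETCH = 0.0
      ELSE
         FETCH = TABLE(IDX)
      ENDIF
      RETURN
      END
```

Both variants select the same value, but only the second one ever asks for index 0. With a raw array access in place of FETCH, that extra read is the one that may or may not fault depending on where the array happens to sit in memory.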
Please try Update 6. I can reproduce the access violation with Update 4 but not with Update 6.
I will. In the meantime, I reduced the problem to its simplest form; see the attached file.
I compiled it on Linux with the Portland Group compiler at optimization level 3: no problem.
Here is how I compiled:
ifort /nologo /O2 /c /Fooptbug.obj optbug.f
ifort /nologo /O2 /c /Fomysub.obj mysub.f
link /nologo /MANIFEST /OUT:optbug.exe optbug.obj mysub.obj
Etienne
I do not have 12.0.4 installed anymore. Both 12.0.5 and 12.1, given your mysub.f:
[fortran]
      SUBROUTINE MYSUB(TABLE,TESTME)
C
      IMPLICIT NONE
      INTEGER TESTME(*)
      REAL*8 TABLE(2,*)
      REAL*8 SETME(2)
      INTEGER L
C
      PRINT *, TESTME(1)
      DO L=1,2
         IF(TESTME(1) .EQ. 1 .OR. TESTME(1) .EQ. 2) THEN
C           Since TESTME(1) is 0, we will never get in here...
            SETME(L) = TABLE(L,TESTME(2))
         ENDIF
      ENDDO
C
      RETURN
      END
[/fortran]
will not produce any code for the DO loop if any optimization level other than 0 is enabled; this is not because of the value of TESTME(1) being 1 as you claim, since this value is not known at compile time, but because the sole effect of the loop is to assign values to a local variable which is not used elsewhere in the subroutine. Indeed, the assembly output is
[bash]
        PUBLIC MYSUB
MYSUB   PROC
; parameter 1: rcx
; parameter 2: rdx
        sub       rsp, 104
        mov       r8, 0208384ff00H
        mov       eax, DWORD PTR [rdx]
        lea       rcx, QWORD PTR [48+rsp]
        mov       edx, -1
        lea       r9, QWORD PTR [_2_STRLITPACK_0.0.1]
        mov       QWORD PTR [48+rsp], 0
        lea       r10, QWORD PTR [96+rsp]
        mov       DWORD PTR [96+rsp], eax
        mov       QWORD PTR [32+rsp], r10
        call      for_write_seq_lis
        add       rsp, 104
        ret
[/bash]
Note that the fourth instruction overwrites the address of TABLE. You can set TESTME(1) = 77 in the main program, and there will be no effect on the code produced for the subroutine (assuming IPO is not being used).
Furthermore, if you comment out the PRINT statement in the subroutine, the assembly output is trivial:
[bash]
MYSUB   PROC
; parameter 1: rcx
; parameter 2: rdx
        ret
[/bash]
Again, note that this collapsing of the code is because of optimization of the subroutine alone, since the compiler has not even seen the source code of the main program, where you set TESTME(1) = 1.
Update 6 fixed the problem...
I will recompile and retest my code with /O2.
Etienne
P.S.
Done with recompilation and testing, I no longer experience crashes. The execution is faster now (we used to optimize with /O1). So far, so good.
Thanks for your help, and I hope Update 6 will be the good one...