Intel® Fortran Compiler

executable crashes / stops without error

mr_katse
Beginner
hi!
i am using a fortran program which rewrites many files (about 26000 in total) into 1 file.
the input files are opened one by one and closed after reading.
in ivf i have the issue that the executable just stops without an error after opening/reading/closing about 16000 files. at the moment my workaround is to compile my code in compaq visual fortran, where i don't have any issues.
what can i do to compile the code in ivf?
thank you!

jimdempseyatthecove
Honored Contributor III
Are you on a workstation or server?
If server, does the system have a policy to detect and kill a runaway program?

Users on Linux get a nasty surprise if the OOM killer is running on the system and decides to kill their application.

Other than trying to duplicate your situation we are running out of options here.

Last thing to try: create a batch file (CMD script)

: foo.bat
yourProgramHere yourArgsHere
IF ERRORLEVEL 1 ECHO Error level of 1 or greater %ERRORLEVEL%
IF NOT ERRORLEVEL 0 ECHO Negative error level %ERRORLEVEL%

Then run the batch file.

If you see either message, something is causing your program to exit abnormally.
The error level value may or may not print out depending on CMD option /E:ON

Jim

mr_katse
Beginner
i am working on a workstation with w7. i can't imagine that another application is killing my fortran exe, because the exe compiled with CVF works.

created the batch file: the executable still crashes and i get the error level -1073741819.

i have attached 3 files:
cdr_output_to_timeseries_modMathew.f
list.txt
output_200701010100.txt

list.txt should be in the root-folder
output_200701010100.txt should be in \input
a folder \output is also needed.

maybe you can duplicate my situation.

thank you!

jimdempseyatthecove
Honored Contributor III
Running test program now. Using IVF 11.0.066 on WinXP Pro x64 in a 64-bit Debug build. With full debug checking this will take a while to reach the problem point, ~5 files per second. At 1700 now.

Jim

jimdempseyatthecove
Honored Contributor III
>>created the batch file: the executable still crashes and i get the error level -1073741819

This is 0xC0000005, STATUS_ACCESS_VIOLATION.

That status means an access to invalid memory, not a file permission problem.
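
For anyone who wants to check the conversion from the decimal exit code to the hexadecimal status, here is a small sketch. The program and variable names are made up for illustration, and it assumes the compiler offers an integer kind wide enough for 2**32.

[fortran]program show_status
  implicit none
  ! Integer kind with at least 18 decimal digits (64-bit on common compilers)
  integer, parameter :: i8 = selected_int_kind(18)
  ! Exit code reported by the batch file (decimal, signed 32-bit)
  integer, parameter :: code = -1073741819
  integer(kind=i8) :: unsigned_value

  ! Reinterpret the signed 32-bit value as unsigned by adding 2**32
  unsigned_value = int(code, kind=i8) + 4294967296_i8

  ! The Z edit descriptor writes the value in hexadecimal
  write (*, '(A,Z8.8)') 'status = 0x', unsigned_value
end program show_status[/fortran]

Since -1073741819 + 4294967296 = 3221225477, the hexadecimal digits come out as C0000005.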

Let's see if it occurs on my system too

Jim
at 9000 now

mr_katse
Beginner
i run the program on xp 32 bit and w7 64 bit.

my about dialog says i am using this compiler:

Intel Visual Fortran Compiler Integration Package ID: w_cprof_p_11.1.054
Intel Visual Fortran Compiler Integration for Microsoft Visual Studio 2008, 11.1.3469.2008

jimdempseyatthecove
Honored Contributor III
Mathew,

I can reproduce the problem here. Crashes at file 16,150 with 0xC0000005

RSP = 0x30FF0

Which is just below the top of an unmapped page, in other words where the local variables of the current stack frame live. The next call then fails because it writes the return address to non-existent memory. As to how it got into this situation?

a) something modified the stack frame pointer on the stack
b) something caused "infinite" recursion (error in an error recovery/reporting routine)

So the error occurs in both 32-bit and 64-bit applications on Windows XP x64 and Windows 7 x64 using IVF 11.0.066 and IVF 11.1.054.

Premier Support should be able to reproduce this problem given your test program and test files.

Jim Dempsey

Steven_L_Intel1
Employee
I'll take a look.

mecej4
Honored Contributor III
Here is a shorter program based on the posted program; this also shows the same buggy behavior. The shorter program also runs about 100 files/second on a 2 GHz Athlon-X2, using Intel 11.1.067 32 or 64 bit compilers. The same input/output files are used as given by MR-KATSE. The errors do not occur on Linux using the Intel compiler 11.1.073.

[fxfortran]      program cdr_zoneoutput_converter

      integer MAXCOL, MAXROW, MAXFILE, MAXDAY
      parameter (MAXCOL=30)
      parameter (MAXROW=1000)
      parameter (MAXFILE=50)
      parameter (MAXDAY = 30000)
      parameter (IRL=10400)

      character FILENAME*(MAXFILE)(MAXDAY)
      character*12 onames(23)
      data onames/'BFZON','BWOZON','BW3ZON','DELTAZON','ETATZON',
     1 'ETAP0ZON','ETPEZON','ETPRZON','MELTZON',
     2 'PRAINSOILZON','PSNOWZON','PZON','QAB1ZON',
     3 'QAB2ZON','QAB3ZON','QABZON','QEX2ZON','QVS0ZON',
     4 'SCOVZON','SMELTZON','SWWZON','TOTALSZON','TZON'/

      integer NCOL, NROW, ICOL, IROW, IFILE, NFILE, IER
      character VALUES(MAXCOL,MAXROW)*13
      integer tval(8)

      do iunit=11,33
         open(unit=iunit,
     1        file='output/' // trim(onames(iunit-10)) // '.txt',
     2        ACCESS='sequential', RECL=IRL)
      end do

      NROW = 791
      Print*, 'NROW = ', NROW

      NFILE=0
      NCOL=24

      do 900 IFILE=1, MAXDAY

c ** read FILENAMES **
         open (120, file='list.txt', status='old')
         NFILE=NFILE+1
         read (120, fmt=*, end=100) FILENAME(IFILE)

  900 enddo

  100 continue
      close (120)
      nfile=nfile-1
      write(*,*)' nFile = ',nfile

c*** read in cdr-outputfile ***
      do 1000 IFILE=1, NFILE
         if(mod(ifile,100).eq.0)then
            write(*,*) 'IFILE ', IFILE
         endif
         open (unit=98,file='input/'//FILENAME(IFILE),
     +         status='old', IOSTAT=IER,err=300)
         if(ier.ne.0) then
            write(*,*)' Ifile, IOSTAT ',ifile,ier
         endif
         read(98,*,iostat=ier,err=399)

         do 200, IROW=1, NROW
            read (98,*, err=301) (VALUES(ICOL,IROW), ICOL=1, NCOL)
  200    continue

         close (98)
c*** write output_files ***
         DO ICOL=2,24
            write (icol+9, fmt='(791A)')
     +        (VALUES (ICOL,IROW), IROW=1,NROW) !Format repeat count=NROW value
         end do
 1000 enddo

      do ICOL=2,24
         close (icol+9)
      end do

      print*,'finished normally'
      stop

  300 Print*, 'ERROR opening ', FILENAME(IFILE)
      goto 302
  301 Print*, 'ERROR reading ', FILENAME(IFILE)
      Print*, 'Aborted... '
  399 Print*, 'IOSTAT - read =', ier
  302 stop
      end
[/fxfortran]

Steven_L_Intel1
Employee
The problem is caused by this line:

[fxfortran]      open (unit=98,file='input/'//FILENAME(IFILE),status='old', 
     + IOSTAT=IER,err=300)[/fxfortran]
The compiler creates a stack temporary for the file= expression but does not remove it from the stack. After a long while, the stack is exhausted but apparently this is not detected with a normal stack overflow message.

A workaround is to assign the value 'input/'//FILENAME(IFILE) to a character variable and then pass the variable as the file= value. The program seems to work on Linux because the default stacksize is larger there.

I will report this to the developers. Issue ID is DPD200161714.

Here is a simple (and quicker) reproducer.

[fortran]character(1000) padding
do i=1,2000
write (padding,'(I5.5,A)') i,'.txt'
open (unit=1,file='input'//padding,disp='delete')
close (1)
end do

end[/fortran]
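
The workaround described above, applied to this reproducer, might look like the following sketch. The variable name fname and its length are assumptions made for illustration, and the standard close(status='delete') is used in place of the disp= extension.

[fortran]program workaround
  implicit none
  character(len=1000) :: padding
  character(len=1005) :: fname   ! hypothetical name; holds 'input' plus padding
  integer :: i

  do i = 1, 2000
    write (padding, '(I5.5,A)') i, '.txt'
    ! Build the name in a variable first, so that OPEN receives a plain
    ! variable instead of a concatenation expression
    fname = 'input' // padding
    open (unit=1, file=fname)
    close (1, status='delete')
  end do

  print *, 'finished normally'
end program workaround[/fortran]

The only change is that the concatenation happens in an assignment rather than inside the OPEN statement.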

mecej4
Honored Contributor III
Steve: wow, a 7-line reproducer? That really captures the essence!

jimdempseyatthecove
Honored Contributor III
Good diagnosis, Steve.
I guess that this same bug would be present with any statement that generates a temp for a dummy argument, such as a function or subroutine call with a concatenated argument.

Would you know if

MyFile = 'input'//something

generates a temp or concatenates directly into MyFile?
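
One way to probe the subroutine-call case is a loop like the sketch below. Everything in it (program name, subroutine name, iteration count) is hypothetical, and whether a given compiler version actually leaks the temporary here is not something the thread establishes.

[fortran]program probe_temp
  implicit none
  character(len=1000) :: something
  integer :: i, total

  something = 'x'
  total = 0
  do i = 1, 100000
    ! The concatenation is passed directly as an actual argument,
    ! so the compiler may need a temporary to hold it
    call count_chars('input' // something, total)
  end do
  print *, 'characters seen:', total

contains

  ! Adds the trimmed length of text to total
  subroutine count_chars(text, total)
    character(len=*), intent(in) :: text
    integer, intent(inout) :: total
    total = total + len_trim(text)
  end subroutine count_chars
end program probe_temp[/fortran]

If the temporaries were never released, 100000 iterations at roughly 1 KB each would need about 100 MB of stack, so a clean finish suggests they are being popped (or that the call was optimized in some other way).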

Jim

Steven_L_Intel1
Employee
You just have to know what you're looking at. I've been doing this a long time...

Jim, the compiler pops temps off the stack in many cases, including assignment. Most of the time. The way the compiler works is that it looks for specific cases to do this, as usually it's not worth the bother - the stack will get popped when the routine exits. But I've seen a fair number of cases like this one where there's a large loop that builds up stack and eventually blows.
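
Since the stack gets popped when a routine exits, another possible way around the buildup is to move the OPEN into a small routine of its own. This is only a sketch: the routine name is invented, and it assumes the compiler does not inline the routine back into the loop.

[fortran]program open_in_routine
  implicit none
  character(len=1000) :: padding
  integer :: i

  do i = 1, 2000
    write (padding, '(I5.5,A)') i, '.txt'
    call open_and_delete(padding)
  end do
  print *, 'finished normally'

contains

  ! Any stack temporary created for the file= expression belongs to this
  ! routine's frame, which is released on return
  subroutine open_and_delete(name)
    character(len=*), intent(in) :: name
    open (unit=1, file='input' // name)
    close (1, status='delete')
  end subroutine open_and_delete
end program open_in_routine[/fortran]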

mr_katse
Beginner
many thanks for the help and the workaround!

tested it and it works fine (at least as long as the file names have the same length, right?)

Steven_L_Intel1
Employee
As long as the variable you pick is longer than the longest possible filename, it will be fine. You can probably make it shorter than I had it.
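
A quick length check before the assignment guards against silent truncation. In this sketch the buffer size of 64 and the variable names are assumptions; the 50 matches MAXFILE in the posted program.

[fortran]program check_length
  implicit none
  character(len=64) :: fname    ! hypothetical buffer for the full path
  character(len=50) :: listed   ! one entry as read from list.txt
  integer :: needed

  listed = 'output_200701010100.txt'
  needed = len('input/') + len_trim(listed)

  ! Refuse to continue if the path would not fit into the buffer
  if (needed > len(fname)) then
    print *, 'file name would be truncated, need', needed, 'characters'
    stop 1
  end if

  fname = 'input/' // trim(listed)
  print *, 'path = ', trim(fname)
end program check_length[/fortran]

File names of different lengths are fine with this approach, because trailing blanks in the variable are ignored when the file is opened.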

jimdempseyatthecove
Honored Contributor III
>>You just have to know what you're looking at. I've been doing this a long time...

>>Jim, the compiler pops temps off the stack in many cases, including assignment. Most of the time. The way the compiler works is that it looks for specific cases to do this, as usually it's not worth the bother - the stack will get popped when the routine exits. But I've seen a fair number of cases like this one where there's a large loop that builds up stack and eventually blows.

So this is an old problem that resurfaces from time to time. Time to get it fixed.

If you are looking to reduce the number of stack cleanups of these temps, consider having the compiler determine whether a loop creates such temps. If so, create a hidden local variable for that loop to save the stack pointer, copy the stack pointer to this variable immediately before the loop starts, then restore the stack pointer at the front of the loop body, before the first statement in the scope of the loop. Loops without such temporaries will not incur this additional overhead. Also, if the number of iterations of the loop is known to be small, you could bypass the save/restore code (provided the temp is also known to be small).

What you would be doing is trading off

Code that has a known potential to cause stack overflow given enough iterations.

against

Code that, if it passes the first iteration, is known not to (directly) cause stack overflow

at the expense of

mov esp,[ebp+offsetToHiddenSaveStackPointer]

A fair trade-off IMHO.

For anyone else reading this I suggest we take a straw poll and reply to this thread with your vote/comment.

Jim Dempsey

tropfen
New Contributor I
Hello,

i am happy that you were able to find the reason for a problem that i have had for many years.
(http://software.intel.com/en-us/forums/showthread.php?t=42116&o=a&s=lr)

Please find a good solution.

Thanks in advance
Frank

mecej4
Honored Contributor III
Jim's proposal for fixing the problem with stack temporaries overrunning stack limits is reasonable. However, I am inclined to consider means to fix the problem, more specifically how much stack growth is permitted before cleaning up, as implementation issues related to optimization.

In fact, if debug/check options have been specified, or optimization level zero has been requested, the compiler ought not to permit this stack overrun to occur or, if that cannot be avoided, the runtime should provide a clear message and a traceback.

With my shorter example code, I tried to get a traceback, but after stack overflow the program simply quit with no hint that anything went wrong. Rerunning the program with Cygwin/GDB made me note the stack overflow.

A user should not have to stoop to assembler level and monitor the ESP register if stack overflow is suspected.

Such behavior is what had Frank "tropfen" stumped for four years, and needs to be rectified. His thread ends with a reference to "Issue 356587". It is not clear if anything was done to resolve Issue 356587 between 2006 and now.

tropfen
New Contributor I
Hello mecej4,

four years ago intel was not able to reproduce the problem. The Issue was closed.

Frank

jimdempseyatthecove
Honored Contributor III
mecej4,

Other than for optimization-related differences, if the problem is systemic in a Release build it should be systemic in a Debug build. Otherwise you will have less of a chance of finding the problem.

RE: Debug and Stack Overflow.

Debug mode should have a stack guard page at the bottom of the stack. If the stack ever encroaches into this guard page then a debug exception should be raised. The compiler team can decide how to implement this. Had this feature been available to the original poster, this problem would have been identified quickly by either the original poster or any of the others monitoring this thread.

Jim Dempsey

mecej4
Honored Contributor III
I tried a slightly modified version of Steve Lionel's 7-line reproducer on Win-7. The Fortran run-time on this OS detects the stack overflow and prints a message, but does not give a traceback.
[fortran]program blowstack
  character(len=1000) padding
  iesp0=iesp()                  ! initial stack pointer
  do i=1,2000
    write (padding,'(I5.5,A)') i,'.txt'
    open (unit=1,file='input'//padding)
    close (1,status='delete')
    write(*,'(1x,I4,2x,Z08)')i,iesp0-iesp()   ! stack consumed
  end do
end program blowstack[/fortran]
The code for the utility function iesp() is, for 32-bit Windows:

[bash].686P
.model flat
PUBLIC _IESP
_TEXT SEGMENT
_IESP PROC
      lea eax, dword ptr [esp + 4]
      ret
_IESP ENDP
_TEXT ENDS
END
[/bash]
and, for 64-bit Windows:

[bash]PUBLIC	IESP
_TEXT	SEGMENT
IESP	PROC
	lea rax, qword ptr [rsp + 8]
	ret
IESP	ENDP
_TEXT	ENDS
END
[/bash]

For the default stack of 0x100000, the program crashes with the last few lines of output (this is for the 32-bit version; the numbers are slightly different for the 64-bit version):

[bash] 1018     FA860
 1019     FAC50
 1020     FB040
forrtl: severe (170): Program Exception - stack overflow

Stack trace terminated abnormally.
[/bash]
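
The numbers in that output can be turned into a rough stack budget. The sketch below only redoes the arithmetic with the values shown above (written in decimal); the program and variable names are made up.

[fortran]program stack_budget
  implicit none
  ! Values taken from the output above, converted to decimal
  integer, parameter :: stack_size = 1048576   ! 0x100000, default stack
  integer, parameter :: used_1019  = 1027152   ! 0xFAC50
  integer, parameter :: used_1020  = 1028160   ! 0xFB040
  integer :: per_iter, temp_len

  per_iter = used_1020 - used_1019             ! growth in one iteration
  temp_len = len('input') + 1000               ! characters in 'input'//padding

  print '(A,I0)', 'bytes consumed per iteration: ', per_iter
  print '(A,I0)', 'length of file= expression:   ', temp_len
  print '(A,I0)', 'iterations that fit in stack: ', stack_size / per_iter
end program stack_budget[/fortran]

That gives 1008 bytes per iteration for a 1005-character expression, and room for 1040 iterations in a 0x100000-byte stack, which is consistent with the crash shortly after iteration 1020. The 3 extra bytes per iteration are presumably alignment padding; that part is a guess.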

Steven_L_Intel1
Employee
The compiler does generate stack checking code in most cases, which will give a reasonable error. I'm not sure what happened here. It may be that the stack check is done only for "automatic" variables, allocated at the beginning of the routine. I will ask.