Intel® Fortran Compiler

executable crashes / stops without error

mr_katse
Beginner
5,142 Views
hi!
i am using a fortran-program which rewrites many files (in sum about 26000 files) into 1 file.
the input-files are opened one by one and closed after reading.
in ivf i have the issue, that the executable just stops without an error after opening/reading/closing about 16000 files. at the moment my workaround is to compile my code in compaq visual fortran. here i dont have any issues.
what can i do to compile the code in ivf?
thank you!
0 Kudos
45 Replies
anthonyrichards
New Contributor III
3,056 Views
Clearly you need to add some diagnostic code to help to trace where the problem lies. I would humbly suggest opening a text file and printing a confirmatory message to it after each successful file open and each successful file close, for a start and then see if it is one particular file that causes problems.

Are you properly closing files after use?

Are you re-using unit numbers or just incrementing the unit number?

If you want help from here, you should post some code showing your Open statement(s) and give some idea what data is being read in and what you do with it once you have read the data in.
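
A minimal sketch of that kind of diagnostic logging, in free-form Fortran. The unit numbers, the log file name `trace.log`, and the two-entry `names` list are placeholders invented for illustration; they are not taken from the original program:

```fortran
program trace_open_close
  implicit none
  integer, parameter :: logu = 50, inu = 51        ! hypothetical unit numbers
  character(len=32), dimension(2) :: names = &     ! stand-in for the real file list
       [character(len=32) :: 'input/a.txt', 'input/b.txt']
  integer :: i, ios

  open (logu, file='trace.log', status='replace', action='write')
  do i = 1, size(names)
     open (inu, file=trim(names(i)), status='old', action='read', iostat=ios)
     write (logu, '(A,I0,3A,I0)') 'file ', i, ' open  ', trim(names(i)), ' iostat=', ios
     flush (logu)                                   ! keep the log current if the run dies
     if (ios /= 0) cycle                            ! nothing to read or close
     ! ... read the file here ...
     close (inu, iostat=ios)
     write (logu, '(A,I0,3A,I0)') 'file ', i, ' close ', trim(names(i)), ' iostat=', ios
     flush (logu)
  end do
  close (logu)
end program trace_open_close
```

The last line in `trace.log` then shows which file and which step (open or close) was reached before the program stopped.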
0 Kudos
mecej4
Honored Contributor III
3,056 Views
I second the advice that Anthony Richards gave you. An incorrect program may work and give the mistakenly expected results -- this is what is usually called "undefined behavior": the Fortran Standard places no requirements on what a non-conforming program does. When such a program "works" with one compiler and fails with another, that may be the first sign that the program is faulty and needs to be corrected.

The following program copies its source code 26000 times to a single output file, and works as expected using the current release of the Intel compiler.
[fortran]
program wrmany
   integer, parameter :: fout=11, fin=12
   character(len=132) :: line
   integer :: kount
   open(fout,file='mulfil.txt',action='write')
   do kount=1,26000
      open(fin,file='katse.f90',action='read')
      do
         read(fin,'(A)',end=100)line
         write(fout,'(A)')trim(line)
      end do
100   close(fin)
   end do
   close(fout)
end program wrmany
[/fortran]
The resulting file has the expected length:

[bash]$ ls -l mulfil.txt 
-rw-r--r-- 1 mece users 11024000 2010-10-08 06:43 mulfil.txt
[/bash]
0 Kudos
mr_katse
Beginner
3,056 Views
i am fully aware that the problem could be caused by bad code. i have checked the input files and they seem to be ok. they are output files written by another fortran program. here is my code, which i commented a little. i cannot post all the input files because there are so many. i am doing hydrological modelling, and the input files i am reading are system states that are written every timestep - all in all ~26000 files/timesteps. thank you!

[fxfortran]      program cdr_zoneoutput_converter
      

c***** declarations *****
      integer MAXCOL, MAXROW, MAXFILE
      parameter (MAXCOL=30)
      parameter (MAXROW=1000)
      parameter (MAXFILE=50)
      parameter (MAXDAY = 30000)      
      character FILENAME*(MAXFILE)(MAXDAY)      
      character INFILE_GRIDS*(MAXFILE)      
      integer NCOL, NROW, ICOL, IROW, IFILE, NFILE
      character VALUES(MAXCOL,MAXROW)*13
      

c***** open output files *****
      
       open(unit=1,file='output/BFZON.txt')
       open(unit=2,file='output/BWOZON.txt')
       open(unit=3,file='output/BW3ZON.txt')
       open(unit=4,file='output/DELTASZON.txt')
       open(unit=5,file='output/ETATZON.txt')
       open(unit=6,file='output/ETP0ZON.txt')
       open(unit=7,file='output/ETPEZON.txt')
       open(unit=8,file='output/ETPRZON.txt')
       open(unit=9,file='output/MELTZON.txt')
       open(unit=10,file='output/PRAINSOILZON.txt')
       open(unit=11,file='output/PSNOWZON.txt')
       open(unit=12,file='output/PZON.txt')
       open(unit=13,file='output/QAB1ZON.txt')
       open(unit=14,file='output/QAB2ZON.txt')
       open(unit=15,file='output/QAB3ZON.txt')
       open(unit=16,file='output/QABZON.txt')
       open(unit=17,file='output/QEX2ZON.txt')
       open(unit=18,file='output/QVS0ZON.txt')
       open(unit=19,file='output/SCOVZON.txt')
       open(unit=20,file='output/SMELTZON.txt')
       open(unit=21,file='output/SWWZON.txt')
       open(unit=22,file='output/TOTALSZON.txt')
       open(unit=23,file='output/TZON.txt')

  
c***** user specifications *****      
      print*, 'How many zones were computed (1-2000)? ' !some definition of zones
      read*, NROW
     

      NFILE=0
      NCOL=24

      do 900 IFILE=1, MAXDAY

c  ** read FILENAME **
      open (120, file='list.txt', status='old') !list.txt contains the files to be opened (~26000)
      NFILE=NFILE+1
      read (120, fmt=*, end=100) FILENAME(IFILE)
      
900   enddo

100   continue 
      close (120)   
  
c*** read in cdr-outputfile ***

      do 1000 IFILE=1, NFILE-1
      
      open (unit=98,file='input/'//FILENAME(IFILE),status='old',err=300) !each file is opened (and closed later, before the next one is opened)

      read (98,*) 

       do 200, IROW=1, NROW
        read (98,*, err=301) (VALUES(ICOL,IROW), ICOL=1, NCOL)
200    continue

c*** write output_files ***
       ICOL = 2
        do IROW = 1, NROW
         write (1, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (1,*)
       ICOL = 3
        do IROW = 1, NROW
         write (2, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (2,*)
       ICOL = 4
        do IROW = 1, NROW
         write (3, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (3,*)
       ICOL = 5
        do IROW = 1, NROW
         write (4, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (4,*)
       ICOL = 6
        do IROW = 1, NROW
         write (5, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (5,*)
       ICOL = 7
        do IROW = 1, NROW
         write (6, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (6,*)
       ICOL = 8
        do IROW = 1, NROW
         write (7, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (7,*)
       ICOL = 9
        do IROW = 1, NROW
         write (8, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (8,*)
       ICOL = 10
        do IROW = 1, NROW
         write (9, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (9,*)
       ICOL = 11
        do IROW = 1, NROW
         write (10, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (10,*)
       ICOL = 12
        do IROW = 1, NROW
         write (11, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (11,*)
       ICOL = 13
        do IROW = 1, NROW
         write (12, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (12,*)
       ICOL = 14
        do IROW = 1, NROW
         write (13, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (13,*)
       ICOL = 15
        do IROW = 1, NROW
         write (14, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (14,*)
       ICOL = 16
        do IROW = 1, NROW
         write (15, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (15,*)
       ICOL = 17
        do IROW = 1, NROW
         write (16, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (16,*)
       ICOL = 18
        do IROW = 1, NROW
         write (17, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (17,*)
       ICOL = 19
        do IROW = 1, NROW
         write (18, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (18,*)
       ICOL = 20
        do IROW = 1, NROW
         write (19, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (19,*)
       ICOL = 21
        do IROW = 1, NROW
         write (20, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (20,*)
       ICOL = 22
        do IROW = 1, NROW
         write (21, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (21,*)
       ICOL = 23
        do IROW = 1, NROW
         write (22, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (22,*)
       ICOL = 24
        do IROW = 1, NROW
         write (23, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (23,*)

      close (98)   
         
      
  
1000      enddo

       close (1)
       close (2)
       close (3)
       close (4)
       close (5)
       close (6)
       close (7)
       close (8)
       close (9)
       close (10)
       close (11)
       close (12)
       close (13)
       close (14)
       close (15)
       close (16)
       close (17)
       close (18)
       close (19)
       close (20)
       close (21)
       close (22)
       close (23)
      print*,'finished normally' 
      stop      

300   Print*, 'ERROR opening ', FILENAME(IFILE)
      goto 302
301   Print*, 'ERROR reading ', FILENAME(IFILE)
      Print*, 'Aborted... '            

302   stop
      end




[/fxfortran]
0 Kudos
jimdempseyatthecove
Honored Contributor III
3,056 Views
>>open(unit=6,file='output/ETP0ZON.txt')

Look for your error message in the above file
I suggest changing that unit number (and 5) to something else

Jim Dempsey
0 Kudos
Les_Neilson
Valued Contributor II
3,056 Views
These days the general consensus is to avoid unit numbers less than 10, and certainly to avoid 5 and 6.
Also, I would suggest moving the close(98) to just after the "200 continue" and maybe adding an "err=" clause (this keeps everything about unit 98 together), and consider replacing all those write loops with something like:


write (1, fmt='(A$)') (VALUES (2,IROW), irow=1,nrow)
write (1,*)
(Also replacing the unit numbers "1"..."9" with whatever he uses in accordance with my first sentence.)
I know it's a style thing, but to my mind is much more readable than the original code.

Les

0 Kudos
mr_katse
Beginner
3,056 Views
what do you mean with "Look for your error message in the above file"?

i changed unit=5 to 95 and unit=6 to 96. problem unfortunately still prevails.
0 Kudos
mecej4
Honored Contributor III
3,056 Views
Your program is overrunning the default maximum record length (132 bytes) for formatted I/O, since you are writing output files with records as large as NROW*13 in length. To correct this error, you need to use the RECL=... option in the OPEN statements for the output files.

You should probably use a format of (A,1x,$) instead of (A,$) in the WRITE statements so that you can separate the fields.

In general, using unit numbers 10 and above for external files has a better chance of avoiding clashes with special files (standard input, output, error, punch, printer, etc.).

So, instead of using units 1 to 23, you could use units 11 to 33 and see if the error persists.

Re Jim Dempsey's reply: what he pointed out was that since you had used the standard output unit (6) to open an external file, I/O and other error messages generated during the run would have been directed to the file rather than being displayed at the console.
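
A minimal sketch of the RECL suggestion, assuming 13-character fields and the MAXROW of 1000 from the posted declarations. The unit number, the file name `recl_demo.txt`, the row count, and the dummy data are made up for illustration. It writes each row with a single WRITE and a repeated format, which is a standard-conforming alternative to the `$` edit descriptor:

```fortran
program recl_demo
  implicit none
  integer, parameter :: maxrow = 1000, fieldw = 13   ! sizes from the posted code
  integer, parameter :: reclen = maxrow*(fieldw + 1) ! 13 chars plus 1 blank per field
  integer, parameter :: outu = 11                    ! hypothetical unit number
  character(len=fieldw) :: values(maxrow)
  integer :: irow, nrow

  nrow = 500                                         ! example row count
  do irow = 1, nrow
     write (values(irow), '(ES13.5)') real(irow)     ! dummy data
  end do

  ! RECL must cover the longest line written; the default may be far smaller.
  open (outu, file='recl_demo.txt', action='write', status='replace', recl=reclen)
  ! One WRITE emits the whole row, so no non-advancing ($) output is needed.
  write (outu, '(1000(A,1X))') (values(irow), irow = 1, nrow)
  close (outu)
end program recl_demo
```

Sizing RECL from the declared maximum rather than from the current NROW means the OPEN statement does not have to change when the number of zones does.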
0 Kudos
mr_katse
Beginner
3,056 Views
thank you for your hints. i am a "free time" programmer, so some stuff is new to me.
i adopted the code (write (1, fmt='(A$)') (VALUES (2,IROW), irow=1,nrow) and unit numbers). i also changed the RECL to 1024, but i have no clue, if this is right. i didnt change the output formats, since a blank is good for me. still the executable crashes.

i have the same problem with another code i wrote to rewrite some ascii grids. maybe you can see some similarities in the codes i dont see. the similarity i see, is that also here many files are opened (~100000). here the code:
[bash]      program rewrite_inca

c***** declaration *****

      integer MAXCOL, MAXROW, MAXFILE
      parameter (MAXCOL=1000)
      parameter (MAXROW=1000)
      parameter (MAXFILE=40)
      
      character FILENAME*(MAXFILE), FOLDERNAME*(8)
      integer NCOL, NROW, ICOL, IROW, IFILE, I, I1, FLIP, ZEILE
      integer stitch, flag
      real VAL(MAXCOL,MAXROW), NODATA_VALUES
      character XLLCORNER*(MAXFILE), YLLCORNER*(MAXFILE)
      character CELLSIZE*(MAXFILE)
    
       flag = 0
       open (36, file='RR_RWERROR_Log.txt')

      do 500 IFILE=1, 150000


c  ** read FILENAME ** 

      open (20, file='list_folder.txt', err=150)
      read (20, fmt=*, err=150, end=120) FOLDERNAME
      goto 121
120   continue
      flag = 1
121   continue

      print*, 'DATE: ', FOLDERNAME

c  ** read INCA-file
      do 501 I1=1, 96

      open (21, file='list_files_RR.txt', err=151)
      read (21, fmt=*, err=151) FILENAME
      print*, 'File: ', FILENAME

      open (unit=30,file='RR/'//FOLDERNAME//'/'//FILENAME,

      NCOL = 601
      NROW = 351
      FLIP = 0

      do IROW=1, NROW
       ZEILE=NROW-FLIP
       STITCH = 0
       do I=1, 60
        read (30,*,err=600) (VAL((ICOL+STITCH),ZEILE), ICOL=1, 10)
        STITCH = STITCH + 10
       enddo
       read (30,*,err=600) VAL(601,ZEILE)
       FLIP = FLIP + 1
      enddo

c***** create ascii grids *****

c  ** INCA domain
      XLLCORNER = '99500'
      YLLCORNER = '249500'
      CELLSIZE = '1000.'
      NODATA_VALUES = -9999.

      open (unit=89,file='d:tempdest +/'//FOLDERNAME//FILENAME)
      write (89,fmt=*) 'ncols', NCOL
      write (89,fmt=*) 'nrows', NROW
      write (89,fmt='(A,A)') ' xllcorner ', XLLCORNER
      write (89,fmt='(A,A)') ' yllcorner ', YLLCORNER
      write (89,fmt='(A,A)') ' cellsize ', CELLSIZE
      write (89,fmt=*) 'NODATA_value', NODATA_VALUES

      do IROW = 1, NROW
       do ICOL = 1, NCOL
        write (89, fmt='(F7.2$)') VAL (ICOL,IROW)
       enddo
       write (89, fmt=*)
      enddo

      close (30)
      close (89)

501   continue

      goto 602
600   continue
      write (36,fmt=*) FOLDERNAME, ' - ', FILENAME
602   continue

      close (21)

      if (flag.eq.1) goto 100

500   continue

      goto 100

150   continue
      print*, 'ERROR opening/reading "list_folder.txt"...'
      goto 100
151   continue
      print*, 'ERROR opening/reading "list_files.txt"...'
      goto 100
152   continue
      print*, 'ERROR opening/reading Orig. INCA-file...'

100   continue
      close(36)
      print*,'finished normally'
      end
[/bash]


0 Kudos
mecej4
Honored Contributor III
3,056 Views
>>i also changed the RECL to 1024, but i have no clue, if this is right.

This is probably not enough unless NROW is less than 79, since each field is 13 characters wide and 1024/13 is about 78.8. You need to increase RECL to the length of the longest formatted line you write. If NROW were 2000, for example, RECL would need to be over 26000 (2000 x 13), plus room for any field separators and the end-of-line.
0 Kudos
jimdempseyatthecove
Honored Contributor III
3,056 Views
Use an old standby practice of inserting trace code into your program

[bash]do 1000 IFILE=1, NFILE-1   
      write(*,*) 'IFILE=', IFILE
      open (unit=98,file='input/'//FILENAME(IFILE),status='old',err=300) !each file is opened (and closed later, before the next one is opened)   
  
      read (98,*)    
      write(*,*) 'read rows'
  
       do 200, IROW=1, NROW   
        read (98,*, err=301) (VALUES(ICOL,IROW), ICOL=1, NCOL)   
200    continue
      write(*,*) 'read complete'
...
      ICOL = 22  
      write(*,*) 'ICOL = ', ICOL 
      do IROW = 1, NROW   
       write (21, fmt='(A$)') VALUES (ICOL,IROW)   
      enddo
      write(*,*) 'done'
      write (21,*)
...
[/bash]
Jim Dempsey

0 Kudos
mr_katse
Beginner
3,056 Views
NROW is 791 - i set RECL to 10400 - this didnt help.
could it be a problem with file handles of the OS? i think that i read something like that some time ago.
0 Kudos
anthonyrichards
New Contributor III
3,056 Views
If your program crashes, then clearly it starts and runs for a time and so it is not a compilation problem at present.

If your program 'Crashes', then there will be one or more informative error messages output to the console (assuming it is a console program).
If you want to diagnose the problem, showing us the error message you get when the program 'crashes', in your words, is the minimum information we need. So please can you oblige? Otherwise it's blind guessing, which is a waste of everyone's time.

If you insert diagnostic code into your program, as is highly recommended by posters who are trying to help, then at least you should discover where in your programmed loop the program fails. Then you can start delving into the code, with the help of execution error messages, to try to pin down the exact cause of the failure.
0 Kudos
mr_katse
Beginner
3,056 Views
it is a console program and it crashes without an error message in the console - just as i wrote in my 1. post!

we are just moving from xp to w7. in xp there is no error message from the operating system. i now tried the program in w7 and i get an error from the os saying something like "executable does not work anymore. windows can search for a solution online". then you can choose between "search online for a solution and close program" or "close program". no error messages appear in the console.

sorry jim - i overlooked your post. i now added the diagnostic code - the program stops after printing IFILE = 15944. i checked the file which should be opened (as i already did before) and its ok.


0 Kudos
Les_Neilson
Valued Contributor II
3,056 Views
Assuming you also have

read (98,*)
write(*,*) 'read rows'
and you didn't see the 'read rows' message then
I suggest you change the read to
read(98,*,iostat=ier,err=399)
Then at label 399 print out the iostat error number:
-2 means an end-of-record condition for a non-advancing read
-1 means an end-of-file condition
A positive integer (> 0) means an error occurred. See the list of Run-Time Error Messages in the help.

(I thought there was a subroutine we could call to get the text from an iostat code but a quick skim of the help didn't show it. Maybe I didn't look hard enough)

Anyway now you know it occurs when IFILE is 15944 you can add more debug statements along the lines of
if (IFILE==15944) then
print *, "Filename = ",FILENAME(IFILE),"#" ! prove that you have the correct file name and
! check it doesn't contain special characters for example
endif
etc.
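
On getting text for an iostat code: Fortran 2003 added an `iomsg=` specifier that returns the runtime's own message for a failed I/O statement, provided the compiler version in use supports it. A minimal sketch combining it with the iostat check described above; the file name and variable names are placeholders, not from the original program:

```fortran
program iostat_demo
  implicit none
  integer, parameter :: inu = 98
  character(len=*), parameter :: fname = 'input/does_not_exist.txt'  ! hypothetical
  character(len=256) :: msg
  character(len=13)  :: field
  integer :: ios

  msg = ''
  open (inu, file=fname, status='old', action='read', iostat=ios, iomsg=msg)
  if (ios /= 0) then
     write (*, '(A,I0,2A)') 'open failed, iostat=', ios, ': ', trim(msg)
     stop
  end if

  read (inu, *, iostat=ios, iomsg=msg) field
  if (ios < 0) then
     write (*, '(A,I0)') 'end of file or record, iostat=', ios
  else if (ios > 0) then
     write (*, '(A,I0,2A)') 'read error, iostat=', ios, ': ', trim(msg)
  end if
  close (inu)
end program iostat_demo
```

With `iostat=` present, no `err=` label is needed; the statement returns normally and the code decides what to do with the status.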

Les

0 Kudos
mr_katse
Beginner
3,056 Views
i also implemented the advised code of les:
[fxfortran]      do 1000 IFILE=1, NFILE-1
      
      write(*,*) 'IFILE ', IFILE    
      if (IFILE.gt.15000) write(*,*) 'FILE before open ',FILENAME(IFILE)
             
      open (unit=98,file='input/'//FILENAME(IFILE),status='old', 
     + IOSTAT=IER,err=300)
      
      if (IFILE.gt.15000) write(*,*) 'FILE after open ', FILENAME(IFILE)
      if (IFILE.gt.15000) write(*,*) 'IOSTAT - Open ', IER
        
       read(98,*,iostat=ier,err=399)
[/fxfortran]

the exe doesnt necessarily stop at IFILE = 15944. it also happened at 15942 and 15901, so it cannot be a problem with the input files, as they were read before (except 15944). IOSTAT is 0 after opening; unfortunately the program does not jump to error label 399, as it should.
my command line looks like this:

0 Kudos
Les_Neilson
Valued Contributor II
3,056 Views
Can I suggest that you compile/build/run a version of the exe with all of the check options on?
i.e. /warn:all /check:all
The first may catch compile problems (if any) and the second catch any run-time problems with array/string bounds exceeded, uninitialised variables etc.
There is definitely something strange going on.

Les
0 Kudos
mr_katse
Beginner
3,056 Views
i was already using /warn:all /check:all. no errors/warnings are shown.

opening the w7 taskmanager and looking at the "harddisk properties" (i have a german version - i dont know what it is called in english), where one can see the files being used, it shows that many files are listed. the closing of the files seems to be rather slow. so i tried to idle the cpu for some time after 15000 files, so the files could be closed. the closing worked (or at least the taskmanager showed so), but the exe still stopped.

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,056 Views

Just a stab in the dark...

Are any of your paths network mapped drives (e.g. D:)?

The reason I ask is that I have an old (legacy) Win32 app (non-Fortran) that performs a very large number of file directory search/open/copy/close operations. It works fine from XP to XP but has problems from XP to Vista (writing to Vista): several thousand files into the run, a network resource limit is reached. Apparently the OS is trying to throttle the activity by returning an error that the application is expected to retry. Resuming (several times) completes the run. I did not add code to test for this error, pause for a while, and then resume.

Did you use Les's suggestion for collecting io status code in addition to taking error dispatch to READ (and WRITE)?

Jim

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,056 Views
With respect to my above note, try this:

In your main loop add something like

if(mod(ifile,10000) == 0) then
write(*,*) 'Sleeping 30 seconds...'
call sleepqq(30000) ! 30 second wait; needs "use ifport" at the top of the program
endif

Jim
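
The test-pause-retry logic mentioned above could be sketched roughly as follows. This is only an illustration: the file name, the retry count, the delay, and the helper `pause_ms` are all hypothetical, and `pause_ms` is a portable busy-wait stand-in used here instead of SLEEPQQ so the sketch does not depend on a vendor module:

```fortran
program open_retry
  implicit none
  integer, parameter :: inu = 98, max_tries = 3     ! hypothetical values
  character(len=*), parameter :: fname = 'input/some_file.txt'  ! placeholder
  integer :: ios, attempt

  do attempt = 1, max_tries
     open (inu, file=fname, status='old', action='read', iostat=ios)
     if (ios == 0) exit
     write (*, '(A,I0,A,I0)') 'open failed, iostat=', ios, ', attempt ', attempt
     if (attempt < max_tries) call pause_ms(500)
  end do

  if (ios /= 0) then
     write (*, '(A)') 'giving up on '//fname
  else
     ! ... read the file here ...
     close (inu)
  end if

contains

  ! Portable stand-in for SLEEPQQ: spins until the requested time has passed.
  subroutine pause_ms(ms)
    integer, intent(in) :: ms
    integer :: t0, t1, rate
    call system_clock(t0, rate)
    if (rate <= 0) return                 ! no clock available
    do
       call system_clock(t1)
       if (t1 < t0) exit                  ! counter wrapped; stop waiting
       if (real(t1 - t0)/real(rate) >= real(ms)/1000.0) exit
    end do
  end subroutine pause_ms
end program open_retry
```

Retrying only makes sense when the OPEN (or READ) actually reports a non-zero status; it does nothing for a failure that kills the process without returning an error.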
0 Kudos
mr_katse
Beginner
2,678 Views
the drive i am running the program on is a local one.
i tried delaying the program already - but with sleep(60). i now tried the sleepqq(30000) you suggested, but unfortunately it doesnt help.
i implemented the iostatus code. the status of the last opened file is 0. with the first read statement, the program crashes without jumping to the error label (see reply #15).
0 Kudos