While reading through a rather large text file (~46 GB), the snippet of code shown below crashes a little less than halfway through the file. The variable IFTLFMT is a flag indicating whether the file being read is binary (IFTLFMT=0) or formatted text (IFTLFMT=1). For debugging, IFTLFMT is currently equal to 1, though the code seems to crash when reading the binary equivalent as well. With IFTLFMT = 1, the BACKSPACE command is called on each pass through the loop, and I suspect this is where the problem is.
The behavior leading up to the crash is as follows. As the Fortran shown below loops, it reads the middle line of the example text (shown after the Fortran code), and on the next pass, instead of reading the next line of text:
SFR REJ 1 32 16 1 3 0.108565
it reads LABEL,TEXT, which appear to be read correctly based on the values they are filled with, but then runs BACKSPACE(INUF) followed by the next READ statement, which is where it crashes. All of the values in the READ statement are filled with 0.00, which clearly is not what the text file contains. Thus, I'm wondering if there is a limit on the number of times BACKSPACE can be used? I would venture a ballpark guess that BACKSPACE has been called somewhere in the neighborhood of 51.3 million times by the time the crash occurs.
I've also tried removing the BACKSPACE command, but my limited understanding of READ is that it reads an entire line when reading formatted text. Thus, the code won't continue reading part way through a line, but this is exactly what I need because the second read statement depends on the value of the first entry on the line.
Code that crashes:
C--READ CONNECTIONS INFORMATION
      DO I=1,NCON
        IF(IFTLFMT.EQ.0) THEN
          READ(INUF) LABEL,TEXT
        ELSEIF(IFTLFMT.EQ.1) THEN
          READ(INUF,*) LABEL,TEXT
          BACKSPACE(INUF)
        ENDIF
C
C--LOOP THROUGH EACH CONNECTION
C
C--IF UZF -> SFR, READ 8 VALUES
        IF(LABEL.EQ.'SFR ') THEN
          IF(IFTLFMT.EQ.0) THEN
            READ(INUF) KK,II,JJ,ISTSG,NREACH,Q
          ELSEIF(IFTLFMT.EQ.1) THEN
            READ(INUF,2) LABEL,TEXT,KK,II,JJ,ISTSG,NREACH,Q
    2       FORMAT(2X,A4,2X,A4,5I6,F)
          ENDIF
          IROUTE(1,I)=1  !1:SFR, 2:LAK, 3:SNK
          IROUTE(2,I)=KK
          IROUTE(3,I)=II
          IROUTE(4,I)=JJ
          IROUTE(5,I)=ISTSG
          IROUTE(6,I)=NREACH
C         Do some more stuff...
        ELSEIF(LABEL.EQ.'LAK ') THEN
C         ...
        ELSEIF(LABEL.EQ.'SNK ') THEN
C         ...
        ENDIF
      ENDDO
Example of text that is being read:
...
SFR GRW 1 32 16 1 2 0.064677
SFR REJ 1 32 16 1 2 0.130278
SFR GRW 1 32 16 1 3 0.053897 !After this line, the code crashes
SFR REJ 1 32 16 1 3 0.108565
SFR GRW 1 32 16 1 4 0.053897
...
Have you considered using non-advancing I/O for formatted READs? You can even do a mix of READs, some advancing and some non-advancing. In those that are non-advancing, you can use an EOR=nnn clause to handle end-of-record, as well. Given that your data file is tens of GB long, I am not going to suggest that you post it here!
Depending on the implementation, BACKSPACE can be quite inefficient. For instance, there was one implementation where the current record number n was tracked, and BACKSPACE was implemented by REWIND + (n-1) READs.
A couple of comments, which you are welcome to ignore: why do you make IFTLFMT an integer, when its only meaningful values are 0 and 1? Use a LOGICAL variable instead, with a name such as FILE_IS_BINARY or FILE_IS_TEXT. When I scanned your code, I asked myself, "What if IFTLFMT=2, 3, -5, etc.? There is no provision for those cases." I then read your description and saw that only 0 and 1 were used.
Similarly, why query IFTLFMT inside the loop? If the data file is formatted, all records are "formatted". Thus, you could structure the code as
IF (FILE_IS_BINARY) THEN
   DO I=1,NCON
      ...code to process unformatted file
   END DO
ELSE
   DO I=1,NCON
      ...code to process formatted file
   END DO
ENDIF
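As a rough illustration of mixing non-advancing and advancing READs, here is a minimal free-form sketch. It assumes the column layout implied by FORMAT 2 in the question (2X,A4,2X,A4,5I6, then a real field that is taken here to be 10 columns wide). The program name, the scratch file, and the sample records are made up for the demonstration and are not part of the original code.

```fortran
program nonadvancing_demo
  implicit none
  integer :: u, i, kk, ii, jj, istsg, nreach
  real :: q
  character(len=4) :: label, text

  ! Build a tiny scratch file in the assumed layout (hypothetical test data).
  open(newunit=u, status='scratch', form='formatted')
  write(u,'(2X,A4,2X,A4,5I6,F10.6)') 'SFR ', 'GRW ', 1, 32, 16, 1, 3, 0.053897
  write(u,'(2X,A4,2X,A4)') 'LAK ', 'GRW '
  rewind(u)

  do i = 1, 2
    ! Read only the two labels; the position stays inside the current record.
    read(u,'(2X,A4,2X,A4)', advance='no') label, text
    if (label == 'SFR ') then
      ! Advancing READ: continues where the previous READ stopped,
      ! then moves on to the next record.
      read(u,'(5I6,F10.0)') kk, ii, jj, istsg, nreach, q
      print '(A4,1X,A4,5I6,F10.6)', label, text, kk, ii, jj, istsg, nreach, q
    else
      ! Label not handled in this sketch: skip the rest of the record.
      read(u,'()')
      print '(A4,1X,A4)', label, text
    end if
  end do
  close(u)
end program nonadvancing_demo
```

No BACKSPACE is needed because the first READ never leaves the record. Note that non-advancing input requires an explicit format, so this only works when the labels sit in fixed columns.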
The first thing I would try is to change the F edit descriptor to F10.0 in Format #2 so that you aren't relying on an extension. If that fails, make up two or more FORMAT statements and use nonadvancing READ:
      READ(INUF,1,advance='NO') LABEL,TEXT
    1 FORMAT(2X,A4,2X,A4)
!     BACKSPACE(INUF)
      ...
      READ(INUF,2) KK,II,JJ,ISTSG,NREACH,Q
    2 FORMAT(5I6,F10.0)
      ...
      ELSE
      READ(INUF,3)
    3 FORMAT()
Now, if the labels could be laid out differently depending on their values, you are forced into that list-directed READ which precludes nonadvancing I/O, but you might be able to work around the BACKSPACE by opening the data file as formatted stream.
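One way the formatted-stream workaround might look is sketched below. It assumes a Fortran 2003 (or later) compiler: the record's starting position is saved with INQUIRE(POS=), the labels are read list-directed, and the second READ uses POS= to return to the start of the record instead of calling BACKSPACE. The names rec_start and stream_reposition_demo, the scratch file, and the sample records are illustrative assumptions, and the real field is again taken to be 10 columns wide.

```fortran
program stream_reposition_demo
  use iso_fortran_env, only: int64
  implicit none
  integer :: u, i, kk, ii, jj, istsg, nreach
  integer(int64) :: rec_start   ! 64-bit, since the real file is tens of GB
  real :: q
  character(len=4) :: label, text

  ! Hypothetical test data written to a formatted stream scratch file.
  open(newunit=u, status='scratch', access='stream', form='formatted')
  write(u,'(2X,A4,2X,A4,5I6,F10.6)') 'SFR ', 'GRW ', 1, 32, 16, 1, 3, 0.053897
  write(u,'(2X,A4,2X,A4,5I6,F10.6)') 'SFR ', 'REJ ', 1, 32, 16, 1, 3, 0.108565
  rewind(u)

  do i = 1, 2
    ! Remember where this record starts.
    inquire(unit=u, pos=rec_start)
    ! List-directed read of the labels; this consumes the whole record.
    read(u,*) label, text
    if (label == 'SFR ') then
      ! Jump back to the saved position and re-read the record in full.
      read(u,'(2X,A4,2X,A4,5I6,F10.0)', pos=rec_start) &
        label, text, kk, ii, jj, istsg, nreach, q
      print '(A4,1X,A4,5I6,F10.6)', label, text, kk, ii, jj, istsg, nreach, q
    end if
  end do
  close(u)
end program stream_reposition_demo
```

For formatted stream files, the value passed to POS= should be one previously returned by INQUIRE (or 1 for the start of the file), which is how it is used here.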
In addition to the suggestions of others, there's also the possibility of reading the entire formatted record into a character variable, and then chopping that variable up.
My ballpark guess is that your ballpark guess of 51.3 million backspaces corresponds to a file position close to 2**31 or 2**32. Failing that, backspace (or read) is leaking a resource.
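The idea of reading the whole record into a character variable and then chopping it up could be sketched as follows. This is a minimal free-form example, not the original program: the buffer length of 200, the scratch file, the sample records, and the 10-column real field are all assumptions made for the demonstration.

```fortran
program internal_read_demo
  implicit none
  integer :: u, ios, kk, ii, jj, istsg, nreach
  real :: q
  character(len=4) :: label, text
  character(len=200) :: buffer   ! assumed longer than any record in the file

  ! Hypothetical test data in the layout implied by FORMAT 2 in the question.
  open(newunit=u, status='scratch', form='formatted')
  write(u,'(2X,A4,2X,A4,5I6,F10.6)') 'SFR ', 'GRW ', 1, 32, 16, 1, 3, 0.053897
  write(u,'(2X,A4,2X,A4,5I6,F10.6)') 'SFR ', 'REJ ', 1, 32, 16, 1, 3, 0.108565
  rewind(u)

  do
    ! One file READ per record; '(A)' takes the line verbatim.
    read(u,'(A)', iostat=ios) buffer
    if (ios /= 0) exit            ! end of file or read error

    ! Internal list-directed read: peek at the labels without touching the file.
    read(buffer,*) label, text

    select case (label)
    case ('SFR ')
      ! Internal formatted read of the same buffer, now with the full layout.
      read(buffer,'(2X,A4,2X,A4,5I6,F10.0)') &
        label, text, kk, ii, jj, istsg, nreach, q
      print '(A4,1X,A4,5I6,F10.6)', label, text, kk, ii, jj, istsg, nreach, q
    case default
      ! Other labels (LAK, SNK, ...) would be parsed here.
      print '(A4,1X,A4)', label, text
    end select
  end do
  close(u)
end program internal_read_demo
```

Each record is fetched from the file exactly once, so the file position only ever moves forward and the buffer can be re-read as many times as the parsing requires.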
Hello
For me, the most efficient way to achieve what you want is to read the line into a character variable, analyse it, and branch accordingly. With that method, no BACKSPACE is needed:
C--READ CONNECTIONS INFORMATION
      DO I=1,NCON
C       USE AN (A) FORMAT HERE: A LIST-DIRECTED READ WOULD STOP AT
C       THE FIRST BLANK AND NOT FILL BUFFER WITH THE WHOLE LINE
        read(inuf,'(A)')buffer
        read(buffer,..fmt..)label,text
C
C--LOOP THROUGH EACH CONNECTION
C
C--IF UZF -> SFR, READ 8 VALUES
        IF(LABEL.EQ.'SFR ') THEN
          IF(IFTLFMT.EQ.0) THEN
            READ(INUF) KK,II,JJ,ISTSG,NREACH,Q
          ELSEIF(IFTLFMT.EQ.1) THEN
            READ(buffer,2) LABEL,TEXT,KK,II,JJ,ISTSG,NREACH,Q
    2       FORMAT(2X,A4,2X,A4,5I6,F)
Copy the input file to a direct-access file, or even into memory, and then you do not need to use BACKSPACE.
I would strongly agree with #4 and #5. Using file operations to parse, particularly on a big file, is geologically slow. Beware if you are making large numbers of internal reads from a text buffer with the just-released version 16 compiler: judging by some other threads, there seems to be a problem with resource handles not being released.
