Intel® Fortran Compiler

Is there a limit on the number of calls to the BACKSPACE statement?

Morway__Eric
Beginner

While reading through a rather large text file (~46 GB), the snippet of code shown below crashes a little less than halfway through the file.  The variable IFTLFMT is a flag indicating whether the file being read is binary (IFTLFMT=0) or formatted text (IFTLFMT=1).  For debugging, IFTLFMT is currently equal to 1, though the code seems to crash when reading the binary equivalent as well.  With IFTLFMT = 1, BACKSPACE is executed on every loop iteration, and I suspect this is where the problem lies.

The behavior leading up to the crash is as follows.  While looping, the Fortran code below reads the middle line of the example text (shown after the code).  On the next iteration it should read the following line:

SFR   REJ      1    32    16     1     3  0.108565

The first READ fills LABEL and TEXT with what appear to be correct values.  The code then executes BACKSPACE(INUF), and the crash occurs in the READ statement that follows.  All of the values in that READ statement are filled with 0.00, which is clearly not what is in the text file.  So I'm wondering whether there is a limit on the number of times BACKSPACE can be used.  My ballpark guess is that BACKSPACE has been called roughly 51.3 million times by the time the crash occurs.

I've also tried removing the BACKSPACE statement, but my limited understanding is that a formatted READ always consumes an entire line.  The next READ therefore cannot pick up partway through a line, which is exactly what I need, because the second READ statement depends on the value of the first entry on the line.

Code that crashes:

C--READ CONNECTIONS INFORMATION
      DO I=1,NCON
        IF(IFTLFMT.EQ.0) THEN
          READ(INUF) LABEL,TEXT
        ELSEIF(IFTLFMT.EQ.1) THEN
          READ(INUF,*) LABEL,TEXT
          BACKSPACE(INUF)
        ENDIF
C
C--LOOP THROUGH EACH CONNECTION
C
C--IF UZF -> SFR, READ 8 VALUES
        IF(LABEL.EQ.'SFR ') THEN
          IF(IFTLFMT.EQ.0) THEN
            READ(INUF) KK,II,JJ,ISTSG,NREACH,Q
          ELSEIF(IFTLFMT.EQ.1) THEN
            READ(INUF,2) LABEL,TEXT,KK,II,JJ,ISTSG,NREACH,Q
   2        FORMAT(2X,A4,2X,A4,5I6,F)
          ENDIF
          IROUTE(1,I)=1  !1:SFR, 2:LAK, 3:SNK
          IROUTE(2,I)=KK
          IROUTE(3,I)=II
          IROUTE(4,I)=JJ
          IROUTE(5,I)=ISTSG
          IROUTE(6,I)=NREACH
C         Do some more stuff...
        ELSEIF(LABEL.EQ.'LAK ') THEN
C         ...
        ELSEIF(LABEL.EQ.'SNK ') THEN
C         ...
        ENDIF
      ENDDO

 Example of text that is being read:

...
  SFR   GRW      1    32    16     1     2  0.064677
  SFR   REJ      1    32    16     1     2  0.130278
  SFR   GRW      1    32    16     1     3  0.053897  !After this line, the code crashes
  SFR   REJ      1    32    16     1     3  0.108565
  SFR   GRW      1    32    16     1     4  0.053897
...

 

 

7 Replies
mecej4
Honored Contributor III

Have you considered using non-advancing I/O for formatted READs? You can even do a mix of READs, some advancing and some non-advancing. In those that are non-advancing, you can use an EOR=nnn clause to handle end-of-record, as well. Given that your data file is tens of GB long, I am not going to suggest that you post it here!
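A minimal sketch of the non-advancing idea, written in free-form Fortran rather than the fixed form used in the original post. The module and subroutine names (`nonadvancing_demo`, `read_connections`) and the separate `q` array are hypothetical, the `F10.0` width is an assumption based on the sample lines, and `IOSTAT=` with `is_iostat_eor` stands in for an `EOR=nnn` label:

```fortran
module nonadvancing_demo
  implicit none
contains
  ! Read ncon records from a formatted sequential unit, peeking at the
  ! label with a non-advancing READ instead of READ followed by BACKSPACE.
  subroutine read_connections(inuf, ncon, iroute, q)
    integer, intent(in)  :: inuf, ncon
    integer, intent(out) :: iroute(6, ncon)
    real,    intent(out) :: q(ncon)
    character(len=4) :: label, text
    integer :: i, ios

    iroute = 0
    q = 0.0
    do i = 1, ncon
      ! Consume only the first 12 characters; the record stays current.
      read(inuf, '(2X,A4,2X,A4)', advance='no', iostat=ios) label, text
      if (is_iostat_eor(ios)) cycle   ! short record: unit is already past it
      if (ios /= 0) exit              ! end of file or read error
      if (label == 'SFR ') then
        ! Continue from column 13 of the same record, then advance.
        read(inuf, '(5I6,F10.0)') iroute(2:6, i), q(i)
        iroute(1, i) = 1              ! 1:SFR, as in the original code
      else
        read(inuf, '()')              ! discard the rest of the record
      end if
    end do
  end subroutine read_connections
end module nonadvancing_demo
```

A caller would open the file as an ordinary formatted sequential file and then `call read_connections(inuf, ncon, iroute, q)`. The other labels (`LAK`, `SNK`) would need their own branches in place of the record skip.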

Depending on the implementation, BACKSPACE can be quite inefficient. For instance, there was one implementation where the current record number n was tracked, and BACKSPACE was implemented by REWIND + (n-1) READs.

A couple of comments, which you are welcome to ignore: why do you make IFTLFMT an integer, when its meaningful values are only 0 and 1? Use a LOGICAL variable instead, with a name such as FILE_IS_BINARY or FILE_IS_TEXT. When I scanned your code, I asked myself, "what if IFTLFMT=2, 3, -5, etc.? There is no provision for those cases". I then read your description and saw that only 0 and 1 were used.

Similarly, why query IFTLFMT inside the loop? If the data file is formatted, all records are "formatted". Thus, you could structure the code as

IF (FILE_IS_BINARY) THEN
   DO I=1,NCON
      ... code to process unformatted file
   END DO
ELSE
   DO I=1,NCON
      ... code to process formatted file
   END DO
ENDIF

JVanB
Valued Contributor II

The first thing I would try is to change the F edit descriptor to F10.0 in Format #2 so that you aren't relying on an extension. If that fails, make up two or more FORMAT statements and use nonadvancing READ:

          READ(INUF,1,advance='NO') LABEL,TEXT
    1   FORMAT(2X,A4,2X,A4)
!         BACKSPACE(INUF)
...
            READ(INUF,2) KK,II,JJ,ISTSG,NREACH,Q
    2       FORMAT(5I6,F10.0)
...
        ELSE
          READ(INUF,3)
    3     FORMAT()

Now, if the labels could be laid out differently depending on their values, you are forced into that list-directed READ which precludes nonadvancing I/O, but you might be able to work around the BACKSPACE by opening the data file as formatted stream.
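A rough sketch of the formatted-stream workaround, assuming the caller has opened the unit with `ACCESS='STREAM', FORM='FORMATTED'`. The names `stream_reread_demo` and `read_connections_stream` and the separate `q` array are hypothetical. The position returned by `INQUIRE(POS=)` is stored in a 64-bit integer because offsets in a file of tens of GB do not fit in a default integer:

```fortran
module stream_reread_demo
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Peek at each record with a list-directed READ, then re-read it from
  ! its saved stream position instead of calling BACKSPACE.
  subroutine read_connections_stream(inuf, ncon, iroute, q)
    integer, intent(in)  :: inuf, ncon
    integer, intent(out) :: iroute(6, ncon)
    real,    intent(out) :: q(ncon)
    character(len=4) :: label, text
    integer(int64) :: recpos
    integer :: i, ios

    iroute = 0
    q = 0.0
    do i = 1, ncon
      inquire(unit=inuf, pos=recpos)          ! remember start of this record
      read(inuf, *, iostat=ios) label, text   ! list-directed peek
      if (ios /= 0) exit
      if (label == 'SFR ') then
        ! Reposition to the saved start and read the full record.
        read(inuf, '(2X,A4,2X,A4,5I6,F10.0)', pos=recpos, iostat=ios) &
          label, text, iroute(2:6, i), q(i)
        if (ios /= 0) exit
        iroute(1, i) = 1                      ! 1:SFR, as in the original code
      end if
    end do
  end subroutine read_connections_stream
end module stream_reread_demo
```

For formatted stream access, the value passed to `POS=` should be one previously obtained from `INQUIRE(POS=)` on the same file, which is how it is used here.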

 

IanH
Honored Contributor III

In addition to the suggestions of others, there's also the possibility of reading the entire formatted record into a character variable, and then chopping that variable up.

My ballpark guess is that your ballpark guess of 51.3 million backspaces corresponds to a file position close to 2**31 or 2**32.  Failing that, backspace (or read) is leaking a resource.

GVautier
New Contributor III

Hello

For me, the most efficient way to achieve what you want is to read the line into a character variable, analyse it, and branch accordingly. With that method, no BACKSPACE is needed.

 

C--READ CONNECTIONS INFORMATION
      DO I=1,NCON
C        '(A)' reads the whole record into BUFFER; a list-directed
C        read (*) would stop at the first blank-delimited item
         read(inuf,'(A)')buffer
         read(buffer,..fmt..)label,text
C
C--LOOP THROUGH EACH CONNECTION
C
C--IF UZF -> SFR, READ 8 VALUES
        IF(LABEL.EQ.'SFR ') THEN
          IF(IFTLFMT.EQ.0) THEN
            READ(INUF) KK,II,JJ,ISTSG,NREACH,Q
          ELSEIF(IFTLFMT.EQ.1) THEN
            READ(buffer,2) LABEL,TEXT,KK,II,JJ,ISTSG,NREACH,Q
   2        FORMAT(2X,A4,2X,A4,5I6,F)
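
A self-contained sketch of the same buffer approach for the formatted case only, in free-form Fortran. The names `buffer_parse_demo` and `read_connections_buffered`, the 200-character buffer length, and the `F10.0` width are assumptions for illustration, not part of the original code:

```fortran
module buffer_parse_demo
  implicit none
contains
  ! Read each record once into a character buffer, then parse the buffer
  ! with internal READs; the file itself is never repositioned.
  subroutine read_connections_buffered(inuf, ncon, iroute, q)
    integer, intent(in)  :: inuf, ncon
    integer, intent(out) :: iroute(6, ncon)
    real,    intent(out) :: q(ncon)
    character(len=200) :: buffer   ! assumed longer than any record
    character(len=4) :: label, text
    integer :: i, ios

    iroute = 0
    q = 0.0
    do i = 1, ncon
      read(inuf, '(A)', iostat=ios) buffer      ! whole record, blanks kept
      if (ios /= 0) exit                        ! end of file or read error
      read(buffer, *, iostat=ios) label, text   ! peek at the first two items
      if (ios /= 0) cycle                       ! blank or malformed record
      if (label == 'SFR ') then
        read(buffer, '(2X,A4,2X,A4,5I6,F10.0)', iostat=ios) &
          label, text, iroute(2:6, i), q(i)
        if (ios == 0) iroute(1, i) = 1          ! 1:SFR, as in the original code
      end if
    end do
  end subroutine read_connections_buffered
end module buffer_parse_demo
```

Because the second READ works on `buffer`, the record can be parsed as many times as needed with different formats.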

 

LRaim
New Contributor I

Copy the input file to a direct-access file, or even into memory, and then you do not need to use BACKSPACE.

 

andrew_4619
Honored Contributor III

I strongly agree with #4 and #5. Using file operations to parse, particularly on a big file, is geologically slow. Beware, though, if you are doing large numbers of internal reads from a text buffer with the just-released version 16 compiler: other threads suggest there is a problem with resource handles not being released.
