Problems reading from a named pipe

Matti_Taskinen · 03-12-2014

I am having problems with a Fortran program that reads from a named pipe on Linux. The problem can be reproduced, for example, with awk:

ifort -O test_fifo.f90 -o test_fifo

rm -f file.fifo

mkfifo file.fifo

awk 'BEGIN { line=0; \
             while (1) { line = line + 1; \
                         printf "%d01", line; \
                         for (i=2;i<=17;i++) printf " %d%02d", line, i; \
                         for (i=1;i<= 7;i++) printf " %.2f", line+i/100.0; \
                         print ""; } }' > file.fifo &

./test_fifo

Here the named pipe file.fifo is created and awk writes lines of 17 integer and 7 float columns whose values encode the line and column numbers, so that any corruption can be detected. The program test_fifo.f90 simply reads the lines and checks the first and the last columns:

program test_fifo

implicit none

integer, parameter :: int_small = selected_int_kind(9)
integer, parameter :: real_low  = selected_real_kind(p=6, r=30)

integer(kind=int_small)     :: scols(17)
real(kind=real_low)         :: fcols(7)
character(len=*), parameter :: filename = 'file.fifo'
integer                     :: unit = 11, line, io

print*,'Opening file ', filename
open(unit, file=filename, action='read', form='formatted', iostat=io)
if (io /= 0) then
  print*,'Unable to open file ', filename
  stop
end if

line = 0
do while (.TRUE.)
  line = line + 1

  ! Read 17 integer columns and 7 float columns:
  scols = 0
  fcols = 0.0
  read(unit,*,iostat=io) scols, fcols

  ! Check the first integer and the last float column:
  if ((scols(1) /= line*100 + 1) .OR. (fcols(7) /= line + 0.07)) then
    print*,'Mismatch on line ', line, ':'
    print*, scols, fcols
  end if

  ! Check for io errors (and eof):
  if (io /= 0) then
    print*,'Read returned ', io, ' on line ', line
    exit
  end if
end do

end program test_fifo

For the majority of the input lines this works well, but once in a while the read is corrupted:

 Mismatch on line           67 :
        6701        6702        6703        6704        6705           6
         706        6707        6708        6709        6710        6711
        6712        6713        6714        6715        6716   6717.000    
   67.01000       67.02000       67.03000       67.04000       67.05000    
   67.06000    
 Mismatch on line           99 :
        9901        9902        9903        9904        9905        9906
        9907        9908        9909        9910        9911         991
           2        9913        9914        9915        9916   9917.000    
   99.01000       99.02000       99.03000       99.04000       99.05000    
   99.06000    
 Mismatch on line          208 :
       20801           2         802       20803       20804       20805
       20806       20807       20808       20809       20810       20811
       20812       20813       20814       20815       20816   20817.00    
   208.0100       208.0200       208.0300       208.0400       208.0500    
   208.0600    
...

where the contents of one column are clearly split into two separate numbers (6706 becomes 6 and 706, for example), which shifts every later column by one. This happens in three environments:

CentOS 5.9, ifort 12.1.1.256 Build 20111011
Ubuntu 12.04, ifort 13.1.2.183 Build 20130514
CentOS 6.5, ifort 14.0.1.106 Build 20131008

With GNU Fortran (gfortran) the example works correctly.

The problem seems to be related to the output buffering mode of awk. If the output is flushed after each line:

                         print ""; fflush(); } }' > file.fifo &

or if the buffering mode is changed with stdbuf to line-buffered:

stdbuf -oL awk 'BEGIN { line=0; \

there is no corruption. Unfortunately the program writing to the named pipe is user-given and stdbuf is not available in some of the target environments, so a solution on the Fortran side is desirable.

Is ifort working properly here/for you? Is there a compile time option or open/read parameter that can change the behavior?

Thanks,

Matti

Kevin_D_Intel · 03-13-2014

I reproduced the described behavior (the first mismatch appeared at iteration 452 for me). I also confirmed that changing the buffering avoids it and that gfortran is not affected. I do not yet understand the difference in behavior and will consult with our I/O developers for assistance.

Kevin_D_Intel · 03-13-2014

I submitted this to our I/O developers (the internal tracking id is noted below) for further analysis. One other note: I see the same mismatches you showed when running on a local disk. My earlier result, with the first mismatch at iteration 452, came from a run on NFS. I will update again after I learn more.

(Internal tracking id: DPD200254427)

Izaak_Beekman · 03-13-2014

Well, I think the issue is that you're reading from a FIFO, and by default Fortran I/O is record based. What I think is happening is that the Fortran read statement exhausts the contents of the FIFO before the newline character, which serves as the record terminator for formatted files, has been written. My test on a Mac shows the same behaviour you see, but if I add access='stream' to the open statement the problem appears to go away. (At least it takes MUCH longer to encounter the issue with ifort, and I think the next issue is due to the record length being exceeded.) In general, stream access makes your I/O behave in a more C-like fashion.
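
A minimal sketch of the change described above: the open statement gains access='stream' and the list-directed read stays as it was. The module name fifo_open, the subroutine open_formatted_stream, and the file stream_demo.txt are hypothetical, and a regular file stands in for the named pipe so the demo does not block waiting for a writer. Whether this removes the corruption completely is not established in this thread.

```fortran
module fifo_open
  implicit none
contains
  ! Open a text file (or named pipe) for formatted stream reading.
  ! io is zero on success, as with any iostat= value.
  subroutine open_formatted_stream(unit, filename, io)
    integer, intent(out)         :: unit
    character(len=*), intent(in) :: filename
    integer, intent(out)         :: io
    open(newunit=unit, file=filename, action='read', &
         form='formatted', access='stream', iostat=io)
  end subroutine open_formatted_stream
end module fifo_open

program demo_stream_open
  use fifo_open
  implicit none
  integer :: unit, io, a, b

  ! Create a small input file (stand-in for file.fifo; it is left on disk).
  open(newunit=unit, file='stream_demo.txt', action='write', status='replace')
  write(unit,'(a)') '101 102'
  close(unit)

  call open_formatted_stream(unit, 'stream_demo.txt', io)
  if (io /= 0) error stop 'open failed'

  ! The list-directed read is unchanged from the original program.
  read(unit, *, iostat=io) a, b
  print *, a, b, io
  close(unit)
end program demo_stream_open
```

In test_fifo.f90 the equivalent change is a single extra keyword in the existing open statement.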

I also wonder about read(lun,*), because list-directed formatting (the *) is processor (compiler/machine/etc.) dependent, *I think*, but I could be wrong.

Also, notice that the number of characters per line/record and per integer/float increases as the awk program runs. This might cause issues with the recl= specifier of the open statement, which is optional and sets the maximum record length for a sequential formatted file. If omitted, it receives a processor-dependent default. If one performs an inquire(unit,recl=rec) on your original code (without stream access), it appears that ifort uses a default maximum record length of 132 characters for formatted sequential files. gfortran reports -1, which presumably means there is no maximum record length, but -1 seems a weird value to report here and I wonder whether it is standard-conforming.

If stream access is specified on the open statement, the inquire(unit,recl=rec) returns a value of -1 for ifort, and 1 for gfortran. In my mind, positive 1 makes sense for the value here, because stream access will read in the stream one character at a time until it has finished performing the IO requested in the read statement. -1 as a recl for both ifort and gfortran seems strange to me, but I haven't checked it against the standard.
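
The inquiry can be sketched as follows. The module name recl_probe and the function default_seq_recl are hypothetical, and a scratch file stands in for file.fifo so that the open does not block. The value printed is processor dependent, so no particular number should be expected. The sketch only queries a sequential connection.

```fortran
module recl_probe
  implicit none
contains
  ! Return what inquire(recl=) reports for a formatted sequential unit
  ! that was opened without an explicit recl= specifier.
  function default_seq_recl() result(rec)
    integer :: rec
    integer :: unit, io
    rec = 0
    open(newunit=unit, status='scratch', action='readwrite', &
         form='formatted', access='sequential', iostat=io)
    if (io /= 0) error stop 'unable to open scratch file'
    inquire(unit=unit, recl=rec)
    close(unit)
  end function default_seq_recl
end module recl_probe

program show_recl
  use recl_probe
  implicit none
  integer :: rec
  ! Call the function before printing so that no I/O is nested
  ! inside the print statement.
  rec = default_seq_recl()
  print *, 'default recl, formatted sequential: ', rec
end program show_recl
```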

Kevin_D_Intel · 03-14-2014

Zaak - Thank you for your time investigating this issue and very insightful findings! I forwarded that to our Developers.

Izaak_Beekman · 03-14-2014

The negative values for recl returned by both ifort and gfortran seem a bit wacky to me, but may very well be standard-compliant; I'm not sure without looking it up.

I added some logic to look at the value of iostat=io after the read statement. I think that, if end of record is indeed encountered because the Fortran program is trying to read a full record (line) before it is available, then the standard dictates that io should be set to iostat_end or iostat_eor from iso_fortran_env. Examining iostat after the read seems to indicate that neither of these conditions is ever signaled, which means either: 1) this is a bug in ifort, or 2) the root cause is something other than my diagnosis.
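
A minimal sketch of such an iostat check, using the named constants from iso_fortran_env. The module name fifo_iostat and the function iostat_label are hypothetical. One caveat, as far as the standard's rules for list-directed input go: the end of a record acts like a blank separator and the read continues into the next record, and iostat_eor is only returned for non-advancing input. A record that ends too early would therefore not be expected to raise either code in read(unit,*); it would instead split a value in two, much like the 6 and 706 in the output shown earlier.

```fortran
module fifo_iostat
  use, intrinsic :: iso_fortran_env, only: iostat_end, iostat_eor
  implicit none
contains
  ! Map an iostat= value to a short human-readable label.
  function iostat_label(io) result(label)
    integer, intent(in)           :: io
    character(len=:), allocatable :: label
    if (io == 0) then
      label = 'ok'
    else if (io == iostat_end) then
      label = 'end-of-file'
    else if (io == iostat_eor) then
      label = 'end-of-record'
    else
      label = 'error'
    end if
  end function iostat_label
end module fifo_iostat

program demo_iostat
  use fifo_iostat
  implicit none
  integer :: unit, io, n

  ! A scratch file with a single record stands in for the pipe.
  open(newunit=unit, status='scratch', form='formatted')
  write(unit,'(a)') '42'
  rewind(unit)

  read(unit, *, iostat=io) n      ! one value is available
  print *, 'first read:  ', iostat_label(io)
  read(unit, *, iostat=io) n      ! nothing is left in the file
  print *, 'second read: ', iostat_label(io)
  close(unit)
end program demo_iostat
```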

Stream access does appear to at least improve the robustness of reading from a fifo, if not provide a complete fix.
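
One possible Fortran-side approach, sketched here as an assumption and not as something tried in this thread: read the pipe as an unformatted stream one byte at a time, assemble a complete line, and only then apply a list-directed read to that line through an internal read. The names fifo_lines, read_line_bytes, and lines_demo.txt are hypothetical. The sketch assumes Unix line endings, uses a regular file in place of the FIFO, and has not been verified against ifort or a real named pipe.

```fortran
module fifo_lines
  implicit none
contains
  ! Read bytes from an unformatted stream unit up to the next newline.
  ! On return, line holds the text without the newline and io holds the
  ! last iostat= value (nonzero at end of file or on error).
  subroutine read_line_bytes(unit, line, io)
    integer, intent(in)                        :: unit
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out)                       :: io
    character(len=1) :: ch
    line = ''
    do
      read(unit, iostat=io) ch
      if (io /= 0) exit
      if (ch == new_line('a')) exit
      line = line // ch
    end do
  end subroutine read_line_bytes
end module fifo_lines

program demo_lines
  use fifo_lines
  implicit none
  integer :: unit, io, ints(2)
  real    :: x
  character(len=:), allocatable :: line

  ! Create a one-line input file (stand-in for file.fifo; left on disk).
  open(newunit=unit, file='lines_demo.txt', action='write', status='replace')
  write(unit,'(a)') '6701 6702 67.07'
  close(unit)

  open(newunit=unit, file='lines_demo.txt', action='read', &
       access='stream', form='unformatted', iostat=io)
  if (io /= 0) error stop 'open failed'

  call read_line_bytes(unit, line, io)
  if (io == 0) then
    ! Parse the complete line with an internal list-directed read.
    read(line, *, iostat=io) ints, x
    print *, ints, x
  end if
  close(unit)
end program demo_lines
```

Because the values are parsed only after a whole line has been collected, a write that arrives in two pieces should no longer split a number, at the cost of byte-wise reads.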

Steven_L_Intel1 · 03-14-2014

Here's what the standard says:

9.10.2.26 RECL= specifier in the INQUIRE statement

The scalar-int-variable in the RECL= specifier is assigned the value of the record length of a connection for direct access, or the value of the maximum record length of a connection for sequential access. If the connection is for formatted input/output, the length is the number of characters for all records that contain only characters of default kind. If the connection is for unformatted input/output, the length is measured in file storage units. If there is no connection, or if the connection is for stream access, the scalar-int-variable becomes undefined.

In the case of "no maximum", I think HUGE(0) is probably a better choice than -1, but I can see the logic behind -1 since this is the value the language often requires for "don't know".

Izaak_Beekman · 03-14-2014

Yeah, that's basically what MFE says. Strange, so it seems that gfortran is in violation regarding recl with sequential access here. I wonder if gfortran has an unsigned vs two's complement issue here: in two's complement, -1 has the same bit pattern as the largest unsigned integer, which is the unsigned counterpart of huge(0).
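
The bit-pattern relation can be checked with a few intrinsics. This is a small sketch that assumes the usual two's complement representation of default integers, where the all-ones pattern produced by not(0) is interpreted as -1.

```fortran
program minus_one_bits
  implicit none
  integer :: all_ones

  all_ones = not(0)   ! every bit of a default integer set
  print *, 'not(0)           = ', all_ones
  print *, 'huge(0)          = ', huge(0)
  ! Clearing the top (sign) bit of the all-ones pattern leaves huge(0).
  print *, 'sign bit cleared = ', ibclr(all_ones, bit_size(0) - 1)
end program minus_one_bits
```

So -1 and huge(0) differ only in the sign bit, and the all-ones pattern read as an unsigned number is the largest value that fits in the word.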

Steven_L_Intel1 · 03-14-2014

I am much more likely to think that one of the contributors decided that -1 meant "unknown" given that the standard lacks explicit wording. Come to think of it, this would make a reasonable interpretation request. I will see if it has come up before, and if not, propose one.