reading stream binary data from stdin: pipes vs redirects ('<')

Izaak_Beekman · ‎07-02-2010

Does anyone know why, when I close stdin and reopen it with unformatted stream access, I can successfully read files using the '<' redirection symbol, but get an 'illegal seek' error if I 'cat' the file and pipe it into the program?

I have also tried to use a named pipe created with mkfifo and this still happens.

Say I use MATLAB to make some data:

[plain]>> fid = fopen('test.bin','w');
>> fwrite(fid,double(pi),'double');
>> fclose(fid);[/plain]

then I try the following test program:

[fortran]program tester
use iso_fortran_env
implicit none
doubleprecision :: pi
integer :: error

close(input_unit)
open(unit=input_unit,access='stream',form='unformatted',iostat=error)
write(output_unit,*) error
read(input_unit,iostat=error) pi
write(output_unit,*) error
write(output_unit,*) iostat_end
write(output_unit,*) iostat_eor
write(output_unit,*) pi

end program
[/fortran]

Also, please note that the original source code does not have an extra 'd' in the doubleprecision declaration. That appears to be a bug in the source highlighter.

Compiled with:

[bash]ifort -g -warn all -check all -traceback -o tester tester.f90[/bash]

Using a pipe I get this:

[bash]$ cat test.bin | ./tester
           0
          38
          -1
          -2
  1.117351266127586E-315
[/bash]

Using redirection I get this:

[bash]$ ./tester < src/matlab/roughness/test.bin
           0
           0
          -1
          -2
   3.14159265358979[/bash]

Is this because stream access has no record indicators, so that when I close and then reopen stdin with stream access the program loses track of where it is in the stream?

If I try this technique with sequential access I don't encounter this problem. (Note that you need to write the binary data with Fortran so that it includes the sequential access record indicators, which are stripped off for stream access.)
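
To make the record indicators concrete, here is a minimal Python sketch. It assumes the common layout in which each sequential unformatted record is framed by a 4-byte length marker before and after the payload; the marker size and byte order are assumptions that vary by compiler and options, and `wrap_record` / `unwrap_record` are hypothetical helper names.

```python
import struct

MARKER = "<i"  # assumed: 4-byte little-endian record-length marker


def wrap_record(payload: bytes) -> bytes:
    """Frame raw bytes the way a sequential unformatted record is assumed to look."""
    length = struct.pack(MARKER, len(payload))
    return length + payload + length


def unwrap_record(record: bytes) -> bytes:
    """Check and strip the leading and trailing markers, returning the payload."""
    size = struct.calcsize(MARKER)
    (head,) = struct.unpack(MARKER, record[:size])
    (tail,) = struct.unpack(MARKER, record[-size:])
    if head != tail or head != len(record) - 2 * size:
        raise ValueError("record markers do not match payload length")
    return record[size:-size]


if __name__ == "__main__":
    raw = struct.pack("<d", 3.141592653589793)  # the 8 raw bytes of one double
    framed = wrap_record(raw)
    print(len(raw), len(framed))  # prints: 8 16
```

A stream-access file corresponds to `raw` (payload only), while a sequential-access file corresponds to `framed`.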

Thanks,
-Z

jimdempseyatthecove · ‎07-03-2010

It appears to me that "cat" and/or the pipe is filtering (modifying) the data, such as converting linefeed to return/linefeed, tab to spaces, etc.

The less-than redirect ('<') simply binds the input file to stdin.

Jim Dempsey

mecej4 · ‎07-03-2010

>>It appears to me that "cat" and/or pipe is filtering (modifying) the data.

Indeed. The file test.bin contains a byte with value 09, which can be interpreted as a tab:

S:\>dump test.bin
test.bin:
00000000 182d 4454 fb21 0940
                        ^^
The program cat expands this perceived tab into spaces.

Izaak_Beekman · ‎07-05-2010

I have no idea what system you guys are on, but on GNU/Linux cat is binary safe. A few preliminary tests show no difference in the data stream after running it through cat:

[bash]04:05 PM (1) ~ $ od -b test.bin 
0000000 030 055 104 124 373 041 011 100
0000010
04:07 PM (1) ~ $ cat test.bin | od -b
0000000 030 055 104 124 373 041 011 100
0000010
04:07 PM (1) ~ $ md5sum test.bin
acd5d6588d65420cb64a22e37b888aac  test.bin
04:09 PM (1) ~ $ cat test.bin | md5sum
acd5d6588d65420cb64a22e37b888aac  -
[/bash]

This is DEFINITELY not the problem. Maybe cat on Windows gives you garbage, but not on GNU/Linux.
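
As a cross-check, the eight bytes reported by `od -b` are exactly the encoding of pi as a double. A minimal Python sketch, assuming little-endian IEEE 754 doubles (the layout on x86); `decode_double` is a hypothetical helper name:

```python
import math
import struct

# Octal byte values copied from the `od -b` output above.
OD_BYTES = bytes([0o030, 0o055, 0o104, 0o124, 0o373, 0o041, 0o011, 0o100])


def decode_double(data: bytes) -> float:
    """Interpret 8 bytes as a little-endian IEEE 754 double (assumed layout)."""
    (value,) = struct.unpack("<d", data)
    return value


if __name__ == "__main__":
    print(decode_double(OD_BYTES))                 # prints: 3.141592653589793
    print(OD_BYTES == struct.pack("<d", math.pi))  # prints: True
    print(hex(OD_BYTES[6]))                        # prints: 0x9 (the byte read as a tab)
```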

-Z

jimdempseyatthecove · ‎07-05-2010

Try running a tab character 009 through it (and not on a tab boundary -1),
e.g. change your 2nd byte (055) to 009.

Jim

mecej4 · ‎07-06-2010

Previously, we did not know what OS you were on. We still don't know which version of IFort you are using.

I cannot reproduce the error using Cygwin (for cat -- no tab expansions on Cygwin either) and IFort 11.1.

The IOSTAT value of 38 stands for 'ERROR DURING WRITE' on my system. If you remove the IOSTAT= specifiers from your I/O statements and run the program built with -traceback, the run-time library will display the error text. With that, you may be able to find out what is causing the problem.

Izaak_Beekman · ‎07-06-2010

I am running ifort 11.1 on x86_64 RHEL and also ifort 11.1 on 32-bit Ubuntu. The error is an 'Illegal seek' error (whatever that is), as stated in the original post.

jimdempseyatthecove · ‎07-06-2010

>>The error is an 'Illegal seek' error

If I were to guess, the pipe (|) on your system is implemented as an actual pipe, and not as a temp file.

When implemented as a temp file, the lhs of the pipe runs to completion, creating a file, before the rhs of the pipe starts. The file is then attached to stdin and, being an actual file, can be fseek'd.

When implemented as an actual pipe, the lhs can run concurrently with the rhs (a little in advance of the rhs), producing a continuous stream of data (until program termination or a pipe close). Actual pipe data cannot be fseek'd, because that would require the lhs of the pipe to restart or run backwards.

If you must use a pipe, then pipe to a program that creates a file, followed by < of that file into your program.
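
The difference can be seen at the system-call level with a minimal Python sketch (Python is used only for brevity; `seek_error` and `spool_to_file` are hypothetical helper names). On Linux, seeking on a pipe fails with errno ESPIPE, whose message text is "Illegal seek", while a copy spooled into a temporary file is seekable:

```python
import errno
import os
import tempfile


def seek_error(fd: int):
    """Try to seek to offset 0 on fd; return the errno on failure, None on success."""
    try:
        os.lseek(fd, 0, os.SEEK_SET)
    except OSError as exc:
        return exc.errno
    return None


def spool_to_file(fd: int):
    """Copy everything readable from fd into an anonymous temp file and rewind it."""
    spool = tempfile.TemporaryFile()
    while True:
        chunk = os.read(fd, 65536)
        if not chunk:  # empty read means the writer closed its end
            break
        spool.write(chunk)
    spool.seek(0)
    return spool


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    os.write(write_end, b"12345678")
    os.close(write_end)  # the writer is done, so the reader will see EOF

    print(seek_error(read_end) == errno.ESPIPE)  # True: a pipe cannot be seeked
    spool = spool_to_file(read_end)
    os.close(read_end)
    print(seek_error(spool.fileno()))            # None: the spooled copy can
    spool.close()
```

Spooling trades the read-ahead benefit of the pipe for seekability, so it is a workaround rather than a fix.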

Jim Dempsey

Izaak_Beekman · ‎07-06-2010

I'm not sure what fseek is (sounds like a C function), but my intention is to just read the data from the pipe in sequential order using portable Fortran. You will note that the file is opened as unformatted, stream access. Is there really no way to do this with Fortran? Are we sure this isn't an issue with the compiler? (After all, stream access is F2003, and my track record with ifort 11.1 and F2003 features is dicey at best.) There are nice performance benefits to using a pipe as a read-ahead mechanism rather than first writing to a file and then reading the file.

mecej4 · ‎07-06-2010

fseek is in the standard C library.

I have taken a more careful look at your code, and I see some problems that I overlooked earlier.

1. You closed the stdin stream and reopened the input_unit afterwards. Once the stdin stream is closed, there is no more connection to the data input (either through input redirection or using cat and a pipe). According to my reading of Metcalf, Reid and Cohen, Sec. 10.3, the subsequent open, without a file= clause, may look for the file fort.5 for input data.

2. Whether the standard input can be reopened with the access=stream option is implementation dependent. The Sun Fortran, run on this shortened version of your program

[fortran]program tester
  use iso_fortran_env
  implicit none
  double precision :: pi

! close(input_unit)
open(unit=input_unit,access='stream',form='unformatted')
read(input_unit) pi
write(output_unit,*) pi

end program
[/fortran]

gives the run-time error

[bash] Error 1152:  specifier ACCESS='STREAM' for default unit
 Location:  the OPEN statement at line 7 of "zbo.f90"
 Unit:  5
Aborted
[/bash]

I wonder whether Intel Fortran has similar limitations, but with an implementation that fails to report the violation.

I agree with the usefulness of what you want to do. I am just not sure that Fortran, burdened as it is with the need to support formatted/unformatted and stream/record I/O, can also handle the C standard I/O paradigms coupled with Unix redirection and pipe mechanisms.

Izaak_Beekman · ‎07-07-2010

Yeah, thanks for looking at this, mecej4. I might do some more experimentation. As I said, this works if I use sequential, record-based unformatted input, but fails with stream access. I wonder whether I will get better results if I use the inquire statement to get the name of the terminal device (usually /dev/tty#) and open it as unformatted, stream access, rather than closing and reopening stdin. Or maybe I can create a named FIFO and open that as stream access. Anyway, if anyone (especially you folks over at Intel) has any additional wisdom I would appreciate it. When I get a chance to run some tests I'll post my findings here.

-Z

mecej4 · ‎07-07-2010

Why not handle the file manipulation at the OS-shell level, and give the Fortran program a fixed name, as in:

[bash]ln -s fort.bin fort.5
./a.out
[/bash]

and leave out the close() at the beginning of your program? Once the tie of unit-5 to stdin is severed, you should have no trouble opening unit-5 (or any nn) for unformatted stream input. I have checked that this works correctly.

IFort uses the implicit name fort.nn if unit nn is opened without a FILE= option. I realize that this may not be a portable solution, but it may be enough for the present task.

jimdempseyatthecove · ‎07-10-2010

>>I'm not sure what fseek is (sounds like a C function),

Although your code does not explicitly call fseek (a C function), the Fortran OPEN may very well perform an fseek. Pseudocode:

CallCfunctionToOpenHandleToFile()
IF (FortranFileTypeMightNeedToDetermineRecordHeaders) THEN
  readAlittleBitOfTheFile()
  DetermineHeaderInfoOrLineTermination()
  fseek(0) ! return to front of file
ENDIF

While the above will work if the data comes from an actual file, it will not work if the input is from an actual pipe (a process that is directly supplying data via an intermediary buffer) as opposed to a file simulating a pipe.

Jim Dempsey

joseph-krahn · ‎07-12-2010

When you redirect using "<" or ">", your input or output is actually connected to the named file. You might be able to INQUIRE for the file name and get the name stdin was redirected from, or the name stdout is redirected to, and possibly even "CLOSE(STATUS='DELETE')" depending on the compiler/OS. Pipes are not seekable; they are a one-way stream.

If no input has been read, there is really no need for a seek operation. It may depend on I/O buffers, and on whether the OS accepts a seek-to-zero on a pipe that is already at zero. The same difference between pipes and redirects should be visible with the REWIND statement.

I tested the INQUIRE and CLOSE(STATUS='DELETE'), and they don't access the actual filename that you would get using C. Intel Fortran gives the filename "stdin". GFortran also gives "stdin", but it changes to "fort.5" when you close and reopen.

It would be nice to support a REOPEN for a connected file that does not yet have any I/O done. Maybe STATUS='REOPEN'?

-----------------
Adding this to the post after some testing:

In Linux, a seek to zero on a pipe is an error, even if the pipe is already at zero. So, Intel Fortran could work around this problem by bypassing the seek operation if the current position is already valid. However, this will never work in GFortran, because reopening INPUT_UNIT does not reconnect to the original stdin stream. So, unless Fortran adds a REOPEN feature for this purpose, it is best to avoid trying to reopen standard I/O streams.
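
The suggested workaround can be sketched in Python as follows. This is only an illustration of the idea, not how any Fortran run-time library is actually written; `ForwardReader` is a hypothetical class that tracks its own offset and skips the system call when the requested position is the one it is already at.

```python
import os


class ForwardReader:
    """Hypothetical reader that tracks its own offset instead of asking the OS."""

    def __init__(self, fd: int):
        self.fd = fd
        self.pos = 0  # bytes consumed so far, counted by this object

    def read(self, n: int) -> bytes:
        """Read up to n bytes, looping because pipe reads may return short."""
        data = b""
        while len(data) < n:
            chunk = os.read(self.fd, n - len(data))
            if not chunk:
                break
            data += chunk
        self.pos += len(data)
        return data

    def seek(self, offset: int) -> None:
        """Seek only when the position actually has to change."""
        if offset == self.pos:
            return  # already there: no system call, so pipes keep working
        os.lseek(self.fd, offset, os.SEEK_SET)  # raises OSError (ESPIPE) on a pipe
        self.pos = offset


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    os.write(write_end, b"abcdefgh")
    os.close(write_end)

    reader = ForwardReader(read_end)
    reader.seek(0)         # no-op: nothing has been read yet
    print(reader.read(8))  # prints: b'abcdefgh'
    os.close(read_end)
```

A genuine backward seek on a pipe still fails; the guard only removes the seeks that were never needed.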

jimdempseyatthecove · ‎07-13-2010

>>In Linux, a seek to zero on a pipe is an error, even if the pipe is already at zero.

Joe, thanks for running this test and reporting back to this thread. I was making an educated guess when I suggested that the error was due to the OP's system using an actual pipe (as opposed to simulated pipe with file) and that the pipe would report error on seek.

RE: reopen

If you look at the C/C++ preprocessor programming model for macros (#define foo...), there is a provision to "push" the current content of a macro onto a stack. You are then free to (temporarily) redefine the macro and use it, and later "pop" the macro back into place. This can be done without knowing what was formerly in the macro, and can be nested to any level. BTW this reminds me of the TECO push and pop macro facility.

The REOPEN could essentially do the same under a different name:

OPEN(u,fs,...,'PUSH=YES') ! push whatever is on unit u, then open
...
CLOSE(u,...,'POP=YES') ! close unit u and pop top of stack

Note, you may want to have one of or a selection of:

one general open file stack
each unit with open file stack
n open file stacks

When a file is pushed onto the stack it would not be closed. Essentially the file handle (and state) is pushed onto a stack. When a file is popped, it is not reopened; rather, the file handle comes off the stack and is associated with the unit. Note that additional non-handle information used internally by Fortran would need to get pushed along with the handle (i.e. file state).

With this technique, you should be able to push/pop files with pending asynchronous I/O.

Jim Dempsey

joseph-krahn · ‎07-13-2010

Quoting jimdempseyatthecove

RE: reopen

If you look at the C/C++ preprocessor programming model for macros (#define foo...), there is a provision to "push" the current content of a macro onto a stack. You are then free to (temporarily) redefine the macro and use it, and later "pop" the macro back into place. This can be done without knowing what was formerly in the macro, and can be nested to any level. BTW this reminds me of the TECO push and pop macro facility.

The REOPEN could essentially do the same under a different name:

OPEN(u,fs,...,'PUSH=YES') ! push whatever is on unit u, then open
...
CLOSE(u,...,'POP=YES') ! close unit u and pop top of stack

...

That would not solve the problem posed here, which is to re-open an existing file-descriptor with new attributes, without disconnecting it. By file-descriptor, I mean the low-level system identifier, not the Fortran logical unit.

A push/pop system involves connecting the same logical unit number to different un-closed file-descriptors, but does not address changing the I/O properties of an existing file-descriptor. Also, push/pop is easy to implement using a stack of LUNs and just modifying the variable holding the LUN when pushing or popping.
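
The stack-of-LUNs idea can be sketched in a few lines of Python. This is an analogy only: `UnitStack` is a hypothetical class, file-like objects stand in for logical unit numbers, and `current` plays the role of the variable holding the LUN. It helps only code that reads the unit from that shared variable.

```python
import io


class UnitStack:
    """Hypothetical holder for 'the current unit' with push/pop of earlier ones."""

    def __init__(self, unit):
        self.current = unit  # the unit that callers write to
        self._saved = []     # previously active units, most recent last

    def push(self, unit):
        """Make unit current, remembering the one it replaces (not closed)."""
        self._saved.append(self.current)
        self.current = unit

    def pop(self):
        """Restore the previous unit and return the one that was current."""
        finished = self.current
        self.current = self._saved.pop()
        return finished


if __name__ == "__main__":
    console = io.StringIO()  # stands in for whatever the unit was connected to
    capture = io.StringIO()  # temporary destination
    out = UnitStack(console)

    out.current.write("to console\n")
    out.push(capture)
    out.current.write("captured\n")
    out.pop()
    out.current.write("to console again\n")

    print(repr(capture.getvalue()))  # prints: 'captured\n'
```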

NOTE: I think it is a Standards violation for the Intel and GNU compilers to return 'stdin' for INQUIRE on INPUT_UNIT. The standards state "... the value returned shall be suitable for use as the value of the file-name-expr in the FILE= specifier in an OPEN statement." Obviously, "stdin" does not work. The unit should be unnamed if a real filename cannot be returned.

In Linux, you can call the C function 'readlink' on '/proc/self/fd/0' to get the actual stdin filename, then open that file by name.
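
For illustration, the same lookup in Python, where `os.readlink` wraps the C function; `fd_target` is a hypothetical helper, and the `/proc/self/fd` layout is Linux-specific:

```python
import os


def fd_target(fd: int = 0):
    """Return what /proc/self/fd/<fd> points at, or None where that is unavailable."""
    link = f"/proc/self/fd/{fd}"
    try:
        return os.readlink(link)
    except OSError:
        return None  # no /proc (non-Linux system) or fd not open


if __name__ == "__main__":
    # Run as 'script < test.bin' this prints the absolute path of test.bin;
    # run as 'cat test.bin | script' it prints something like 'pipe:[12345]'.
    print(fd_target(0))
```

When stdin is a pipe the result is not a usable file name, so this only helps in the redirect case.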

joseph-krahn · ‎07-13-2010

I discovered a problem with Intel Fortran: STREAM I/O does not work on a pipe, even if it is not stdin. I created a named pipe (mkfifo) and opened that file using access='STREAM'. It gives the same seek error. Intel Fortran needs to avoid calling seek when it is not needed.

Izaak_Beekman · ‎07-13-2010

Yeah, it would be nice if someone from Intel bothered to read this or chime in with their two cents. It seems as though reading from a pipe with stream access should be a perfectly reasonable thing to do. It certainly does not work as of v 11.1.

joseph-krahn · ‎07-13-2010

I submitted a bug report for this. Hopefully it is a simple enough fix to make it into the next release.

jimdempseyatthecove · ‎07-13-2010

>>Also, push/pop is easy to implement using a stack of LUNs and just modify the variable holding the LUN when pushing or popping

Too many old programs hard-wire the LUN:

WRITE(6,...

When this is buried in a 3rd party DLL, you could temporarily redirect the I/O (O) away from what was on 6 using the PUSH (open file handle/info...) and POP back. You are unable to modify the code to use

WRITE(PRTLUN, ...

or whatever and PUSH/POP unit numbers in an array (stack).

Changing the attributes on a file after it is open is a completely different issue (e.g. CHMOD), but changing a file open for input to a file open for output is out of the scope of standard Fortran (not that this couldn't be done too). You might want to extend the capability to have two file pointers (one for reading and one for writing). Any kind of goofy features. (Good Or Otherwise Fine Yacking).

Jim

Kevin_D_Intel · ‎07-14-2010

As Joe indicates, he submitted a Premier bug report, which I submitted to Development (see internal tracking # below). I reproduced zbeekman's original issue using a pipe (which, it appears, should work) and included it in the same report to Development. I will keep this post updated with new information as I learn it.

(Internal tracking id: DPD200158026)

(Resolution Update on 12/07/2010): This defect is fixed in the Intel Fortran Composer XE 2011 Update 1 release (12.0.1.107 - Linux)