BUG in Fortran I/O (?)

johnculp · ‎03-03-2008

Hi Dr. Fortran;

Maybe there is a logical explanation for what's going on here.

I read a recent thread between you and BLUEYEDTAM about reading sequential data from a binary file. I have a similar problem, so I tried your suggestion.

Suppose we have:

type chk3 ! data chunk (has actual waveform samples)
  integer*4 hdr
  integer*4 ndbyte
end type

type (chk3)::chunk3

! execution - OPEN INPUT .WAV FILE

open(3,file=fn,readonly,form='binary',convert='big_endian')

Now with this READ statement, we get the right results when I print out chk3.hdr and chk3.ndbyte.

It is supposed to read in 8 bytes from unit 3, two integers in reverse order:

read(3)chk3

BUT - - If I do it this way:

read(3)chk3.hdr,chk3.ndbyte

I get the WRONG values for chk3.hdr and chk3.ndbyte

It acts like it's skipping over data bytes.

In this second case, I'm also reading two values of 4 bytes each, so why would the results be any different? Isn't the idea to read in sequential bytes without skipping any data?

Thanks, John

Steven_L_Intel1 · ‎03-03-2008

I'm somewhat baffled that you get the "right" results when you read the whole structure (I assume you meant chunk3 in the read and not chk3, though this error leads me to wonder if you're accurately describing the problem.) As documented, CONVERT= is ignored when a derived type/structure variable is read. Could it be that the data isn't big-endian after all?

What I would like to see is a hex dump of the first few records of the file. One way you can see this is to open the data file in Visual Studio. And then a short but complete program that reads a record or two and prints the hex values of the data read.

My guess is that the actual behavior is different from your description. Ignoring the variable name error, the code you posted would read the first 8 bytes "raw" when chunk3 was used, and two big-endian 4-byte integers, byte-swapped on input, when the components were used.

What wrote this data file? Was it a Fortran program? On which platform? Or is it a Windows-style audio .wav?

P.S. Can I convince you to use % instead of . as a field separator when using derived types?
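
For reference, the kind of short dump program requested above might look like the following. This is only a sketch under stated assumptions: the file name filename.wav, the unit number, and the count of 16 bytes are placeholders, and it uses standard Fortran 2003 stream access rather than the form='binary' extension from the original post. It prints the leading bytes in file order, with no CONVERT= processing involved.

```fortran
program dump_header
  implicit none
  integer, parameter :: nbytes = 16   ! number of leading bytes to show (arbitrary choice)
  character(len=1) :: b(nbytes)
  integer :: i, ios

  ! 'filename.wav' and unit 10 are placeholders for this sketch.
  ! access='stream' reads raw bytes with no record markers.
  open(unit=10, file='filename.wav', access='stream', form='unformatted', &
       status='old', action='read', iostat=ios)
  if (ios /= 0) stop 'cannot open file'

  read(10, iostat=ios) b
  if (ios /= 0) stop 'file is shorter than the requested byte count'

  ! Print each byte as two hex digits, in the order it appears in the file
  write(*, '(16(Z2.2,1X))') (ichar(b(i)), i = 1, nbytes)
  close(10)
end program dump_header
```

Comparing this output with the values printed after a READ into integer variables shows directly whether the bytes were swapped on input.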

johnculp · ‎03-03-2008

Hi again;

You're right, it was chunk3 in the actual program, not chk3.

When I read in the entire record at once, it does the proper conversion for all the data fields within it.

As for the documentation, I looked under the OPEN statement under CONVERT, but there is no mention of the usage within data records, i.e. with "." or % fields. So, where is the documentation you are referring to? Are you saying that it can't do any conversions under those conditions?

Wouldn't that kill the usefulness of having record structures?

I'll generate a test case, to try to pinpoint the behavior I'm referring to.

johnculp · ‎03-03-2008

I spent a few hours generating a special test case. This console application writes out a specified number of records, reads them back in three different ways, and compares the results. The first test passes, but the other two tests fail for every single value put into the records.

The documentation says that "binary files don't have any internal structure." I take that to mean that the length of each record is merely the sum of the lengths of its data elements.

Anyway, this should underscore that the Fortran I/O is not consistent with itself. The original file we are processing is a WAV file, and they specify that the values are written out in reverse order. Curiously enough, the tests fail whether the conversions are "big_endian" or "little_endian."

I suspect what's happening is that record pointers somehow get corrupted when I read the data elements in one at a time. It shouldn't matter how they're read in, whether one at a time, or the whole record at once.

johnculp · ‎03-04-2008

I did a little more research on this - - -

If you look at the hex dump I generated, you'll see that the output records are padded with zeros, but not necessarily on the end. I at first thought that it was trying to make the length divisible by 8. But when I fixed the record length to be exactly 48 bytes, it stuck 16 extra zeros in each record, making the length 64 bytes. So apparently we don't have control over the length of each record.

So apparently the only way to guarantee that you can read the data back in the way it was written, is to have the exact same record sizes, and the same record structure.

It seems that putting in these random artifacts defeats the purpose of having a binary file. Usually when someone writes this type of file, they want a specific structure, to be compatible with someone else's usage. For instance if I were writing a WAV file this way, I would not want the zero padding anywhere. Likewise, when I read in this type of file, I don't want it to skip around in some unpredictable manner.

Is there a way to suppress this zero padding?

Maybe this is all discussed somewhere, but I couldn't find it. Whether the conversion is "big endian" or "little endian" makes no difference, I get the same errors.

Steven_L_Intel1 · ‎03-04-2008

What you want is a SEQUENCE type. Add the line SEQUENCE after the "type recs" statement. Without that the compiler is free to add padding between misaligned components.
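
A minimal sketch of the difference, for illustration only: the two types below are hypothetical (they are not the "recs" type from the test case), and storage_size is a Fortran 2008 intrinsic that may not exist in older compilers. The sizes printed depend on the compiler and its alignment options, so no particular numbers are claimed here.

```fortran
program sequence_demo
  implicit none

  ! Hypothetical layouts: a 2-byte component followed by a 4-byte one,
  ! so the second component is misaligned unless padding is inserted.
  type padded_rec
    integer*2 tag
    integer*4 count
  end type padded_rec

  type packed_rec
    sequence            ! components are stored in declaration order
    integer*2 tag
    integer*4 count
  end type packed_rec

  type(padded_rec) :: a
  type(packed_rec) :: b

  ! storage_size returns the size in bits; divide by 8 for bytes
  print *, 'without SEQUENCE:', storage_size(a) / 8, 'bytes'
  print *, 'with SEQUENCE:   ', storage_size(b) / 8, 'bytes'
end program sequence_demo
```

If the first size is larger than the sum of the component sizes, the difference is the padding that shows up as extra zeros in the file.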

GVautier · ‎03-05-2008

Hello

The wav file format is based on variable length records.

The first record must be declared as below:

type wave_file_header
    character(4) riff
    integer*4 size1
end type

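So you can read it simply with:

type (wave_file_header) :: wfh
! form="binary" (as in the OPEN earlier in this thread) reads raw bytes;
! access="direct" would also need RECL= on the OPEN and REC= on the READ
open(1,file="filename.wav",form="binary")
read(1)wfh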

For a complete description, see:

http://www.sonicspot.com/guide/wavefiles.html

johnculp · ‎03-05-2008

Thanks to everyone for their help, especially Dr. Fortran and GVautier.

The SEQUENCE statement solved the problem.

Here is the way I type the three main records of the input WAV file. This is not a complete set - I focused on the record types most likely to be encountered in Windows/XP systems.

You could make the SAMPLES array smaller, then read them in sequentially if desired.

See attached file - - -

GVautier · ‎03-05-2008

For the last one (DATA chunk), you should not include the samples in the type itself.

Declare and use it as follows:

! DATA chunk
    type t_chk3 ! data chunk (has actual waveform samples)
      sequence
      integer*4 hdr
      integer*4 ndbyte
    end type

....
type (t_chk3) :: chk3
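integer*2, allocatable :: samp2(:)   ! 2-byte samples
character, allocatable :: samp1(:)   ! 1-byte samples

....
read(nnn)chk3
if (chk2.bytes_per_sample.eq.2) then
    ! ndbyte is a byte count, so there are ndbyte/2 two-byte samples
    allocate(samp2(chk3.ndbyte/2))
    read(nnn)samp2
else
    ! one sample per byte
    allocate(samp1(chk3.ndbyte))
    read(nnn)samp1
endif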

G. Vautier

johnculp · ‎03-08-2008

Sure, I understand the need to distinguish between stereo and mono samples.

What I do now is read everything into ONE array, and for the stereo WAV files, I just pick out the alternate samples, odd going to one channel, and even going to the other.

Seems to work OK so far, but maybe your approach works just as well, or better.

I need to bone up on how to use the ALLOCATE statement.

Yours; John
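
The "one array, alternate samples" approach described above can be sketched with ALLOCATE and strided array sections. This is an illustration only: the array names and the eight made-up sample values are assumptions, and it takes the usual WAV convention that the left channel comes first in each stereo pair.

```fortran
program deinterleave_demo
  implicit none
  integer, parameter :: n = 8              ! total interleaved samples (made-up size)
  integer*2 :: samples(n)
  integer*2, allocatable :: left(:), right(:)
  integer :: i

  ! Stand-in data; in a real program this array would be read from the file
  do i = 1, n
    samples(i) = i
  end do

  ! Each channel gets half of the interleaved samples
  allocate(left(n/2), right(n/2))
  left  = samples(1:n:2)    ! odd positions: 1, 3, 5, 7
  right = samples(2:n:2)    ! even positions: 2, 4, 6, 8

  print '(A,4I4)', 'left: ', left
  print '(A,4I4)', 'right:', right

  deallocate(left, right)
end program deinterleave_demo
```

The array sections copy every second element without an explicit loop, and the ALLOCATE sizes follow directly from the sample count.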

GVautier · ‎03-09-2008

Hello

The issue is not stereo versus mono but the sample size. Samples may be 1 byte (stored unsigned, 0 to 255) or 2 bytes (signed, -32768 to 32767). 2-byte sampling is the most common.

If the file is a stereo one, you must proceed like this:

type t_stereo1
   sequence
   character left,right
end type

type t_stereo2
   sequence
   integer*2 left,right
end type


....
type (t_chk3) :: chk3
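type (t_stereo1), allocatable :: samp1(:)   ! 1-byte stereo frames (2 bytes each)
type (t_stereo2), allocatable :: samp2(:)   ! 2-byte stereo frames (4 bytes each)

....
read(nnn)chk3
if (chk2.bytes_per_sample.eq.2) then
    ! each frame holds left + right at 2 bytes apiece = 4 bytes
    allocate(samp2(chk3.ndbyte/4))
    read(nnn)samp2
else
    ! each frame holds left + right at 1 byte apiece = 2 bytes
    allocate(samp1(chk3.ndbyte/2))
    read(nnn)samp1
endif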

G. Vautier