Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

I/O problems with binary out-of-core file

lrcs
Beginner
874 Views

We have a strange I/O problem and are hoping someone out there has information that will help us. Unfortunately, we haven't succeeded in making a small reproducer yet. We are hoping that someone has an idea of what the problem really is, which would help us make a small reproducer to test and debug with. Ideally, we would like to produce a small reproducer that could be posted to this forum for feedback and as a reference for others. We wonder if we are encountering a compiler or system bug (and whether anyone else has experienced similar issues), but we don't feel comfortable concluding this is the case until we can present a small reproducer.

Synopsis of the problem: data written does not match data that is read back later. In one particular instance, the data read back was missing a contiguous chunk from the middle of what should have been written, but the length of the dropped segment was not a significant-looking number (not a multiple of 512 or anything similar). In another case, the data was written to the wrong location. These I/O problems are rare (they only happen on a few test cases), but they represent a serious issue, as the program's results must be reliable. Unfortunately, the affected program is very large and has proprietary elements, so it is not suitable for posting in its entirety.

Some excerpts of the I/O-related code are listed at the end, for those who want to jump there.

Operating systems: Red Hat and SUSE

Fortran compilers: Intel versions 10.1 and 11.1

Program characteristics that I suspect you experts care about:

1. Parallel program using MPI (MPICH 2 or SGI MPT, problem was recreated with both). Would this introduce additional complexity (multithreading, etc.) that the ifort compiler does not expect?

2. Each process has its own dedicated OOC file (so parallel access should not be an issue).

3. (Perhaps this is key to the problem; just a hunch.) Reads and writes occur from contained routines within recursively called subroutines.

4. It happens most often when the interior of the file is written to, but has also (very rarely) happened when the file is only being appended to. It happens much more often when data is written to the same location twice in rapid succession (which we did while trying to make a smaller reproducer).

5. We do not have this problem when we use C for the I/O instead. (We were hoping to stay with pure Fortran, but may have to use C too if we can't solve this.)

Other observations: making small changes related to I/O changes how the bug manifests. For example, adding a FLUSH after the write eliminated the problem from some test cases, but not from others. Changing the BUFFERCOUNT in the OPEN statement, to either a smaller or a larger integer, can mask or unmask the problem.

=== IO-related code excerpts ===

Example of open

open(OOCUnit, FILE=OOCFileName, FORM='UNFORMATTED', &
     ACCESS='STREAM', ACTION='READWRITE', STATUS='OLD', &
     BUFFERED='YES', BUFFERCOUNT=BufferCount)

Example of write

write(OOCUnit, POS=OOCPos) ContiguousMData

Example of read

read(OOCUnit, POS=OOCPos) F%Matrix%MData
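
For anyone who wants to experiment, the pattern above can be condensed into a self-contained sketch (the file name, unit number, and sizes are hypothetical; this reproduces only the access pattern, not the actual program):

program ooc_sketch
  implicit none
  integer, parameter :: u = 20
  real(8) :: wbuf(1000), rbuf(1000)
  integer(8) :: pos
  call random_number(wbuf)
  ! Same OPEN style as above, but a scratch file (hypothetical name).
  open(u, file='ooc_test.bin', form='unformatted', &
       access='stream', action='readwrite', status='replace')
  write(u) wbuf                        ! initial contents, bytes 1..8000
  pos = 1_8 + 8_8*(size(wbuf)/2)       ! an interior position (POS= is 1-based)
  write(u, pos=pos) wbuf               ! overwrite the interior, as in the failing cases
  flush(u)
  read(u, pos=pos) rbuf                ! read the same region back
  if (any(rbuf /= wbuf)) then
     print *, 'MISMATCH'
  else
     print *, 'OK'
  end if
  close(u, status='delete')
end program ooc_sketch

Rewriting the same interior position twice in rapid succession inside a loop would mimic the conditions under which we see the failures most often.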

7 Replies
Ron_Green
Moderator
Is this on a local disk or a shared network file system? We have had some issues over the years that were Lustre-related, or that occurred when other parallel file systems had disabled locking or synchronization.
If it's a parallel or NFS file system, I'd try redirecting the I/O to local disks and see if the problem persists.
ron
jimdempseyatthecove
Honored Contributor III
>>2. Each process has its own dedicated OOC file (so parallel access should not be an issue).

Parallel access within the same process may be an issue.

In addition to MPI, are you doing any parallel programming within each process? If so, you may need a critical section around writing each data blob to the file (not just a critical section around the WRITE, since multiple WRITEs may be required to output each blob).

Jim Dempsey
lrcs
Beginner
It is occurring on several different file systems, some parallel, some local, so we're pretty sure it isn't file-system related. But I'm making note of the parallel file system issues you've seen; I sure hope we don't have two sets of causes in play.
lrcs
Beginner
No, we aren't doing any parallelization other than MPI ... yet. Thank you for pointing out what may become a future issue for us, as we do intend to explore additional parallelization techniques.
jimdempseyatthecove
Honored Contributor III

>>
Example of write

write(OOCUnit, POS=OOCPos) ContiguousMData

Example of read

read(OOCUnit, POS=OOCPos) F%Matrix%MData
<<

Can you eliminate the POS=OOCPos?

If not, then insert diagnostic sanity checks to assert that OOCPos is correct,
i.e., is the sequence of OOCPos values on writes the same as on reads?
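
A minimal sketch of such a sanity check (all names and the trace format are hypothetical): log every POS= value and transfer size just before each WRITE and READ, so the two sequences can be diffed offline.

program pos_trace
  implicit none
  integer, parameter :: u = 20, diag = 21
  real(8) :: buf(100)
  integer(8) :: pos
  buf = 1.0d0
  open(u, file='ooc_test.bin', form='unformatted', &
       access='stream', action='readwrite', status='replace')
  open(diag, file='pos_trace.log', status='replace')
  pos = 1_8
  write(diag, '(a,1x,i0,1x,i0)') 'W', pos, 8*size(buf)  ! log before writing
  write(u, pos=pos) buf
  flush(u)
  write(diag, '(a,1x,i0,1x,i0)') 'R', pos, 8*size(buf)  ! log before reading
  read(u, pos=pos) buf
  close(u, status='delete')
  close(diag)
end program pos_trace

Comparing the 'W' lines against the 'R' lines for the same region would show whether the positions themselves, or the transfers, are going wrong.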

Jim Dempsey

haraldkl
Beginner
I got a similar problem, with unformatted stream access and positioning.
I am trying to read a single file on multiple MPI processes without overlaps.
Each MPI process has its own offset put into the pos argument, which is computed with the help
of inquire(iolength); thus it should be correct (if I understand the standard correctly). The program also executes fine when compiled with gfortran, but fails with ifort.
I can work around this by using direct IO instead, but this results in many small reads, as the chunk size read by each process from the file may vary.
jimdempseyatthecove
Honored Contributor III
>>Each MPI process has its own offset put into the pos argument, which is computed with the help
of inquire(iolength), thus it should be correct (if I understand the standard correctly).

Add diagnostic code to verify the assumption "thus it should be correct".

When (if) you find something incorrect, then this may point to a programming problem or a bug.

Note, if I understand what you are trying to do, each MPI process will inquire(iolength=...) for the record sizes of the other MPI processes (at least those whose data precedes the current MPI process's data in the file). Also pay attention to the value of the IOLENGTH unit size (it may be 4 bytes, 1 byte, or something else). All processes must use the same IOLENGTH unit size. And RECL= on the OPEN may interfere with the position assumption made with inquire(iolength=...).

For a formatted file, the file storage unit is an eight-bit byte. For an unformatted file, the file storage unit is an eight-bit byte (if option assume byterecl is specified) or a 32-bit word (if option assume nobyterecl, the default, is specified).

Depending on other factors, assert that all MPI processes are using the same POS unit size.
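
A quick way to see which convention a given build uses (a sketch; the array is arbitrary):

program iolength_units
  implicit none
  integer :: l
  real(8) :: x(10)
  ! With ifort's default -assume nobyterecl, this reports 4-byte
  ! units (20 for ten real(8) values); with -assume byterecl, or
  ! with gfortran, it reports bytes (80). Mixing the two conventions
  ! when computing POS= offsets shifts every position by a factor of four.
  inquire(iolength=l) x
  print *, 'iolength for 10 real(8) values =', l
end program iolength_units

Running this under each compiler and option set involved would confirm whether all processes agree.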

Jim
