Random reading of an unformatted file?

michael_green · 03-26-2015

Hi All,

I have a large file of geographic information created thus:

open(1,file=trim(shapefile),status='replace',form='unformatted',iostat=ios, &
   err=1000,recl=128,recordtype='stream')

I read the whole file and display its information on screen. The user clicks somewhere on screen and from that I know I have to re-sample a small part of the file starting at some byte N. Is there some Fortran way I can get at that byte number immediately without having to start from the beginning of the file?

Many thanks

Mike

John_Campbell · 03-27-2015

If you open the file with form='UNFORMATTED', access='DIRECT', RECL=128, and do not use recordtype='STREAM', then you can treat it as a word-addressable file. All you need is a 128-word buffer, which you use to access the appropriate records.

If the word address you want is n, then:
the word is in file record number = 1+(n-1)/128, and
the word address in this record is mod(n-1,128)+1
You can easily write routines "get_word" or "put_word" to manage these transfers and utilise the active buffer.
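As a rough illustration of the address arithmetic above, here is a minimal sketch of a get_word routine. The file name, the test data and the use of INQUIRE(IOLENGTH=) to size the record are my own assumptions for the demo, and for brevity it re-reads the record on every call rather than keeping an active buffer.

```fortran
program word_address_demo
   implicit none
   integer, parameter :: words_per_rec = 128   ! matches the 128-word buffer
   integer :: buffer(words_per_rec)
   integer :: lu, rl, i, rec, n, w

   ! Ask the compiler for the record length in its own RECL units,
   ! so the bytes-versus-words question does not need to be hard-coded.
   inquire(iolength=rl) buffer

   ! 'words_demo.bin' is a hypothetical scratch file for this demo.
   open(newunit=lu, file='words_demo.bin', form='unformatted', &
        access='direct', recl=rl, status='replace')

   ! Fill three records so that word address n holds the value n.
   do rec = 1, 3
      buffer = [( (rec-1)*words_per_rec + i, i = 1, words_per_rec )]
      write(lu, rec=rec) buffer
   end do

   n = 200
   call get_word(lu, n, w)
   print *, 'word', n, '=', w
   close(lu, status='delete')

contains

   ! Fetch the single word at 1-based word address "address".
   subroutine get_word(unit, address, word)
      integer, intent(in)  :: unit, address
      integer, intent(out) :: word
      integer :: rec_no, offset, buf(words_per_rec)

      rec_no = 1 + (address-1)/words_per_rec        ! record holding the word
      offset = mod(address-1, words_per_rec) + 1    ! position inside that record
      read(unit, rec=rec_no) buf
      word = buf(offset)
   end subroutine get_word

end program word_address_demo
```

With n = 200 the word sits in record 2 at offset 72, so the value read back should be 200. A real library would remember which record is currently in the buffer and skip the READ when the next request falls in the same record.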

For ifort unformatted direct access files, a "word" is 4 bytes by default, so you may need to adjust for this.

Direct access is ideally suited to random access of large data sets.

I am not familiar enough with recordtype='STREAM' to know whether it can be used or helps with the file access, but a standard Fortran direct access file can be easily used, especially if you access it via get_word or put_word. You could easily extend these routines to cope with multiple words, such as:

subroutine get_nwords ( unit, array, num, file_word_address, num_read), which would return num_read words into array, starting from file_word_address in the direct access file open on unit. (num_read is effectively the error return status and should equal num.)
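A possible shape for get_nwords, as a sketch only: the argument list follows the description above, but the module name, the default-integer "word" type and the demo file are assumptions of mine. The point it shows is that a request may span several records, so the copy is done record by record.

```fortran
module word_file
   implicit none
   integer, parameter :: words_per_rec = 128   ! assumed record size in words
contains

   ! Copy up to num words, starting at 1-based word address
   ! file_word_address, into array. num_read counts the words delivered.
   subroutine get_nwords(unit, array, num, file_word_address, num_read)
      integer, intent(in)  :: unit, num, file_word_address
      integer, intent(out) :: array(num), num_read
      integer :: buf(words_per_rec), rec_no, offset, ncopy, addr, ios

      num_read = 0
      addr = file_word_address
      do while (num_read < num)
         rec_no = 1 + (addr-1)/words_per_rec
         offset = mod(addr-1, words_per_rec) + 1
         read(unit, rec=rec_no, iostat=ios) buf
         if (ios /= 0) exit                      ! e.g. record does not exist
         ! Take whatever remains in this record, but no more than needed.
         ncopy = min(num - num_read, words_per_rec - offset + 1)
         array(num_read+1:num_read+ncopy) = buf(offset:offset+ncopy-1)
         num_read = num_read + ncopy
         addr = addr + ncopy
      end do
   end subroutine get_nwords

end module word_file

program get_nwords_demo
   use word_file
   implicit none
   integer :: lu, rl, rec, i, got
   integer :: buffer(words_per_rec), words(10)

   inquire(iolength=rl) buffer
   ! 'nwords_demo.bin' is a hypothetical scratch file for this demo.
   open(newunit=lu, file='nwords_demo.bin', form='unformatted', &
        access='direct', recl=rl, status='replace')
   do rec = 1, 2
      buffer = [( (rec-1)*words_per_rec + i, i = 1, words_per_rec )]
      write(lu, rec=rec) buffer
   end do

   ! Words 124..133 straddle the boundary between records 1 and 2.
   call get_nwords(lu, words, 10, 124, got)
   print *, 'words read:', got
   print *, words
   close(lu, status='delete')
end program get_nwords_demo
```

If num_read comes back smaller than num, the caller knows the read ran off the end of the file or hit an I/O error.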

You could also experiment with the record length. A record length of 1 simplifies things, since the word address is then simply the record number, but longer records do offer some buffering. I typically use a record length of 64k bytes for my random access word-addressable file library, which I originally wrote to emulate the CDC 6600 word-addressable random I/O library. I've found this approach to be very useful ever since.

John

Arjen_Markus · 03-27-2015

Actually, stream access allows you to read the file randomly, independently of any record structure. Thus it allows you greater freedom than direct access. One issue with direct access is that the unit of record length may be "byte" but it may also be "word", depending on the compiler and compiler options. If your data are organised in chunks of the same size, then direct access is possibly more convenient.
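One way to see which unit a particular compiler and option set is using is to ask it directly. The small check below is only a sketch: it prints the RECL value needed for one 32-bit integer, alongside the standard named constant FILE_STORAGE_SIZE from ISO_FORTRAN_ENV. The program and variable names are made up for the example.

```fortran
program recl_units_demo
   use iso_fortran_env, only: int32, file_storage_size
   implicit none
   integer(int32) :: one_word
   integer :: rl

   ! IOLENGTH returns the record length, in this compiler's RECL units,
   ! needed to hold the listed items in an unformatted record.
   inquire(iolength=rl) one_word

   print *, 'RECL units needed for one 32-bit integer:', rl
   print *, 'file storage unit size in bits:          ', file_storage_size
end program recl_units_demo
```

As far as I know, a result of 4 for the first line means RECL is counted in bytes, while 1 means it is counted in 4-byte words. Using the IOLENGTH result in the OPEN statement sidesteps the question either way.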

Steven_L_Intel1 · 03-27-2015

This is what POS= in a READ statement for a STREAM file is for. Open it for sequential access, not direct.

John_Campbell · 03-27-2015

Steve,

"Open it for sequential access, not direct."

This is a strange response to a request on how to randomly access a file. If this is the case, and a sequential access construct is preferred, then the capability of Access='DIRECT' has certainly been neglected in ifort's implementation.

For pos=n, is n a word or byte address, and are Integer*8 values permitted?

John

Steven_L_Intel1 · 03-28-2015

pos= uses byte addresses. I should have said "stream" access, not sequential. Any integer kind is accepted.
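For the original question (jump straight to byte N), a minimal sketch with access='stream' and POS= might look like this. The file name and contents are invented for the demo, and it assumes the usual case where the file storage unit is one byte and an integer(int8) occupies one byte.

```fortran
program stream_pos_demo
   use iso_fortran_env, only: int8, int64
   implicit none
   integer :: lu, i
   integer(int8)  :: bytes(256), chunk(4)
   integer(int64) :: n          ! a 64-bit position, useful for very large files

   ! Test data: byte i of the file holds mod(i, 128).
   bytes = [( int(mod(i, 128), int8), i = 1, 256 )]

   ! 'stream_demo.bin' is a hypothetical scratch file for this demo.
   open(newunit=lu, file='stream_demo.bin', form='unformatted', &
        access='stream', status='replace')
   write(lu) bytes

   ! Jump directly to byte 101 (positions are 1-based) and read 4 bytes,
   ! without reading anything that comes before it.
   n = 101_int64
   read(lu, pos=n) chunk
   print *, chunk
   close(lu, status='delete')
end program stream_pos_demo
```

Here chunk should come back holding the values 101, 102, 103 and 104. A subsequent READ without POS= simply continues from where the previous one stopped.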

John_Campbell · 03-29-2015

Steve,

Thanks for the clarification. Using access='STREAM' together with INQUIRE (Unit=lu, POS=p) to build a record address table gives all the flexibility of a variable-length-record, random access file, with all buffering managed by ifort or Windows. Is this standard conforming or an ifort Fortran extension? Also, are there any other OPEN specifiers which affect the efficiency of randomly accessing a large (say 32 GB) file? This was an issue with older operating systems (Pr1me and VAX) and much smaller files, but does not appear to be discussed for Windows.

This is much more flexible than the access='DIRECT' random access file structure provided in Standard Fortran.
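To make the record address table idea concrete, here is a small sketch. The layout (each record stored as an integer length followed by that many reals), the file name and the variable names are all assumptions chosen for the illustration, not anything prescribed by ifort.

```fortran
program record_table_demo
   use iso_fortran_env, only: int64
   implicit none
   integer, parameter :: nrec = 3
   integer :: lu, k, i, len_k
   integer :: lens(nrec)
   integer(int64) :: start(nrec)      ! the record address table
   real, allocatable :: rec(:)

   lens = [5, 2, 7]                   ! records of different lengths

   ! 'table_demo.bin' is a hypothetical scratch file for this demo.
   open(newunit=lu, file='table_demo.bin', form='unformatted', &
        access='stream', status='replace')

   do k = 1, nrec
      ! Note where record k is about to start, then write it:
      ! its length followed by its data.
      inquire(unit=lu, pos=start(k))
      write(lu) lens(k), [( real(10*k + i), i = 1, lens(k) )]
   end do

   ! Random access: go straight to record 2 using the table.
   k = 2
   read(lu, pos=start(k)) len_k
   allocate(rec(len_k))
   read(lu) rec                       ! continues right after the length
   print *, 'record', k, 'has', len_k, 'values:', rec
   close(lu, status='delete')
end program record_table_demo
```

Record 2 should come back with two values, 21.0 and 22.0. In practice the table itself would also need to be saved, for example at the end of the file or in a separate index file, so that it is available the next time the file is opened.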

Thanks again for the clarification,

John

andrew_4619 · 03-29-2015

POS= on READ and INQUIRE is standard; it came in with stream access in Fortran 2003.

As per your comments above, I agree that half the features in standard Fortran I/O are redundant in new applications if you use some of the newer stuff like "stream".