Solved: Sequential file structure for large records.

John_Campbell · 11-27-2022

I have been investigating the use of stream I/O for creating a variable-length-record, direct-access file structure. It appears to be very effective.

In doing this I have also investigated various formats for the header/footer of records.

For records larger than 2^31 bytes, I find the approach taken by ifort and gfortran to be a complex mess.

First question: Why is the ifort sub-record length 2,147,483,639 bytes (2^31-9)?

Why -9? Based on Fortran data types, wouldn't a multiple of a power of 2 (at least 16) be better, especially when merging sub-records? It just looks inefficient for memory transfers, or is efficiency irrelevant in this case? Combining these sub-records of odd byte length looks to be an unnecessary complication.

Perhaps, rather than -9, -4096 bytes would have been a better choice, i.e. the sub-record would be a whole number of memory pages (assuming a 4 KB page size), giving 2^19-1 memory pages per sub-record.

Of interest is Silverfrost FTN95, which has a 1-byte, else 5-byte, header/footer for smaller records. This suggests an alternative format for larger ifort unformatted records could be 4 bytes, else 12 bytes, i.e. a first 4 bytes of header set to -1, followed by an 8-byte length value.
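To make the "4 bytes, else 12 bytes" idea concrete, here is a minimal Python sketch of such a marker. It is purely hypothetical: no compiler discussed in this thread writes this format, the names `encode_marker` and `decode_marker` are invented for illustration, and little-endian byte order is an assumption.

```python
import struct

INT32_MAX = 2**31 - 1


def encode_marker(length: int) -> bytes:
    """Encode a record length as a 4-byte marker, or 12 bytes if it is too large.

    Hypothetical format: lengths up to 2**31-1 use a plain signed 32-bit value;
    larger lengths use the sentinel -1 followed by a signed 64-bit length.
    """
    if length < 0:
        raise ValueError("record length must be non-negative")
    if length <= INT32_MAX:
        return struct.pack("<i", length)
    return struct.pack("<iq", -1, length)  # "<" means no padding: 4 + 8 bytes


def decode_marker(buf: bytes, pos: int = 0):
    """Return (record_length, marker_size_in_bytes) for the marker at pos."""
    (first,) = struct.unpack_from("<i", buf, pos)
    if first != -1:
        return first, 4
    (length,) = struct.unpack_from("<q", buf, pos + 4)
    return length, 12
```

A reader only needs to inspect the first 4 bytes to know which form follows, which is the same trick as the 1-byte, else 5-byte, FTN95 marker described above.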

Second question: Is the reason this approach was not adopted related to possible problems writing data blocks larger than 2^31 bytes when the supporting I/O libraries were 32-bit?

Is there a history of discussion of this? I would be interested in understanding it.

Footnote: Stream I/O appears to offer significant functional benefits for directly addressable files, in comparison to either Fortran direct access or unformatted sequential access files.

Ron_Green · 11-29-2022

You didn't say if you were on Windows, Linux or macOS. It would help us to know the OS, and to have more information on how you are reading and writing these files. Are you using stream I/O to read in an existing binary file and pull out the data, excluding the record markers? Or are you using stream I/O to create a file, including record markers, that can then be read with unformatted sequential I/O?

I talked to our Runtime Library lead on these questions.

2^31-9: Yes, you ONLY see -9 instead of -8 IFF the file is a VMS-friendly file. With VMS compatible files there is an extra byte of record mark (LF) there.

To answer the second question, I'll pass along documentation:

Here’s the diagram for variable-length files, and how sub-records are used when the ‘real’ record length is more than a signed 32-bit integer can represent:
 
/* Read the rest of the bytes from the file directly to the user variable.
**                        
** Edit [1-135]           
**                        
** But do that in record-at-a-time lumps, as the record lengths (LWC's)
** are not user data!     
**                        
** The continuation records have a negative LWC at each end of the
** sequence.  There are positive LWCs on the 'inside' end of the
** leading and trailing sub-records. 
**                        
** Remember that we need to be able to read continuation records going
** either forward and backwards through the file.  I.e, a negative LWC
** means 'this record has a continuation in the next record in the 
** direction you are currently scanning':
**                        
** One record:                            [+LWC data +LWC]          
**                        
** Two records:                  [-LWC data1 +LWC]  [+LWC data2 -LWC]
**                        
** Three:                [-LWC data1 +LWC]  [-LWC data2 -LWC]  [+LWC data3 -LWC]
**                        
** Four:         [-LWC data1 +LWC]  [-LWC data2 -LWC]  [-LWC data3 -LWC]  [+LWC data4 -LWC]
**                        
** Five: [-LWC data1 +LWC]  [-LWC data2 -LWC]  [-LWC data3 -LWC]  [-LWC data4 -LWC]  [+LWC data5 -LWC]

See also the various record types in https://www.cism.ucl.ac.be/Services/Formations/ICS/ics_2013.0.028/composerxe/Documentation/en_US/compiler_f/main_for/GUID-E36C2463-1514-4E4E-B88A-769AB0326C57.htm.

In https://www.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/compiler-reference/data-and-i-o/fortran-i-o/record-length.html, we say the max is “2.147 billion bytes (2,147,483,647 minus the bytes for record overhead). For variable-length sequential records on 64-bit addressable systems, the theoretical maximum record length is about 17,000 gigabytes. When considering very large record sizes, also consider limiting factors such as system virtual memory.”
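The sign convention in the diagram above can be sketched in a few lines of Python. This is only an illustration of the scheme as the diagram describes it, not the runtime library's code: the function names are hypothetical, 4-byte little-endian markers are assumed, and `max_sub` is a parameter so that the splitting can be exercised with tiny records.

```python
import struct

MAX_SUB = 2**31 - 9  # sub-record data limit quoted earlier in the thread


def frame_record(payload: bytes, max_sub: int = MAX_SUB) -> bytes:
    """Frame one logical record as one or more [LWC data LWC] sub-records."""
    chunks = [payload[i:i + max_sub] for i in range(0, len(payload), max_sub)]
    if not chunks:
        chunks = [b""]  # an empty record is still one sub-record
    last = len(chunks) - 1
    out = bytearray()
    for i, chunk in enumerate(chunks):
        n = len(chunk)
        head = -n if i < last else n  # negative: continues in the next sub-record
        tail = -n if i > 0 else n     # negative: continues from the previous one
        out += struct.pack("<i", head) + chunk + struct.pack("<i", tail)
    return bytes(out)


def read_record(buf: bytes, pos: int = 0):
    """Scan forward from pos; return (payload, position after the record)."""
    data = bytearray()
    while True:
        (head,) = struct.unpack_from("<i", buf, pos)
        n = abs(head)
        data += buf[pos + 4:pos + 4 + n]
        (tail,) = struct.unpack_from("<i", buf, pos + 4 + n)
        if abs(tail) != n:
            raise ValueError("leading and trailing LWC disagree")
        pos += n + 8
        if head >= 0:  # non-negative leading LWC: no continuation follows
            return bytes(data), pos
```

For example, `frame_record(b"abcdefghij", max_sub=4)` produces the three-sub-record layout `[-4 abcd +4] [-4 efgh -4] [+2 ij -2]`, and `read_record` on that buffer reassembles the original ten bytes. A backward scan would apply the same test to the trailing LWC instead of the leading one.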

John_Campbell · 11-29-2022

Hi Ron,

Thanks for your answer.

I am trying to replicate unformatted sequential binary formats using stream I/O on Windows.

It appears that length [ LWC + data + LWC ] = 2^31-1 = the maximum positive integer(4) explains the choice of "-9".

The priority appears to be maximising record length rather than choosing, say, "2^31-12" so that sub-records align with 4-byte words when merging data blocks.

I presume this alignment benefit is not considered significant.
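The arithmetic behind this reading can be checked in a few lines of Python. This is only a sketch of the reasoning in this post, assuming 4-byte length markers at each end of a sub-record; the variable names are illustrative.

```python
INT32_MAX = 2**31 - 1   # largest positive integer(4): 2,147,483,647
MARKER_BYTES = 4        # assumed size of each LWC

# Largest data length for which LWC + data + LWC still fits in INT32_MAX.
max_payload = INT32_MAX - 2 * MARKER_BYTES
print(max_payload)                 # 2147483639
print(max_payload == 2**31 - 9)    # True
print(max_payload % 4)             # 3, so not a multiple of 4 bytes

# The word-aligned alternative mentioned above gives up 3 bytes per sub-record.
aligned_payload = 2**31 - 12
print(aligned_payload % 4)                               # 0
print(aligned_payload + 2 * MARKER_BYTES <= INT32_MAX)   # True
```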

Thanks again for your detailed explanation. It helps understand the approach.

(Note to moderator: Oops, I appear to have selected the wrong post as the solution. I am not sure how to fix this.)

Steve_Lionel · 11-29-2022

In the past, many Fortran compilers did not support unformatted files with record lengths greater than 2GB because of a 32-bit record length at the start and end of the record. g77 introduced an 8-byte length on 64-bit platforms, but this was incompatible with everyone else. Back in the DEC days, I received a suggestion from an engineer at Sun who proposed a scheme of "segmenting" very large records. He couldn't get Sun to accept it, but we did, and gfortran also adopted the scheme.

John_Campbell · 11-29-2022

"Incompatible with everyone else" has been a problem with some unformatted sequential file approaches, which can be addressed with access="stream" I/O.