I've never quite understood the rules for derived type component alignment as they pertain to binary/stream I/O of derived types. When, if ever, are padding bytes included when a derived type is written to a file? For consistency, should I always write derived types component by component? I often structure my derived types with a small reserve character buffer at the end whose size I adjust as I add or change components. For example, I may have 1024 bytes reserved at the end. I can then add a 4-byte integer at the end of the used space, subtract 4 bytes from the reserve buffer, and keep the I/O "the same". However, sometimes things go wrong and I find zero fields where I don't expect them (even without an alignment error or warning from the compiler). Is that a padding byte, or is it more likely my own coding error? In reality, I should design a better data stream, but dumping structures seems to be rife in Windows programming, and it is easy (lazy).
Padding is never added during I/O. However, derived types by default have padding added to naturally align their components. If you can provide a short example demonstrating the problem we can examine it further.
Yes, I understand that padding is added in memory. The question was whether that padding, which I know is added in memory, is ever included in a write operation to disk (I could take your response as implying that I thought it was added only on the write operation). I think you meant it is never included in a write to disk even if present in memory. If that is the case, then I obviously have a coding error.
There is a parallel thread on this at comp.lang.fortran. The 'what' and 'how' of the issues faced by OP, as they relate to Intel Fortran compiler-specific details of structure alignment and padding, component reordering (if applicable), and unformatted I/O, are still unclear. It is even more difficult to understand what's going on without an actual example.
So I decided to run a little test and I didn't notice any issues notwithstanding the caution and limitations that accompany this in terms of processor-dependent aspects and portability across platforms and compilers. Here's the code if anyone wants to review and comment, particularly Steve or from the Intel Fortran team:
module t_m

   use, intrinsic :: iso_c_binding, only : c_loc, c_ptr, c_intptr_t

   implicit none

   private

   type, public :: t
      integer :: i = 0
      character(len=13) :: s = ""
      real :: r = 0
   end type t

   public :: read_t
   public :: write_t
   public :: get_offsets

contains

   subroutine read_t( this, filename, istat, imsg )
      ! argument list
      type(t), intent(inout) :: this
      character(len=*), intent(in) :: filename
      integer, intent(inout) :: istat
      character(len=*), intent(inout) :: imsg
      ! local variables
      integer :: lun

      open( newunit=lun, file=filename, access="stream", form="unformatted", action="read", &
            status="old", iostat=istat, iomsg=imsg )
      if ( istat == 0 ) then
         ! read the whole object as a single entity
         read( unit=lun, iostat=istat, iomsg=imsg ) this
         close( unit=lun )
      end if

      return
   end subroutine read_t

   subroutine write_t( this, filename, istat, imsg )
      ! argument list
      type(t), intent(in) :: this
      character(len=*), intent(in) :: filename
      integer, intent(inout) :: istat
      character(len=*), intent(inout) :: imsg
      ! local variables
      integer :: lun

      open( newunit=lun, file=filename, access="stream", form="unformatted", action="write", &
            status="replace", iostat=istat, iomsg=imsg )
      if ( istat == 0 ) then
         ! write the whole object as a single entity
         write( unit=lun, iostat=istat, iomsg=imsg ) this
         close( unit=lun )
      end if

      return
   end subroutine write_t

   subroutine get_offsets( this, offsets )
      ! argument list
      type(t), intent(in), target :: this
      integer, intent(inout) :: offsets(:)
      ! local variables
      type(c_ptr) :: address
      ! c_intptr_t is wide enough to hold a full C address
      integer(c_intptr_t) :: base_address

      address = c_loc( this%i )
      base_address = transfer( source=address, mold=base_address )
      offsets(1) = 0
      address = c_loc( this%s )
      offsets(2) = int( transfer( source=address, mold=base_address ) - base_address )
      address = c_loc( this%r )
      offsets(3) = int( transfer( source=address, mold=base_address ) - base_address )

      return
   end subroutine get_offsets

end module t_m
program p

   use, intrinsic :: iso_fortran_env, only : character_storage_size
   use t_m, only : t, read_t, write_t, get_offsets

   implicit none

   type(t) :: foo
   type(t) :: bar
   integer :: offsets(3)
   character(len=*), parameter :: datfile="C:\temp\foo.dat"
   integer :: istat
   character(len=256) :: imsg
   integer :: sizefile

   !..
   foo = t( i=42, s="Hello World!", r=99.0 )
   call write_t( this=foo, filename=datfile, istat=istat, imsg=imsg )
   if ( istat /= 0 ) then
      print *, "write failed: istat = ", istat, ", imsg = ", imsg
      stop
   end if
   call get_offsets( this=foo, offsets=offsets )
   print *, "components of foo are at ", offsets

   !
   call read_t( this=bar, filename=datfile, istat=istat, imsg=imsg )
   if ( istat /= 0 ) then
      print *, "read failed: istat = ", istat, ", imsg = ", imsg
      stop
   end if
   call get_offsets( this=bar, offsets=offsets )
   print *, "components of bar are at ", offsets
   print *, "bar%i = ", bar%i, "; expected value is ", foo%i
   print *, "bar%s = ", bar%s, "; expected value is ", foo%s
   print *, "bar%r = ", bar%r, "; expected value is ", foo%r

   inquire( file=datfile, size=sizefile, iostat=istat )
   if ( istat == 0 ) then
      print *, "file size for IO: ", sizefile, " file storage units."
      print *, "size of foo: ", storage_size(foo)/character_storage_size, " bytes."
      print *, "size of bar: ", storage_size(bar)/character_storage_size, " bytes."
   end if

   stop

end program p
With Intel Fortran compiler options of /standard-semantics /warn:all /check:all /stand, the above code compiles without any warnings or errors. Upon execution, the output is:
components of foo are at 0 4 20
components of bar are at 0 4 20
bar%i = 42 ; expected value is 42
bar%s = Hello World! ; expected value is Hello World!
bar%r = 99.00000 ; expected value is 99.00000
file size for IO: 24 file storage units.
size of foo: 24 bytes.
size of bar: 24 bytes.
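Note that by default Intel Fortran naturally aligns derived type components, so the memory location offsets are 0, 4, and 20: the 13-character string occupies bytes 4 through 16, and three pad bytes follow it so that the real component starts on a 4-byte boundary. That accounts for the total size of 24 bytes.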
Now consider the following:
But in all the above cases, the unformatted read action on the file is ok and the object ("bar") is deserialized as expected.
Gary suggested that padding was being added at the point of the I/O, which is absolutely not happening. As I said earlier, components do get padding for natural alignment by default in Intel Fortran. You can disable this by: 1) making the type a BIND(C) type, 2) specifying /noalign when compiling, 3) bracketing the type declaration with !DIR$ OPTIONS /NOALIGN ... !DIR$ END OPTIONS, or 4) making the type a SEQUENCE type (not recommended).
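For anyone who wants to check what their own compiler does, here is a minimal sketch that reports the in-memory size of the same three components declared as an ordinary type, a BIND(C) type, and a SEQUENCE type. The names (`layout_demo_m`, `t_default`, `t_bindc`, `t_sequence`, `layout_sizes`) are made up for illustration, and the /noalign and directive variants are not shown. Assuming 4-byte integer and real kinds, the packed size is 4 + 13 + 4 = 21 bytes, so anything above that is padding. The numbers you get depend on the compiler and its options.

```fortran
module layout_demo_m
   use, intrinsic :: iso_c_binding, only : c_int, c_char, c_float
   use, intrinsic :: iso_fortran_env, only : int32, real32, character_storage_size
   implicit none
   private
   public :: layout_sizes

   ! Ordinary derived type: layout is left to the compiler.
   type :: t_default
      integer(int32)    :: i = 0
      character(len=13) :: s = ""
      real(real32)      :: r = 0.0_real32
   end type t_default

   ! Interoperable type: character components must have length 1,
   ! so the string becomes an array of 13 single characters.
   type, bind(c) :: t_bindc
      integer(c_int)         :: i
      character(kind=c_char) :: s(13)
      real(c_float)          :: r
   end type t_bindc

   ! SEQUENCE type: components are stored in declaration order.
   type :: t_sequence
      sequence
      integer(int32)    :: i
      character(len=13) :: s
      real(real32)      :: r
   end type t_sequence

contains

   ! Returns the storage size, in bytes, of each variant:
   ! (1) default, (2) BIND(C), (3) SEQUENCE.
   function layout_sizes() result( nbytes )
      integer :: nbytes(3)
      type(t_default)  :: a
      type(t_bindc)    :: b
      type(t_sequence) :: c

      ! storage_size returns bits; convert to character-sized units (bytes).
      nbytes(1) = storage_size(a) / character_storage_size
      nbytes(2) = storage_size(b) / character_storage_size
      nbytes(3) = storage_size(c) / character_storage_size
   end function layout_sizes

end module layout_demo_m
```

Calling `layout_sizes()` from a small driver and printing the result shows which variants carry pad bytes on a given compiler. A compiler may warn about the misaligned real component in the SEQUENCE variant.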
No, I never suggested that it was added during (at the time of) the I/O process. I asked whether padding that is already present in the structure in memory is included in an I/O operation when a whole DT reference is made. Some other compilers' docs indicate that it is, some that it is not. If it is not for IVF, then the more likely answer was that the components were reordered (non-sequence type). My inspection of the binary file content (the program that normally reads it failed) showed two zero bytes between two initialized character variables. The file must be compatible across compiler versions (e.g. created with 11.1) and regardless of compile-time options that affect order (and/or padding), so I've redesigned all I/O to explicitly specify the component list in the preferred order. This will eliminate these incompatibilities once and for all. Yes, I should have known better, but the pressure of producing something quickly led to a poor design.
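A minimal sketch of the component-list approach described above, not the OP's actual code: the module `rec_io_m`, the type `rec_t`, and the routines `write_components` and `read_components` are hypothetical names. Each component is named explicitly in the I/O list, so only the component data reaches the file, in the order listed, whatever padding the compiler uses in memory. Fixed kinds (`int32`, `real32`) are used on the assumption that the file should not change with default-kind options.

```fortran
module rec_io_m
   use, intrinsic :: iso_fortran_env, only : int32, real32
   implicit none
   private
   public :: rec_t, write_components, read_components

   type :: rec_t
      integer(int32)    :: i = 0
      character(len=13) :: s = ""
      real(real32)      :: r = 0.0_real32
   end type rec_t

contains

   ! Write the components one by one, in a fixed order.
   subroutine write_components( this, filename, istat )
      type(rec_t), intent(in)      :: this
      character(len=*), intent(in) :: filename
      integer, intent(out)         :: istat
      integer :: lun

      open( newunit=lun, file=filename, access="stream", form="unformatted", &
            action="write", status="replace", iostat=istat )
      if ( istat /= 0 ) return
      ! Explicit component list: 4 + 13 + 4 bytes of data, no pad bytes.
      write( unit=lun, iostat=istat ) this%i, this%s, this%r
      close( unit=lun )
   end subroutine write_components

   ! Read the components back in the same order they were written.
   subroutine read_components( this, filename, istat )
      type(rec_t), intent(inout)   :: this
      character(len=*), intent(in) :: filename
      integer, intent(out)         :: istat
      integer :: lun

      open( newunit=lun, file=filename, access="stream", form="unformatted", &
            action="read", status="old", iostat=istat )
      if ( istat /= 0 ) return
      read( unit=lun, iostat=istat ) this%i, this%s, this%r
      close( unit=lun )
   end subroutine read_components

end module rec_io_m
```

With byte-sized file storage units, a record written this way should occupy 21 bytes, compared with the 24 bytes of a whole-object write reported earlier in the thread. The price is that the component list has to be maintained by hand whenever the type changes.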
Gary, many of your comments strike me as being ambiguous and confusing. Some sentences defy my attempts to parse them (e.g. "I could take your response as implying that I thought it was added only on the write operation" in #3). On the other hand, Steve's statements are quite clear and informative.
This is just my reaction to what has been said in this thread and the companion CLF thread. I do not seek to start an argument. If you do not agree with my observations, please ignore them.
As I said, an unformatted WRITE of a derived type value is treated as a single entity, not as a series of components. If that entity has padding in its storage representation, those pad bytes are included. A formatted WRITE is different, but you asked about "binary/stream I/O" by which I understand you to be asking about unformatted stream I/O.
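To see this on a particular compiler, one option is to write the whole object and then read the file back as raw bytes. The following is only a sketch under assumptions: `whole_write_m`, `sample_t`, and `dump_whole_write` are hypothetical names, and it assumes that INQUIRE reports the file size in bytes (true when file storage units are 8 bits). The values found in any pad bytes are not specified, so they may or may not be zero.

```fortran
module whole_write_m
   use, intrinsic :: iso_fortran_env, only : int8, int32, real32
   implicit none
   private
   public :: sample_t, dump_whole_write

   type :: sample_t
      integer(int32)    :: i = 0
      character(len=13) :: s = ""
      real(real32)      :: r = 0.0_real32
   end type sample_t

contains

   ! Write "this" as a single entity, then return the file content as bytes.
   subroutine dump_whole_write( this, filename, bytes, istat )
      type(sample_t), intent(in)              :: this
      character(len=*), intent(in)            :: filename
      integer(int8), allocatable, intent(out) :: bytes(:)
      integer, intent(out)                    :: istat
      integer :: lun, nbytes

      open( newunit=lun, file=filename, access="stream", form="unformatted", &
            action="write", status="replace", iostat=istat )
      if ( istat /= 0 ) return
      ! Whole-object write: the I/O list item is the derived type value itself.
      write( unit=lun, iostat=istat ) this
      close( unit=lun )
      if ( istat /= 0 ) return

      ! Assumption: SIZE= is reported in bytes.
      inquire( file=filename, size=nbytes, iostat=istat )
      if ( istat /= 0 ) return
      allocate( bytes(nbytes) )

      open( newunit=lun, file=filename, access="stream", form="unformatted", &
            action="read", status="old", iostat=istat )
      if ( istat /= 0 ) return
      ! Read the file back as an anonymous block of bytes.
      read( unit=lun, iostat=istat ) bytes
      close( unit=lun )
   end subroutine dump_whole_write

end module whole_write_m
```

Printing the returned array, for example with `print "(*(i5))", bytes`, and comparing `size(bytes)` against the 21 bytes of component data shows how many pad bytes went to the file and where they sit.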
How applicable is the code provided in Message #5 to whatever one can reasonably infer about the issues as commented by OP?
OP has also brought up component reordering. Per my understanding, as shown with the code in Message #5 and also based on Steve's comments in Message #4 (anonymous block of bytes) and #9 (single entity), component reordering is also a non-issue when writing and reading a derived type using unformatted, say stream, I/O. Is this accurate?
Any component reordering (which Intel Fortran doesn't do but the standard allows for non-SEQUENCE and non-interoperable types) happens when the compiler lays out the derived type in memory. The compiler never inserts padding or reorders components during execution.
Since OP has yet to provide a test case, we can speculate until the cows come home about what the problem might really be. I suspect in the end it will prove to be a coding error.
There is no need for a code sample. I merely asked for clarification as to how IVF actually behaves. I made no accusation that IVF was doing anything wrong. I know I have a coding error, that was the point of the question, to clarify the best approach for fixing the coding error. First you must understand the cause...I was also not trying to be argumentative, for heaven's sake.
Your very first statement "Padding is never added during I/O." was not what I was expecting and confused me. I never even considered that padding might be added only during the I/O operation if it wasn't present in memory. My going-in theory was that it was written if it was present in memory (for stream I/O), and I just wanted confirmation.
The following statement is what I expected to see:
"an unformatted WRITE of a derived type value is treated as a single entity, not as a series of components. If that entity has padding in its storage representation, those pad bytes are included. A formatted WRITE is different, but you asked about "binary/stream I/O" by which I understand you to be asking about unformatted stream I/O."
So, now I fully understand. Thank you all.
OK, but now there's a little ambiguity from my point of view: in memory, at least if the UDT has the BIND attribute, it will get aligned according to the requirements of its most restrictive component. During unformatted I/O, if the previous operation left the file at some weird alignment, as by outputting a character variable with an odd LEN, do padding bytes get inserted before the structure to align it or is the whole structure written to the file misaligned?
Repeat Offender wrote:
if the previous operation left the file at some weird alignment, as by outputting a character variable with an odd LEN, do padding bytes get inserted before the structure to align it or is the whole structure written to the file misaligned?
Padding is never inserted during I/O. Files don't care about alignment.
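This is easy to confirm with stream access, where INQUIRE can report the current file position. The sketch below uses hypothetical names (`misaligned_write_m`, `item_t`, `write_after_odd_prefix`) and assumes positions are counted in bytes: it writes a 3-character string, notes the position, then writes a whole derived type object.

```fortran
module misaligned_write_m
   use, intrinsic :: iso_fortran_env, only : int32, real32
   implicit none
   private
   public :: write_after_odd_prefix

   type :: item_t
      integer(int32)    :: i = 42
      character(len=13) :: s = "Hello World!"
      real(real32)      :: r = 99.0_real32
   end type item_t

contains

   ! Writes a 3-byte prefix followed by a whole item_t object and reports
   ! the stream position where the object starts and where it ends.
   subroutine write_after_odd_prefix( filename, pos_start, pos_end, istat )
      character(len=*), intent(in) :: filename
      integer, intent(out)         :: pos_start, pos_end, istat
      type(item_t) :: item
      integer :: lun

      pos_start = -1
      pos_end   = -1
      open( newunit=lun, file=filename, access="stream", form="unformatted", &
            action="write", status="replace", iostat=istat )
      if ( istat /= 0 ) return

      ! An odd-length prefix leaves the file at position 4 (positions are 1-based).
      write( unit=lun, iostat=istat ) "abc"
      if ( istat == 0 ) inquire( unit=lun, pos=pos_start )

      ! The object is written starting exactly at the current position.
      if ( istat == 0 ) write( unit=lun, iostat=istat ) item
      if ( istat == 0 ) inquire( unit=lun, pos=pos_end )
      close( unit=lun )
   end subroutine write_after_odd_prefix

end module misaligned_write_m
```

If positions are in bytes, `pos_start` comes back as 4, so the structure begins immediately after the three prefix bytes with nothing inserted in front of it. The difference `pos_end - pos_start` is however many bytes the whole-object write produced.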
R.O., You don't think that Fortran goes that deep into the storage architecture, do you? -- deeper than the level of the user's view of the file system, which is as a collection of bytes?
You would have to look into the depths of the filesystem and HDD/SSD controllers to see alignment and padding issues. For example, the "wear-leveling" strategy used on SSDs.