Derived Type Padding on File IO?

garylscott1
Beginner
862 Views

I've never quite understood the rules for derived type component alignment as they pertain to binary/stream I/O of derived types.  When, if ever, are padding bytes included when a derived type is written to a file?  For consistency, should I always write derived types component by component?  I often structure my derived types with a small reserve character buffer at the end that I resize as I add or adjust components.  For example, I may have 1024 bytes reserved at the end.  I can then add a 4-byte integer at the end of the used space, subtract 4 bytes from the reserve buffer, and keep the I/O "the same".  However, sometimes things go wrong and I find zero fields where I don't expect them (even without an alignment error or warning from the compiler).  Is that a padding byte, or is it more likely my own coding error?  In reality I should design a better data stream, but dumping structures seems to be rife in Windows programming, and it is easy (lazy).
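The reserve-buffer scheme described in this post can be guarded with a size check. The following is a minimal sketch, not the poster's actual code: the type names `rec_v1` and `rec_v2`, the field names, and the total record size of 1024 bytes are all hypothetical, and it assumes 4-byte default integers. It only confirms that the in-memory size stays constant when a 4-byte integer is traded against the reserve; a new component with stricter alignment (an 8-byte real, for example) could still introduce padding and shift later bytes.

```fortran
module reserve_demo
   implicit none
   private
   public :: record_bytes, bytes_of_v1, bytes_of_v2

   ! Hypothetical fixed record size for this sketch
   integer, parameter :: record_bytes = 1024

   ! Version 1: one 4-byte field, the remainder held in reserve
   type :: rec_v1
      integer             :: id = 0
      character(len=1020) :: reserve = ""
   end type rec_v1

   ! Version 2: a second 4-byte integer added, reserve shrunk by 4
   type :: rec_v2
      integer             :: id = 0
      integer             :: flags = 0
      character(len=1016) :: reserve = ""
   end type rec_v2

contains

   ! In-memory size of a version-1 record in bytes, padding included
   integer function bytes_of_v1() result(n)
      type(rec_v1) :: x
      n = storage_size(x) / 8
   end function bytes_of_v1

   ! In-memory size of a version-2 record in bytes, padding included
   integer function bytes_of_v2() result(n)
      type(rec_v2) :: x
      n = storage_size(x) / 8
   end function bytes_of_v2

end module reserve_demo
```

A caller could stop early when the layout drifts, for example with `if (bytes_of_v2() /= record_bytes) error stop "record layout changed"`.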

0 Kudos
14 Replies
Steve_Lionel
Honored Contributor III
862 Views

Padding is never added during I/O. However, derived types by default have padding added to naturally align their components. If you can provide a short example demonstrating the problem we can examine it further.

0 Kudos
garylscott1
Beginner
862 Views

Yes, I understand that padding is added in memory.  The question was whether that padding, which I know is present in memory, is ever included in a write operation to disk (I could take your response as implying that I thought it was added only on the write operation).  I think you meant it is never included in a write to disk even if present in memory.  If that is the case, then I obviously have a coding error.

0 Kudos
Steve_Lionel
Honored Contributor III
862 Views

Correct - Unformatted I/O operations move a derived type variable as if it was an anonymous block of bytes.

0 Kudos
FortranFan
Honored Contributor II
862 Views

There is a parallel thread on this at comp.lang.fortran.  The 'what' and 'how' of the issues faced by OP as they relate to Intel Fortran compiler-specific details of structure alignment and padding, component reordering (if applicable), and unformatted IO are still unclear.  It is even more difficult to understand what's going on without an actual example.

So I decided to run a little test and I didn't notice any issues notwithstanding the caution and limitations that accompany this in terms of processor-dependent aspects and portability across platforms and compilers.  Here's the code if anyone wants to review and comment, particularly Steve or from the Intel Fortran team:

module t_m 

   use, intrinsic :: iso_c_binding, only : c_loc, c_ptr 

   implicit none 

   private 

   type, public :: t 
      integer           :: i = 0 
      character(len=13) :: s = "" 
      real              :: r = 0 
   end type t 

   public :: read_t 
   public :: write_t 
   public :: get_offsets 

contains 

   subroutine read_t( this, filename, istat, imsg ) 

      ! argument list 
      type(t), intent(inout)          :: this 
      character(len=*), intent(in)    :: filename 
      integer, intent(inout)          :: istat 
      character(len=*), intent(inout) :: imsg 

      ! local variables 
      integer :: lun 

      open( newunit=lun, file=filename, access="stream", form="unformatted", action="read",         & 
            status="old", iostat=istat, iomsg=imsg ) 
      if ( istat == 0 ) then 
         read( unit=lun, iostat=istat, iomsg=imsg ) this 
         close( unit= lun ) 
      end if 

      return 

   end subroutine read_t 

   subroutine write_t( this, filename, istat, imsg ) 

      ! argument list 
      type(t), intent(in)             :: this 
      character(len=*), intent(in)    :: filename 
      integer, intent(inout)          :: istat 
      character(len=*), intent(inout) :: imsg 

      ! local variables 
      integer :: lun 

      open( newunit=lun, file=filename, access="stream", form="unformatted", action="write",        & 
            status="replace", iostat=istat, iomsg=imsg ) 
      if ( istat == 0 ) then 
         write( unit=lun, iostat=istat, iomsg=imsg ) this 
         close( unit= lun ) 
      end if 

      return 

   end subroutine write_t 

   subroutine get_offsets( this, offsets ) 

      ! c_intptr_t is wide enough to hold an address on 32- and 64-bit targets 
      use, intrinsic :: iso_c_binding, only : c_intptr_t 

      ! argument list 
      type(t), intent(in), target :: this 
      integer, intent(inout)      :: offsets(:) 

      ! local variables 
      type(c_ptr) :: address 
      integer(kind=c_intptr_t) :: base_address 

      ! byte offset of each component relative to the first one 
      address = c_loc( this%i ) 
      base_address = transfer( source=address, mold=base_address ) 
      offsets(1) = 0 

      address = c_loc( this%s ) 
      offsets(2) = int( transfer( source=address, mold=base_address ) - base_address ) 

      address = c_loc( this%r ) 
      offsets(3) = int( transfer( source=address, mold=base_address ) - base_address ) 

      return 

   end subroutine get_offsets 

end module t_m 
program p 

   use, intrinsic :: iso_fortran_env, only : character_storage_size 
   use t_m, only : t, read_t, write_t, get_offsets 

   implicit none 

   type(t) :: foo 
   type(t) :: bar 
   integer :: offsets(3) 
   character(len=*), parameter :: datfile="C:\temp\foo.dat" 
   integer :: istat 
   character(len=256) :: imsg 
   integer :: sizefile 

   !.. 
   foo = t( i=42, s="Hello World!", r=99.0 ) 
   call write_t( this=foo, filename=datfile, istat=istat, imsg=imsg ) 
   if ( istat /= 0 ) then 
      print *, "write failed: istat = ", istat, ", imsg = ", imsg 
      stop 
   end if 
   call get_offsets( this=foo, offsets=offsets ) 
   print *, "components of foo are at ", offsets 

   ! 
   call read_t( this=bar, filename=datfile, istat=istat, imsg=imsg ) 
   if ( istat /= 0 ) then 
      print *, "read failed: istat = ", istat, ", imsg = ", imsg 
      stop 
   end if 
   call get_offsets( this=bar, offsets=offsets ) 
   print *, "components of bar are at ", offsets 
   print *, "bar%i = ", bar%i, "; expected value is ", foo%i 
   print *, "bar%s = ", bar%s, "; expected value is ", foo%s 
   print *, "bar%r = ", bar%r, "; expected value is ", foo%r 

   inquire( file=datfile, size=sizefile, iostat=istat ) 
   if ( istat == 0 ) then 
      print *, "file size for IO: ", sizefile, " file storage units." 
      print *, "size of foo: ", storage_size(foo)/character_storage_size, " bytes." 
      print *, "size of bar: ", storage_size(bar)/character_storage_size, " bytes." 
   end if 

   stop 

end program p 

With Intel Fortran compiler options of /standard-semantics /warn:all /check:all /stand, the above code compiles without any warnings or errors.  Upon execution, the output is: 
 

 components of foo are at  0 4 20 
 components of bar are at  0 4 20 
 bar%i =  42 ; expected value is  42 
 bar%s = Hello World!             ; expected value is Hello World! 
 bar%r =  99.00000 ; expected value is  99.00000 
 file size for IO:  24  file storage units. 
 size of foo:  24  bytes. 
 size of bar:  24  bytes. 

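Note that by default (/align:rec8byte) Intel Fortran aligns each derived type component on its natural boundary, up to a maximum of 8 bytes.  The 4-byte REAL that follows the 13-character string is therefore moved from offset 17 to offset 20, which gives the offsets 0, 4, and 20 and a total size of 24 bytes. 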

Now consider the following: 

  1. say the second component of CHARACTER type is declared to have a length of 11 instead of 13.  The offsets should then be 0, 4, and 16 and that's what the program output shows. 
  2. say the compiler option of /align:rec4byte is introduced and the length of CHARACTER component is set to 4.  You would now expect the offsets to be 0, 4, and 8 which is indeed the case.
  3. but say the compiler option of /align:rec16byte is used while the length of CHARACTER component is upped to 25.  As expected, the offsets then become 0, 4, and 32. 
  4. now say the compiler is asked to not do any padding by specifying /align:rec1byte, then offsets with the above code should be 0, 4, and 17 and you can expect the compiler to raise a warning about data misalignment.  This is what one gets. 

But in all the above cases, the unformatted read action on the file is ok and the object ("bar") is deserialized as expected.

0 Kudos
Steve_Lionel
Honored Contributor III
862 Views

Gary suggested that padding was being added at the point of the I/O, which is absolutely not happening. As I said earlier, components do get padding for natural alignment by default in Intel Fortran. You can disable this by 1) making the type a BIND(C) type, 2) specifying /noalign when compiling, 3) bracketing the type declaration with !DIR$ OPTIONS /NOALIGN ... !DIR$ END OPTIONS, or 4) making the type a SEQUENCE type (not recommended).
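As a rough illustration of options 3 and 4 from this post, the sketch below declares the same three components three ways and reports each type's in-memory size. The module, type, and function names are hypothetical. The directive pair is specific to Intel Fortran; other compilers treat those lines as ordinary comments and keep their default layout, so the sizes reported will differ between compilers and between alignment options.

```fortran
module packing_demo
   implicit none
   private
   public :: default_bytes, noalign_bytes, sequence_bytes

   ! Default layout: the compiler may pad after s so that r is naturally aligned
   type :: t_default
      integer           :: i
      character(len=13) :: s
      real              :: r
   end type t_default

   ! Option 3: Intel-specific directive pair around the declaration
   !DIR$ OPTIONS /NOALIGN
   type :: t_noalign
      integer           :: i
      character(len=13) :: s
      real              :: r
   end type t_noalign
   !DIR$ END OPTIONS

   ! Option 4: SEQUENCE type (components stay in declaration order)
   type :: t_sequence
      sequence
      integer           :: i
      character(len=13) :: s
      real              :: r
   end type t_sequence

contains

   ! Each function returns the in-memory size in bytes, padding included
   integer function default_bytes() result(n)
      type(t_default) :: x
      n = storage_size(x) / 8
   end function default_bytes

   integer function noalign_bytes() result(n)
      type(t_noalign) :: x
      n = storage_size(x) / 8
   end function noalign_bytes

   integer function sequence_bytes() result(n)
      type(t_sequence) :: x
      n = storage_size(x) / 8
   end function sequence_bytes

end module packing_demo
```

Printing the three results side by side for a given compiler and option set shows whether padding is present; with 4-byte default integers and reals, any value above 21 indicates pad bytes.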

0 Kudos
garylscott1
Beginner
862 Views

No, I never suggested that it was added during (at the time of) the IO process.  I asked whether padding that is already present in the structure in memory is included in an IO operation when a whole DT reference is made.  Some other compilers' docs indicate that it is, some that it is not.  If not for IVF, then the answer was more likely that the components were reordered (non-sequence type).  When I inspected the binary file content (the program that normally reads it had failed), two zero bytes appeared between two initialized character variables.  The file must be compatible across compiler versions (e.g. created with 11.1) and regardless of compile-time options that impact order (and/or padding), therefore I've redesigned all IO to explicitly specify the component list in the preferred order.  This will eliminate these incompatibilities once and for all.  Yes, I should have known better, but the pressure of producing something quickly led to a poor design.
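The component-list approach described in this post might look like the sketch below. It is an illustration only: the type `rec` reuses the component shapes from the earlier test program, and the module and procedure names are hypothetical. Because each component is named in the I/O list, the file holds only the listed values in the listed order, independent of how the compiler lays the type out in memory. It assumes the file storage unit is one byte.

```fortran
module t_io_by_component
   implicit none
   private
   public :: rec, write_rec, read_rec, packed_bytes

   type :: rec
      integer           :: i = 0
      character(len=13) :: s = ""
      real              :: r = 0.0
   end type rec

contains

   subroutine write_rec(x, filename, istat)
      type(rec), intent(in)        :: x
      character(len=*), intent(in) :: filename
      integer, intent(out)         :: istat
      integer :: lun
      open(newunit=lun, file=filename, access="stream", form="unformatted", &
           action="write", status="replace", iostat=istat)
      if (istat /= 0) return
      ! Explicit component list: only the listed values reach the file
      write(lun, iostat=istat) x%i, x%s, x%r
      close(lun)
   end subroutine write_rec

   subroutine read_rec(x, filename, istat)
      type(rec), intent(inout)     :: x
      character(len=*), intent(in) :: filename
      integer, intent(out)         :: istat
      integer :: lun
      open(newunit=lun, file=filename, access="stream", form="unformatted", &
           action="read", status="old", iostat=istat)
      if (istat /= 0) return
      ! Same list, same order as in write_rec
      read(lun, iostat=istat) x%i, x%s, x%r
      close(lun)
   end subroutine read_rec

   ! Sum of the component sizes in bytes, with no padding counted
   integer function packed_bytes(x) result(n)
      type(rec), intent(in) :: x
      n = (storage_size(x%i) + storage_size(x%s) + storage_size(x%r)) / 8
   end function packed_bytes

end module t_io_by_component
```

The cost is that the two lists must be kept in step by hand whenever a component is added, which is the trade-off against the convenience of a whole-object write.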

0 Kudos
mecej4
Honored Contributor III
862 Views

Gary, many of your comments strike me as being ambiguous and confusing. Some sentences defy my attempts to parse them (e.g. "I could take your response as implying that I thought it was added only on the write operation" in #3). On the other hand, Steve's statements are quite clear and informative.

This is just my reaction to what has been said in this thread and the companion CLF thread. I do not seek to start an argument. If you do not agree with my observations, please ignore them.

0 Kudos
Steve_Lionel
Honored Contributor III
862 Views

As I said, an unformatted WRITE of a derived type value is treated as a single entity, not as a series of components. If that entity has padding in its storage representation, those pad bytes are included. A formatted WRITE is different, but you asked about "binary/stream I/O" by which I understand you to be asking about unformatted stream I/O.
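One way to check this on a given compiler is to compare three numbers: the in-memory size of the object, the sum of its component sizes, and the size of a file produced by a whole-object stream write. The sketch below is an assumption-laden illustration (hypothetical module and procedure names, byte-sized file storage units); it reports the numbers rather than asserting which one the file size will match, since that is the processor-dependent part. Per the statement above, Intel Fortran would be expected to produce a file as large as the in-memory object.

```fortran
module padding_demo
   implicit none
   private
   public :: t, in_memory_bytes, component_bytes, whole_write_file_size

   type :: t
      integer           :: i = 0
      character(len=13) :: s = ""
      real              :: r = 0.0
   end type t

contains

   ! Size of the object as stored in memory, padding included
   integer function in_memory_bytes(x) result(n)
      type(t), intent(in) :: x
      n = storage_size(x) / 8
   end function in_memory_bytes

   ! Sum of the component sizes, with no padding counted
   integer function component_bytes(x) result(n)
      type(t), intent(in) :: x
      n = (storage_size(x%i) + storage_size(x%s) + storage_size(x%r)) / 8
   end function component_bytes

   ! Writes x as a single I/O item and returns the resulting file size
   ! in file storage units, or -1 if any I/O step fails
   integer function whole_write_file_size(x, filename) result(n)
      type(t), intent(in)          :: x
      character(len=*), intent(in) :: filename
      integer :: lun, istat
      n = -1
      open(newunit=lun, file=filename, access="stream", form="unformatted", &
           action="write", status="replace", iostat=istat)
      if (istat /= 0) return
      write(lun, iostat=istat) x
      close(lun)
      if (istat == 0) inquire(file=filename, size=n)
   end function whole_write_file_size

end module padding_demo
```

Because `whole_write_file_size` performs I/O itself, its result should be assigned to a variable first rather than called from inside a PRINT or WRITE item list.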

0 Kudos
FortranFan
Honored Contributor II
862 Views

How applicable is the code provided in Message #5 to whatever one can reasonably infer about the issues as commented by OP?

OP has also brought up component reordering.  Per my understanding, as shown with the code in Message #5 and based on Steve's comments in Message #4 (anonymous block of bytes) and #9 (single entity), component reordering is also a non-issue when writing and reading a derived type using unformatted (say, stream) IO.  Is this accurate?

0 Kudos
Steve_Lionel
Honored Contributor III
862 Views

Any component reordering (which Intel Fortran doesn't do but the standard allows for non-SEQUENCE and non-interoperable types) happens when the compiler lays out the derived type in memory. The compiler never inserts padding or reorders components during execution.

Since OP has yet to provide a test case, we can speculate until the cows come home about what the problem might really be. I suspect in the end it will prove to be a coding error.

0 Kudos
garylscott1
Beginner
862 Views

There is no need for a code sample.  I merely asked for clarification as to how IVF actually behaves.  I made no accusation that IVF was doing anything wrong.  I know I have a coding error, that was the point of the question, to clarify the best approach for fixing the coding error.  First you must understand the cause...I was also not trying to be argumentative, for heaven's sake.

Your very first statement, "Padding is never added during I/O.", was not what I was expecting and it confused me.  I never even considered that padding might be added only during the IO operation if it wasn't present in memory.  My going-in theory was that padding is written if it is present in memory (for stream IO), and I just wanted confirmation.

The following statement is what I expected to see:

"an unformatted WRITE of a derived type value is treated as a single entity, not as a series of components. If that entity has padding in its storage representation, those pad bytes are included. A formatted WRITE is different, but you asked about "binary/stream I/O" by which I understand you to be asking about unformatted stream I/O."

So, now I fully understand.  Thank you all.

 

 

0 Kudos
JVanB
Valued Contributor II
862 Views

OK, but now there's a little ambiguity from my point of view: in memory, at least if the UDT has the BIND property, it will get aligned according to the requirements of its most restrictive component. During unformatted I/O, if the previous operation left the file at some weird alignment, as by outputting a character variable with an odd LEN, do padding bytes get inserted before the structure to align it or is the whole structure written to the file misaligned?

 

0 Kudos
Steve_Lionel
Honored Contributor III
862 Views

Repeat Offender wrote:
if the previous operation left the file at some weird alignment, as by outputting a character variable with an odd LEN, do padding bytes get inserted before the structure to align it or is the whole structure written to the file misaligned?

Padding is never inserted during I/O. Files don't care about alignment.
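This can be observed with stream access by asking for the file position between writes. The sketch below is illustrative only (the module and procedure names are hypothetical, and it assumes the file storage unit is one byte): it writes a 3-character string followed by a default integer and records where the integer starts.

```fortran
module stream_pos_demo
   implicit none
   private
   public :: write_odd_then_int

contains

   ! Writes a 3-character string, then a default integer, to a stream file.
   ! pos_before_int is the 1-based position at which the integer starts;
   ! size_units is the final file size in file storage units.
   subroutine write_odd_then_int(filename, pos_before_int, size_units, istat)
      character(len=*), intent(in) :: filename
      integer, intent(out)         :: pos_before_int, size_units, istat
      integer :: lun
      character(len=3) :: tag
      integer :: n

      tag = "abc"
      n = 42
      pos_before_int = -1
      size_units = -1

      open(newunit=lun, file=filename, access="stream", form="unformatted", &
           action="write", status="replace", iostat=istat)
      if (istat /= 0) return

      write(lun, iostat=istat) tag
      ! Position of the next file storage unit to be written
      inquire(unit=lun, pos=pos_before_int)
      if (istat == 0) write(lun, iostat=istat) n
      close(lun)

      inquire(file=filename, size=size_units)
   end subroutine write_odd_then_int

end module stream_pos_demo
```

If no alignment padding is inserted, the integer starts immediately after the third character, at position 4, and the file size is the plain sum of the two items.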

0 Kudos
mecej4
Honored Contributor III
862 Views

R.O., You don't think that Fortran goes that deep into the storage architecture, do you? -- deeper than the level of the user's view of the file system, which is as a collection of bytes?

You would have to look into the depths of the filesystem and HDD/SSD controllers to see alignment and padding issues. For example, the "wear-leveling" strategy used on SSDs.

0 Kudos