Exporting derived-type structures to binary file

OP1 · ‎07-28-2009

Hi!

Is there a convenient way to export/import a derived-type structure to a binary file?

For instance, let's assume that I have declared the following type

TYPE T_MY_TYPE
INTEGER,ALLOCATABLE :: A(:,:)
REAL :: SOME_VARIABLE
INTEGER,ALLOCATABLE :: ANOTHER_BIG_ARRAY(:,:,:)
END TYPE T_MY_TYPE

and that later on in my program I declared and initialized a variable MY_VAR of type T_MY_TYPE.

Is there a nice, convenient way to export MY_VAR to a binary file, and then load it back later on? Or should I go over all the components of MY_VAR, find whether they are allocated or not, write them to file along with a description of their size and allocation status, so that I can read MY_VAR later on.

It would be great if I could just do something like: WRITE(MY_UNIT) MY_VAR... The problem here is of course the presence of allocatable arrays (which may or may not be allocated) in the derived type structure. Otherwise it would just be a simple record of fixed length.

Thanks in advance,

Olivier

IanH · ‎07-28-2009

No, sort of, well at least not yet.

At some stage in the future you might be able to use that simple write statement, but the compiler you use will need to support the F2003 feature of user-defined derived-type I/O. But you'll still have to write the procedure that runs through the components of your type and checks their allocation status before trying to write them, etc.; it's just that the compiler will automatically call that procedure when it sees WRITE(MY_UNIT) MY_VAR.

IanH
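For readers whose compiler does implement F2003 defined I/O, here is a minimal sketch of what such a procedure might look like for the type in the original question. The module name, the procedure names write_unf and read_unf, and the record layout (scalar and allocation flags first, then shape and data for each allocated array) are assumptions made for illustration, not anything prescribed by the standard or by a particular vendor. It also uses NEWUNIT= and ERROR STOP, which are F2008 conveniences.

```fortran
module my_type_dtio
  implicit none

  type :: t_my_type
    integer, allocatable :: a(:,:)
    real :: some_variable = 0.0
    integer, allocatable :: another_big_array(:,:,:)
  contains
    procedure :: write_unf
    procedure :: read_unf
    ! These generic bindings are what let WRITE(unit) x and READ(unit) x work.
    generic :: write(unformatted) => write_unf
    generic :: read(unformatted) => read_unf
  end type t_my_type

contains

  subroutine write_unf(dtv, unit, iostat, iomsg)
    class(t_my_type), intent(in) :: dtv
    integer, intent(in) :: unit
    integer, intent(out) :: iostat
    character(*), intent(inout) :: iomsg
    ! Scalar and allocation flags first, then shape + data per allocated array.
    write(unit, iostat=iostat, iomsg=iomsg) dtv%some_variable, &
        allocated(dtv%a), allocated(dtv%another_big_array)
    if (iostat /= 0) return
    if (allocated(dtv%a)) then
      write(unit, iostat=iostat, iomsg=iomsg) shape(dtv%a), dtv%a
      if (iostat /= 0) return
    end if
    if (allocated(dtv%another_big_array)) then
      write(unit, iostat=iostat, iomsg=iomsg) &
          shape(dtv%another_big_array), dtv%another_big_array
    end if
  end subroutine write_unf

  subroutine read_unf(dtv, unit, iostat, iomsg)
    class(t_my_type), intent(inout) :: dtv
    integer, intent(in) :: unit
    integer, intent(out) :: iostat
    character(*), intent(inout) :: iomsg
    logical :: has_a, has_b
    integer :: na(2), nb(3)
    ! Start from a clean state, then rebuild from what the file says.
    if (allocated(dtv%a)) deallocate(dtv%a)
    if (allocated(dtv%another_big_array)) deallocate(dtv%another_big_array)
    read(unit, iostat=iostat, iomsg=iomsg) dtv%some_variable, has_a, has_b
    if (iostat /= 0) return
    if (has_a) then
      read(unit, iostat=iostat, iomsg=iomsg) na
      if (iostat /= 0) return
      allocate(dtv%a(na(1), na(2)))
      read(unit, iostat=iostat, iomsg=iomsg) dtv%a
      if (iostat /= 0) return
    end if
    if (has_b) then
      read(unit, iostat=iostat, iomsg=iomsg) nb
      if (iostat /= 0) return
      allocate(dtv%another_big_array(nb(1), nb(2), nb(3)))
      read(unit, iostat=iostat, iomsg=iomsg) dtv%another_big_array
    end if
  end subroutine read_unf

end module my_type_dtio

program dtio_demo
  use my_type_dtio
  implicit none
  type(t_my_type) :: x, y
  integer :: u

  allocate(x%a(2, 3))
  x%a = 7
  x%some_variable = 1.5
  ! x%another_big_array is deliberately left unallocated.

  open(newunit=u, status='scratch', form='unformatted', action='readwrite')
  write(u) x          ! the compiler calls write_unf
  rewind(u)
  read(u) y           ! the compiler calls read_unf
  close(u)

  if (.not. allocated(y%a)) error stop 'a was not restored'
  if (any(shape(y%a) /= [2, 3])) error stop 'wrong shape for a'
  if (any(y%a /= 7)) error stop 'wrong data in a'
  if (allocated(y%another_big_array)) error stop 'unexpected allocation'
  if (y%some_variable /= 1.5) error stop 'wrong scalar'
  print *, 'round trip OK'
end program dtio_demo
```

Note that only the shapes are stored, so any non-default lower bounds are reset to 1 on reading. If the bounds matter, LBOUND and UBOUND would have to go into the file instead.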

OP1 · ‎07-28-2009

Thanks for your answer,

Yes, that would be a really neat feature - being able to export/import a structure at once. Right now I have a set of custom subroutines to do that: I need to keep track of allocation status and size and position in the file of all the type component variables, which can be quite tricky if your structure is complex. I have allocatable derived-type variables where some components are also allocatable variables of another derived type, so it gets messy very quickly. Nothing outstandingly complex programming-wise; just tedious book-keeping.
Having this as a feature of the language set would be tremendous. Maybe it's not too difficult to implement. I dunno... Can we lobby Steve to push for this :) ?? For Fortran 2013 :) ...

Olivier

Steven_L_Intel1 · ‎07-28-2009

You can't get me to push for any new language features in Fortran 2013. However, Ian's comment that user-defined derived type I/O could be useful here is on target. Unfortunately, you're not going to see that from us for a while.

I had wondered if perhaps NAMELIST I/O would help, but my tests encountered some compiler bugs (escalated), so no help there. I'm not sure if it would even be allowed to do a NAMELIST write if one of the values is unallocated (I haven't looked that up.)

A custom routine seems like the best bet. You might want to consider XML or an XML-like structure.

OP1 · ‎07-28-2009

Steve,

Yes, I understand. But just out of curiosity, how are ideas which are floating around pushed up among the new features of FORTRAN? All these great features of FORTRAN 77, 90, 95, 2003 and so on have to come from somewhere :) . I am sure there must be a formal process for this. If you could shed light on this it would be interesting - at least from a historical perspective - to see the process by which a language evolves over time and incorporates new ideas and features. I am curious and intrigued.

Olivier

Steven_L_Intel1 · ‎07-28-2009

Sorry if I wasn't clear. In the normal course of things, you can certainly ask for a new feature. Typically, new features are proposed by users through a standards representative. However, there is a strong urge among most committee members to make Fortran 2013 a "Corrections and Clarifications" update with no new features, and I am firmly in support of that. Look at what we have now - it is 2009 with Fortran 2008 almost published (sometime next year), and there are only two compilers claiming to have all of F2003, and neither of those is on a mainstream platform. Even NAG, which is usually first out of the gate, has a way to go, as do we (though we're getting closer.)

I think it would be best for the community to let us all catch our breaths and bring full F2003 compilers to market. Some vendors, us included, are even proposing to add some F2008 features before finishing all of F2003, because they are in high demand (coarrays, for example).

In the particular case of your request, binary representation of files, or of allocatable array descriptors, is something the Fortran language does not get involved with. There is a proposal to extend interoperability to provide "access routines" that would manipulate and query descriptors, and I like that idea, but what you do with the information would be up to you.

The standard does say, for NAMELIST, that every namelist group item that is an allocatable "shall be allocated", so that's no help. User defined derived type I/O is the best approach the standard could take to such a request.

Paul_Curtis · ‎07-28-2009

My codes do file i/o on UDTs all the time, although not with allocatable components. As has been noted in this thread, this question points up the limitations of Fortran file functions. However, Win32 API functions bring total control and flexibility and make this task completely straightforward. Consider the following utility (wrapper) function:

[cpp]SUBROUTINE rw_file (rwmode, ihandl, nbytes, loc_pointer, offset)
	IMPLICIT NONE
	CHARACTER(LEN=1), INTENT(IN)	:: rwmode
	INTEGER, INTENT(IN)				:: ihandl, nbytes, loc_pointer
	INTEGER, INTENT(IN), OPTIONAL	:: offset
	INTEGER							:: nact

	! position pointer if offset is provided
	IF (PRESENT(offset)) nact = SetFilePointer (ihandl, offset, NULL, FILE_BEGIN)

	IF (rwmode == 'R') THEN
		IF (.NOT.ReadFile (ihandl,			&  ! file handle
				     	  loc_pointer,		&  ! address of data
						  nbytes,			&  ! byte count to read
						  LOC(nact),		&  ! actual bytes read
						  NULL_OVERLAPPED))	THEN
		  	banner(1) = 'Error reading file'
			CALL API_error (8001, 1)
		END IF
	
	ELSE
		IF (.NOT.WriteFile(ihandl,			&  ! file handle
						  loc_pointer,		&  ! address of data
						  nbytes,			&  ! byte count to write
						  LOC(nact),		&  ! actual bytes written
						  NULL_OVERLAPPED))	THEN
		  	banner(1) = 'Error writing file'
			CALL API_error (8002, 1)
		END IF
	END IF
END SUBROUTINE rw_file
[/cpp]

Thus you can transfer any number of bytes, as a contiguous block unencumbered by even the notion of "format" or "data type", between a file and any memory location; simply,

CALL rw_file ('R', ihandl, SIZEOF(my_type), LOC(my_type))

and you're done. I don't know if SIZEOF() works correctly on a structure with allocated components, never tried that, but if you've done the allocating presumably you could also keep track of the actual total size -- in fact, you should add to your UDT some integer components for the total size and the sub-ranges of each of the allocated arrays, to avoid future housekeeping issues.

IanH · ‎07-28-2009

I don't think that "raw memory" approach is going to work (or I would be stunned if it did) for a variable of user defined type with an allocatable component.

The actual data associated with an allocatable component (that has been allocated) will probably not be in the same linear sequence of bytes as occupied by the rest of the components of the variable. You'll just write the descriptor (or whatever) that describes the allocatable component to the file, but not the data. Parts of that descriptor (there's normally a memory pointer or two in there) are very unlikely to be meaningful from a write to a read. This is regardless of whether the sizeof call returns the appropriate total size.

The programming logic, file format layout decisions, etc necessary to use such a raw memory approach on user defined types with allocatable components will be very similar to what is required using standard fortran calls anyway.

IanH
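One way to see the descriptor point for yourself is the small check below. It is only a sketch under assumptions: it uses the F2008 intrinsic STORAGE_SIZE (not the SIZEOF extension mentioned above), the program and variable names are made up, and the actual byte count of the variable is processor-dependent, so no particular number is promised. What it checks is that the storage of the derived-type variable itself does not grow when a large component is allocated, which is what you would expect if the variable holds a descriptor and the data lives elsewhere.

```fortran
program descriptor_demo
  implicit none

  type :: t_my_type
    integer, allocatable :: a(:,:)
    real :: some_variable = 0.0
  end type t_my_type

  type(t_my_type) :: v
  integer :: bits_before, bits_after

  ! Size of the variable itself while the component is unallocated.
  bits_before = storage_size(v)

  allocate(v%a(1000, 1000))
  v%a = 0

  ! Size of the variable itself after a million integers were allocated.
  bits_after = storage_size(v)

  print *, 'bytes in v before allocation:', bits_before / 8
  print *, 'bytes in v after allocation: ', bits_after / 8
  print *, 'bytes of data held by v%a:   ', size(v%a) * (storage_size(v%a) / 8)

  ! The variable's own footprint depends on its type, not on what is allocated.
  if (bits_before /= bits_after) error stop 'storage of v changed with allocation'
end program descriptor_demo
```

So a raw dump of the bytes of v would capture the descriptor and the scalar, and miss the array data entirely.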

Paul_Curtis · ‎07-28-2009

Quoting - IanH

The actual data associated with an allocatable component (that has been allocated) will probably not be in the same linear sequence of bytes...

So add SEQUENCE to the derived type. The definition of SEQUENCE is totally clear, and omits any qualifiers that allocatable components of UDTs are not also physically stored in contiguous memory.

Indeed, if I were writing compiler code to implement UDTs with allocatable components, the act of allocation of those components would automatically force a complete restructuring of the memory used for the entire UDT, not just one particular component, precisely so as to preserve SEQUENCE and other attributes. I think implementing some sort of linked-list memory management approach would be difficult and highly error-prone. Since it is clearly possible to SEQUENCE regular (non-allocatable) UDTs, any other approach would imply the existence of a parallel universe with two entirely different storage models, highly unlikely.

Anyone out there know how this really works? We're just guessing.

Steven_L_Intel1 · ‎07-29-2009

There is no automatic reordering or SEQUENCE. There's also no straightforward way of writing out the descriptors AND data for allocatable and pointer components. You are not allowed to do I/O of a derived type containing allocatable components. The solution has to be in terms of user code that understands the type and "does the right thing", whatever that may be.
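As an illustration of the "user code that understands the type" route, here is a minimal sketch using only ordinary unformatted sequential I/O, so it does not depend on defined I/O support. The names my_type_files, save_my_type and load_my_type and the file layout (one header record with the scalar, allocation flags and shapes, followed by one record per allocated array) are assumptions chosen for the example, not a recommended format. NEWUNIT= and ERROR STOP in the test program are F2008 conveniences.

```fortran
module my_type_files
  implicit none

  type :: t_my_type
    integer, allocatable :: a(:,:)
    real :: some_variable = 0.0
    integer, allocatable :: another_big_array(:,:,:)
  end type t_my_type

contains

  subroutine save_my_type(unit, v)
    integer, intent(in) :: unit
    type(t_my_type), intent(in) :: v
    integer :: na(2), nb(3)

    na = 0
    nb = 0
    if (allocated(v%a)) na = shape(v%a)
    if (allocated(v%another_big_array)) nb = shape(v%another_big_array)

    ! Header record: everything the reader needs before it can allocate.
    write(unit) v%some_variable, allocated(v%a), na, &
                allocated(v%another_big_array), nb
    ! One data record per allocated component.
    if (allocated(v%a)) write(unit) v%a
    if (allocated(v%another_big_array)) write(unit) v%another_big_array
  end subroutine save_my_type

  subroutine load_my_type(unit, v)
    integer, intent(in) :: unit
    ! INTENT(OUT) deallocates any allocated components on entry.
    type(t_my_type), intent(out) :: v
    logical :: has_a, has_b
    integer :: na(2), nb(3)

    read(unit) v%some_variable, has_a, na, has_b, nb
    if (has_a) then
      allocate(v%a(na(1), na(2)))
      read(unit) v%a
    end if
    if (has_b) then
      allocate(v%another_big_array(nb(1), nb(2), nb(3)))
      read(unit) v%another_big_array
    end if
  end subroutine load_my_type

end module my_type_files

program save_load_demo
  use my_type_files
  implicit none
  type(t_my_type) :: x, y
  integer :: u

  x%some_variable = 2.5
  allocate(x%another_big_array(2, 2, 2))
  x%another_big_array = 3
  ! x%a is deliberately left unallocated.

  open(newunit=u, status='scratch', form='unformatted', action='readwrite')
  call save_my_type(u, x)
  rewind(u)
  call load_my_type(u, y)
  close(u)

  if (allocated(y%a)) error stop 'a should be unallocated'
  if (.not. allocated(y%another_big_array)) error stop 'array not restored'
  if (any(shape(y%another_big_array) /= [2, 2, 2])) error stop 'wrong shape'
  if (any(y%another_big_array /= 3)) error stop 'wrong data'
  if (y%some_variable /= 2.5) error stop 'wrong scalar'
  print *, 'round trip OK'
end program save_load_demo
```

Keeping the allocation flags separate from the shapes means an allocated zero-size array is still distinguishable from an unallocated one. For nested types with their own allocatable components, each inner type would need its own save/load pair called from the outer one.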

OP1 · ‎07-29-2009

Ah ah, well I guess we can't be too greedy :) . For the type of numerical analysis I deal with, having the ability to have allocatable arrays (of any type) as components of a derived-type variable is the best thing since sliced bread. So I am pretty happy with that already! A nicely structured derived-type variable can contain all of the input data I need to describe a problem (and that's a lot of information).

Now, the thought that the process of writing/reading such a variable could be simplified/automated (no matter the format) ... well... that would be the frosting on the cake for sure. As mentioned above, a user subroutine is required, but for complex structures (with allocatable components, some may or may not be allocated, others may contain more allocatable derived-type variables, etc) it's a LOT of bookkeeping (I need to keep track of allocation status, size of each component etc.). And of course anytime I decide to add a component to any of my derived-types I need to revamp my IO subroutines. Argh.

Thanks to all for the valuable input and interesting thread comments.

Olivier

Paul_Curtis · ‎07-29-2009

Quoting - Steve Lionel (Intel)

There is no automatic reordering or SEQUENCE. There's also no straightforward way of writing out the descriptors AND data for allocatable and pointer components. You are not allowed to do I/O of a derived type containing allocatable components. The solution has to be in terms of user code that understands the type and "does the right thing", whatever that may be.

Well that's definitive and makes sense. Perhaps the help info on SEQUENCE and on Derived Types should be updated to indicate the exceptional behavior of UDTs with allocatable components; you could just add the above para.

Anonymous66 · ‎04-25-2013

A fix has been found for this issue. We are planning to include it in the next major release which is currently scheduled for later this year.