Hi,
I've read in various places (including the Intel docs, in the optimization section) that we can help data alignment by declaring data so that they fall on their natural boundaries. The example given is to declare the larger data types first, followed by the smaller ones: e.g. REAL*8, INTEGER*4, CHARACTER, etc.
However, most documentation suggests (I may have misread) that this kind of data alignment applies to derived types, COMMON blocks, and record structures.
My question is: when passing a huge list of arguments, e.g. subroutine foo(a, b, c, d, ...), where the arguments can be REAL*8 or INTEGER*4 arrays and also scalars, and none of the arguments are derived types, COMMON blocks, etc., can I still align the data? That is, is there such a thing as data alignment for argument passing? If so, does the same principle as for COMMON blocks apply, where the larger arguments are placed ahead of the others in the list?
Thanks
Data alignment is an attribute of the variable when it is declared, not specific to passing arguments. But as far as passing arguments go, each argument is individual - you don't have to worry about combinations of data types in arguments.
In most cases, your variables are already aligned. The ways that you might force misalignment are in derived types, COMMON and EQUIVALENCE.
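To illustrate the point about forced misalignment (a minimal sketch; the block and variable names are invented), here is how member order in a COMMON block determines whether items sit on their natural boundaries:

```fortran
! Hypothetical example: COMMON lays members out in declaration
! order with no padding, so order determines alignment.

! Misaligned layout: i occupies offsets 0-3, pushing x to offset 4,
! off its natural 8-byte boundary.
      INTEGER*4 i
      REAL*8    x
      COMMON /bad/ i, x

! Aligned layout: largest type first, so x2 sits at offset 0 and
! i2 at offset 8 -- both on their natural boundaries.
      REAL*8    x2
      INTEGER*4 i2
      COMMON /good/ x2, i2
```

Local variables and dummy arguments do not have this problem, because the compiler is free to place each one on its natural boundary.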
As far as I know, ifort (since 8.1) adheres to a consensus standard of making arrays of 16 bytes or more 16-byte aligned, so as to facilitate vectorization. Both of you noted the cases where there is an implied alignment which would take precedence.
The ability of COMMON to take advantage of architectural preferences in alignment (by ordering data in decreasing order of size, or with appropriate array sizes) is a counter to the concept of COMMON being obsolete. Now that architectures with preference for 32- or 64-byte alignment are under discussion, the question arises whether we could extend this advantage yet again.
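A sketch of that ordering idea (array names and sizes are invented): placing the arrays first, in decreasing order of element size and with lengths that are multiples of the vector width, keeps each array 16-byte aligned whenever the block itself is:

```fortran
! Hypothetical COMMON layout for vectorization: the REAL*8 arrays
! come first, and each length is even (a multiple of 16 bytes), so
! b starts 16-byte aligned right after a, with the smaller scalars
! trailing at the end.
      REAL*8    a(1024), b(1024)
      INTEGER*4 n, m
      COMMON /vec/ a, b, n, m

! ifort also accepts an explicit alignment request as an Intel
! extension, e.g. for a variable outside COMMON:
!DEC$ ATTRIBUTES ALIGN: 16 :: c
      REAL*8 c(1024)
```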
My observation relates not only to this post but to an earlier one where the static (common) variables are limited to 2GB. Now let me preface this with a caution about stepping on toes...
The Fortran specification has the ability to name a common block. Windows (and I don't know about Linux/Unix/others) has a linker restriction limiting a common block to 2GB (or 3 or 4GB). Why is it that (to my limited knowledge) it hasn't come into consideration to permit the AUTOMATIC attribute to be tacked onto a named common, whereby the named common essentially becomes a pointer to the data block? Or, optionally, an option switch to cause all named commons to assume the AUTOMATIC attribute. With the switch there would be no changes to the source code specification. If the AUTOMATIC were implemented with a !DEC$ AUTOMATIC directive (or whatever name you choose), you would also have no change to the Fortran specification, yet have the feature of declaring virtually unlimited COMMON storage.
Jim Dempsey
Jim, you don't really mean AUTOMATIC do you? That implies that the data gets temporarily allocated on entry to a routine and that the values are undefined on exit. Perhaps you mean ALLOCATABLE?
Intel Fortran actually has a feature pretty much like you describe, called "dynamic common", invoked with the /Qdyncom switch. I do not recommend its use and suggest ALLOCATABLE arrays in a module instead. Dynamic common dates from much older Intel Fortran versions before Fortran 90 and it has some limitations.
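A minimal sketch of the recommended alternative (the module and array names are invented): a module-level ALLOCATABLE array gives run-time-sized storage that is visible wherever the module is used, much like a named COMMON block.

```fortran
! Module-level ALLOCATABLE array replacing a large COMMON block.
! Once allocated, it persists until explicitly deallocated (or
! until the program ends).
      module big_data
        real*8, allocatable :: work(:)
      end module big_data

      program demo
        use big_data
        allocate (work(1000000))   ! one-time run-time allocation
        work = 0.0d0
        ! ... any routine that USEs big_data can reference work ...
        deallocate (work)
      end program demo
```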
Yes, I did mean automatic. Recall that the data is in COMMON (and lives for the life of the process). The array descriptor or pointer to the block common struct already resides in static memory (not on the stack) but is not allocated at compile time. The initialization code would perform the allocations in the same manner as is done in C++ for static object construction (i.e. you would reuse that functionality from C++). I would think this would be QED. This could (should) be extended to module-declared variables as well. The annoying problem with using allocatables in modules is that even though you know the sizes at compile time, you still have to add the allocation and deallocation elsewhere in the code. Making these AUTOMATIC (automatically allocated at startup, and deallocated at exit) would simplify the programming. Instead of editing three areas of code, you only edit one.
Jim Dempsey
You want a way to do a one-time run-time allocation of the data that is referenced as a COMMON block, and that data persists for the life of the program. That's what the dynamic common feature does. The dynamic common feature even allows you to supply your own allocate routine.
AUTOMATIC means storage that disappears when you leave the scope in which it is declared. I suppose, if you're thinking of the whole program as that scope, AUTOMATIC might make some sort of sense in that context, but it is quite at odds with the notion of COMMON which is global to the program.
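For completeness, a hedged sketch of what using the dynamic common feature looks like (the block name and array are invented; check the ifort documentation for the exact switch syntax on your version):

```fortran
! A COMMON block made dynamic via the compiler switch -- no source
! change is needed.  Hypothetically compiled on Windows with:
!     ifort /Qdyncom"BIGBLK" myprog.f
! so the block's storage is allocated at run time rather than
! placed in the static image.
      REAL*8 huge_array(100000000)
      COMMON /bigblk/ huge_array
```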
>>but it is quite at odds with the notion of COMMON which is global to the program.
Ah, but the question for the Fortran programmer is: what constitutes the point at which the program is fully loaded and ready to run? If the allocations occur _before_ the first statement of PROGRAM foo, then I would say these allocatable items are global to the program at program startup. Now this may be quite a different thing with respect to FORTRAN 20xx with proper constructors (destructors too, I hope). But by then, FORTRAN 20xx or MS will have fixed the damn 32-bit load segment size problem (which is the root cause of the problem).
IMHO a "hack" to fix this (if you cannot avoid the 32-bit load segment limit) is to permit the linker to have elastic items in the segment, and for the program loader to be aware of this. As an example, an array that is not initialized would occupy only a few bytes of the load image. Its load point is at the current position, then it expands to occupy the full space (I think there is stuff in the loader to do this already). Symbols loading after the array are specified using offsets relative to the end of the just-loaded array, not relative to the beginning of the load segment. These offsets expand from 32 bits to 64 bits. The loader already makes a second pass performing fixups for the segment-relative addresses; with a slight modification it could perform the fixups relative to the last item loaded.
For data initialization, it can be performed as a back patch (as opposed to loading directly). Then any one piece of initialization data could not exceed 2/4GB, but the sum total could (e.g. a .gt. 4GB data set could be back-patched using multiple patches).
The advantage of the technique is that older .EXEs would load fine with the newer program loader. The disadvantage is that the shrunken elastic load segment would still be limited to 2/4GB. Still a good trade-off, IMHO.
Jim Dempsey