Does module member alignment still matter?

a_zhaogtisoft_com · ‎02-03-2014

I have been reading an old article:

http://msdn.microsoft.com/en-us/library/Aa290049

It discusses memory alignment issues, using C/C++ as the example.

Is the issue in that article still relevant to F90 module definitions? I would think the same principle still applies.

We have a house of F90 developers who just keep appending loads of members to existing module definitions, and data alignment has never been a consideration for them. We do use compilation options such as "/real_size:64 /align:rec16byte /align:qcommons /align:sequence" (we do not set anything for array alignment, though).

Would there be any potential performance improvement if we started aligning module members to 16-byte boundaries ourselves, instead of relying on the compiler to handle it?

Are there any good references?

Steven_L_Intel1 · ‎02-03-2014

The article is talking about struct components, which would be equivalent to derived type components in Fortran. Modules don't really have a connection here. If the components are arrays, then aligning on 16-byte boundaries could help vectorization, but if they are single components, then just align to the type's "natural" boundary. This is done by default with derived types that don't have SEQUENCE.
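
To make the derived-type point concrete, here is a minimal sketch. The type names `natural_t` and `packed_t` and their components are hypothetical, not taken from the article or from this thread. It prints the storage size of a type without SEQUENCE and of one with it. The numbers depend on the compiler and on options such as /align:sequence, so no expected output is shown.

```fortran
program align_demo
  use, intrinsic :: iso_fortran_env, only: int8, real64
  implicit none

  ! Hypothetical types, for illustration only.
  type :: natural_t            ! no SEQUENCE: the compiler is free to pad
    integer(int8) :: flag      ! so that each component sits on its
    real(real64)  :: val       ! natural boundary
  end type natural_t

  type :: packed_t
    sequence                   ! SEQUENCE fixes the component order; padding
    integer(int8) :: flag      ! then depends on the compiler and its options
    real(real64)  :: val
  end type packed_t

  type(natural_t) :: n
  type(packed_t)  :: p

  ! storage_size returns the size in bits of one element of the type.
  print *, 'natural_t bytes:', storage_size(n) / 8
  print *, 'packed_t  bytes:', storage_size(p) / 8
end program align_demo
```

Comparing the two sizes with and without /align:sequence is one way to see whether padding is being inserted between the components.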

TimP · ‎02-03-2014

-align array32byte

may improve performance on CPUs of the last 5 years.

a_zhaogtisoft_com · ‎02-03-2014

We have all kinds of member types in a typical module:

intrinsic types (integer, real, some fixed-length character strings), tons of allocatable arrays, and derived types (which are becoming more and more common in-house now).

I guess I was wrong to map the F90 module to a C struct. For a typical F90 module:

      module foo
        integer :: aCount = 500

        integer :: a1
        integer :: a2
        integer, allocatable, dimension(:) :: b1
        integer, allocatable, dimension(:) :: b2

        type STRUCT_CONN
          integer, pointer, dimension(:)   :: jelei_conn => NULL()
          integer, pointer, dimension(:,:) :: nnod_conn => NULL()
          real,    pointer, dimension(:)   :: xmult_conn => NULL()
        end type STRUCT_CONN
        type (STRUCT_CONN), allocatable, dimension(:) :: c1

      end module foo

I normally see the names foo_mp_a1, foo_mp_a2, foo_mp_b1, and foo_mp_b2 used to represent the members a1, a2, b1, and b2. So an F90 module is more like a namespace in C++, then?

Steven_L_Intel1 · ‎02-03-2014

Yes, you could sort-of say that modules are like a namespace. In the example you posted, alignment of the type components doesn't matter, as they're all pointers. The alignment of the data allocated for the pointers will matter. Let me suggest to you, though, that you make these ALLOCATABLE instead of POINTER: you'll get better performance (and you won't need to initialize them to NULL()).
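
A minimal sketch of that suggestion, assuming the components never need to point at someone else's data. The module name `foo_alloc`, the program name, and the array sizes are illustrative assumptions, not part of the original code.

```fortran
module foo_alloc
  implicit none

  type :: struct_conn
    ! ALLOCATABLE components start out unallocated, so no => NULL() is needed.
    integer, allocatable :: jelei_conn(:)
    integer, allocatable :: nnod_conn(:,:)
    real,    allocatable :: xmult_conn(:)
  end type struct_conn

  type(struct_conn), allocatable :: c1(:)
end module foo_alloc

program alloc_demo
  use foo_alloc
  implicit none

  allocate(c1(2))
  ! Sizes below are arbitrary and chosen only for the example.
  allocate(c1(1)%jelei_conn(10), c1(1)%nnod_conn(4,10), c1(1)%xmult_conn(10))
  c1(1)%jelei_conn = 0

  ! The first element's component is allocated, the second's is not.
  print *, allocated(c1(1)%jelei_conn), allocated(c1(2)%jelei_conn)

  ! Deallocating c1 also deallocates any allocated components.
  deallocate(c1)
end program alloc_demo
```

Unlike pointer components, allocatable components cannot alias other arrays, which is part of why the compiler can optimize them more aggressively.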

jimdempseyatthecove · ‎02-03-2014

Steve,

This is an obtuse but related question.

The user (I assume) stated that the pointers were used to point to a "member" (slice?) of the allocatable arrays. (I assume he omitted the TARGET attribute.) Now then:

Assume you have an allocatable array (contiguous), and then you ASSOCIATE a contiguous slice of this array with an associate-name.

Is the associate-name then known by the compiler to be contiguous for optimization purposes?

Jim Dempsey

Steven_L_Intel1 · ‎02-03-2014

jimdempseyatthecove wrote:

Assume you have an allocatable array (contiguous), and then you ASSOCIATE a contiguous slice of this array with an associate-name.

Is the associate-name then known by the compiler to be contiguous for optimization purposes?

My experiments suggest that the answer is "yes". But "your mileage may vary". I found a difference between:

b = a

and

b = a(1:900)

where b was allocated with 900 elements (and so was a). The first assignment (in the ASSOCIATE construct) was done with a call to intel_fast_memcpy, while the second was not.
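
A minimal sketch of this kind of experiment follows. It is a guess at the setup, not the code that was actually used: the associate-name `s` and the program name are assumptions. The Fortran 2008 intrinsic `is_contiguous` only answers the question at run time. It does not show what the optimizer assumed, so checking for a call such as intel_fast_memcpy still means inspecting the generated code.

```fortran
program assoc_demo
  implicit none
  real, allocatable :: a(:), b(:)

  allocate(a(900), b(900))
  call random_number(a)

  ! s is associated with a contiguous section of a.
  associate (s => a(1:900))
    b = s                      ! whole-array copy through the associate-name
    print *, 'section contiguous? ', is_contiguous(s)
  end associate

  b = a(1:900)                 ! explicit section on the right-hand side
  print *, 'copies match? ', all(b == a)
end program assoc_demo
```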

a_zhaogtisoft_com · ‎02-03-2014

I just tried "-align array32byte" globally for all our F90 projects, and I did not see any performance improvement from the compiler option change.

What does this option do, and when should it be used? Any ideas?

Steven_L_Intel1 · ‎02-03-2014

The compiler automatically aligns variables on a 32- or even 64-byte boundary, depending on your choice of /Qx options. Note that this is alignment of individual variables, NOT of components inside a derived type. If you're already using recommended optimization options (/fast is a good shortcut), then you should run the program under VTune Amplifier XE and see where it is spending its time so you can focus your efforts. Randomly throwing switches and hoping for improvements is not likely to be productive.