Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
28455 Discussions

Issue with the alignment of the components of derived-types

Misael_D_
Beginner
1,028 Views

Greetings,

I am developing a program to study the dynamics of a system of particles in a cubic box with periodic boundary conditions. I would like to take advantage of auto-vectorization to expedite computations. To that end, I have created a data structure for storing the position, orientation, force, and torque vectors (among quantities of interest). 

If I understood correctly the documentation on auto-vectorization, I can request the compiler to align the data structure (or derived-type) to a 32-bit boundary and use the sequence attribute to pack the components of the structure (no-padding). I am interested in aligning data to 32-bit boundaries since I am targeting the program to run in AVX processors. The problem that I am facing is that none of the components becomes aligned despite the fact that the data structure is requested to be aligned to a 32-bit boundary.

To ease the troubleshooting, I am posting a minimalistic version of the code here (with comments for clarity of the reader):

module dynamics
    ! Description:
    ! Defines the principal data structure that stores information about the particles
    ! in the system. To take advantage of auto-vectorization the data is organized
    ! as a Structure of Arrays (SoA).
    use, intrinsic :: iso_fortran_env
    implicit none
    type data
        sequence
        ! position vector
        real(kind = real64), allocatable :: r_x(:);
        real(kind = real64), allocatable :: r_y(:);
        real(kind = real64), allocatable :: r_z(:);
        ! orientation vector (director)
        real(kind = real64), allocatable :: d_x(:);
        real(kind = real64), allocatable :: d_y(:);
        real(kind = real64), allocatable :: d_z(:);
        ! force vector
        real(kind = real64), allocatable :: F_x(:);
        real(kind = real64), allocatable :: F_y(:);
        real(kind = real64), allocatable :: F_z(:);
        ! torque vector
        real(kind = real64), allocatable :: T_x(:);
        real(kind = real64), allocatable :: T_y(:);
        real(kind = real64), allocatable :: T_z(:);
        ! displacement vector (the prefix `d' stands for delta or difference)
        real(kind = real64), allocatable :: dr_x(:);
        real(kind = real64), allocatable :: dr_y(:);
        real(kind = real64), allocatable :: dr_z(:);
        ! particle ID array
        integer(kind = int32), allocatable :: ID(:);
        ! padding array, such that we have an equivalent of 16 arrays of 64 bits each
        integer(kind = int32), allocatable :: padding(:);
    end type data

    private
    public data

end module dynamics

program alignment_test
    ! Minimalistic program to test the alignment of derived-types.
    use, intrinsic :: iso_fortran_env
    use dynamics
    implicit none

    ! particle data structure, aligned to 32-bits
    type(data), target :: pdata;
    !dir$ attributes align: 32 :: pdata

    ! number of particles in the system
    integer(kind = int32), parameter :: n_pdata = 64;

    ! captures the status returned by allocate/deallocate functions
    integer(kind = int32) :: alloc_stat;

    ! pointer to access the components of the data structure
    real(kind = real64), pointer, contiguous :: pr_x(:);
    real(kind = real64), pointer, contiguous :: pr_y(:);
    real(kind = real64), pointer, contiguous :: pr_z(:);

    allocate( pdata %r_x(n_pdata), pdata %r_y(n_pdata), pdata %r_z(n_pdata),&
        pdata %d_x(n_pdata), pdata %d_y(n_pdata), pdata %d_z(n_pdata),&
        pdata %F_x(n_pdata), pdata %F_y(n_pdata), pdata %F_z(n_pdata),&
        pdata %T_x(n_pdata), pdata %T_y(n_pdata), pdata %T_z(n_pdata),&
        pdata %dr_x(n_pdata), pdata %dr_y(n_pdata), pdata %dr_z(n_pdata),&
        pdata %ID(n_pdata), pdata %padding(n_pdata), stat=alloc_stat );

    if ( alloc_stat /= 0 ) then
        stop "insufficient memory to allocate the data structure, program stopped"
    end if

    pr_x => pdata %r_x;
    pr_y => pdata %r_y;
    pr_z => pdata %r_z;

    ! assign pretend values to the components of the position vector of the particles

    !dir$ assume_aligned pr_x: 32
    pr_x = 0.0d+0;
    !dir$ assume_aligned pr_y: 32
    pr_y = 0.0d+0;
    !dir$ assume_aligned pr_z: 32
    pr_z = 0.0d+0;


    ! free structure from memory
    deallocate( pdata %r_x, pdata %r_y, pdata %r_z,&
        pdata %d_x, pdata %d_y, pdata %d_z,&
        pdata %F_x, pdata %F_y, pdata %F_z,&
        pdata %T_x, pdata %T_y, pdata %T_z,&
        pdata %dr_x, pdata %dr_y, pdata %dr_z,&
        pdata %ID, pdata %padding, stat=alloc_stat );
    if ( alloc_stat /= 0 ) then
        stop "unexpected error, failed deallocate the data structure..."
    end if

end program

 

 

The program was compiled in the following manner:

ifort -g -traceback -check all -align nosequence  -O0 alignment_test.f90

Here is the output generated at runtime:

forrtl: severe (408): fort: (28): Check for ASSUME_ALIGNED fails for 'PR_X' in routine 'ALIGNMENT_TEST' at line 77.

Image              PC                Routine            Line        Source             
a.out              0000000000407786  Unknown               Unknown  Unknown
a.out              000000000040454F  MAIN__                     77  alignment_test.f90
a.out              0000000000402F1E  Unknown               Unknown  Unknown
libc.so.6          0000003B1F81ED5D  Unknown               Unknown  Unknown
a.out              0000000000402E29  Unknown               Unknown  Unknown

and the version of the Fortran compiler is the following:

ifort --version
ifort (IFORT) 16.0.3 20160415
Copyright (C) 1985-2016 Intel Corporation.  All rights reserved.


Thanks in advance for your help,
Misael

 

 

 

0 Kudos
9 Replies
Lorri_M_Intel
Employee
1,028 Views

All of the fields in your structure DATA are descriptors, not the actual allocated data.

When you declare "pdata" to be declared on a 32-byte boundary, again, it does not affect the actual allocated data, just where the DATA structure itself is aligned.

To do what you want, please use the -align:array32byte command line switch.

                          --Lorri

0 Kudos
Misael_D_
Beginner
1,028 Views

Thanks Lorri for your reply, I have recompiled the code (unchanged) with the following options:

ifort -g -traceback -check all  -align array32byte alignment_test.f90

however, the problem persists:

forrtl: severe (408): fort: (28): Check for ASSUME_ALIGNED fails for 'PR_X' in routine 'ALIGNMENT_TEST' at line 77.

Image              PC                Routine            Line        Source             
a.out              0000000000407786  Unknown               Unknown  Unknown
a.out              000000000040454F  MAIN__                     77  alignment_test.f90
a.out              0000000000402F1E  Unknown               Unknown  Unknown
libc.so.6          0000003B1F81ED5D  Unknown               Unknown  Unknown
a.out              0000000000402E29  Unknown               Unknown  Unknown

I still cannot figure out why I don't get the expected alignment.

 

0 Kudos
mecej4
Honored Contributor III
1,028 Views

Lorri's perceptive remark suggests to me that you may wish to reconsider the design of type(data).

In particular, the padding is not working as you intended. To see this, print c_sizeof(pdata) with and without the last item, "padding". You will observe that without the padding you have a structure containing 16 descriptors (and occupying 576 or 1152 bytes, depending on whether you are targeting IA32 or X64), whereas with the padding you have a structure containing 17 descriptors and occupying 612 or 1224 bytes.

Without the padding, the byte counts above, 576 or 1152, are divisible by 32, your preferred alignment. With the padding, the counts, 612 or 1224, are not divisible by 32.

See the following article for more details of the Fortran array descriptor: https://software.intel.com/en-us/node/525356 .

0 Kudos
IanH
Honored Contributor II
1,028 Views

The specification of the alignment for the allocatable array components could also go inside the type definition itself.

Here's a variation that uses parameterised derived types, where the componenent layout is perhaps consistent with what the OP originally had in mind, and which lets you get away with a much shorter allocate statement.  Yay for type parameters.

But running this on x64 windows the attributes align:32 specification for the pdata object doesn't appear to work - it appears to only be 16 byte aligned.  Is this as expected?

module dynamics
    ! Description:
    ! Defines the principal data structure that stores information about the particles
    ! in the system. To take advantage of auto-vectorization the data is organized
    ! as a Structure of Arrays (SoA).
    use, intrinsic :: iso_fortran_env
    implicit none
    
    type data(l)
        integer, len :: l
        ! position vector
        real(kind = real64) :: r_x(l)
        real(kind = real64) :: r_y(l)
        real(kind = real64) :: r_z(l)
        ! orientation vector (director)
        real(kind = real64) :: d_x(l)
        real(kind = real64) :: d_y(l)
        real(kind = real64) :: d_z(l)
        ! force vector
        real(kind = real64) :: F_x(l)
        real(kind = real64) :: F_y(l)
        real(kind = real64) :: F_z(l)
        ! torque vector
        real(kind = real64) :: T_x(l)
        real(kind = real64) :: T_y(l)
        real(kind = real64) :: T_z(l)
        ! displacement vector (the prefix `d' stands for delta or difference)
        real(kind = real64) :: dr_x(l)
        real(kind = real64) :: dr_y(l)
        real(kind = real64) :: dr_z(l)
        ! particle ID array
        integer(kind = int32) :: ID(l)
        ! padding array, such that we have an equivalent of 16 arrays of 64 bits each
        integer(kind = int32) :: padding(l)
    end type data

    private
    public data

end module dynamics

program alignment_test
    ! Minimalistic program to test the alignment of derived-types.
    use, intrinsic :: iso_fortran_env
    use dynamics
    implicit none

    ! particle data structure, aligned to 32-bits  (I think you mean bytes).
    type(data(:)), allocatable, target :: pdata
    !dir$ attributes align: 32 :: pdata

    ! number of particles in the system
    integer(kind = int32), parameter :: n_pdata = 64

    ! captures the status returned by allocate/deallocate functions
    integer(kind = int32) :: alloc_stat

    ! pointer to access the components of the data structure
    real(kind = real64), pointer, contiguous :: pr_x(:)
    real(kind = real64), pointer, contiguous :: pr_y(:)
    real(kind = real64), pointer, contiguous :: pr_z(:)

    allocate( data(n_pdata) :: pdata, stat=alloc_stat )

    if ( alloc_stat /= 0 ) then
        stop "insufficient memory to allocate the data structure, program stopped"
    end if

    pr_x => pdata %r_x
    pr_y => pdata %r_y
    pr_z => pdata %r_z

    print "(*(z16.16,1x,i0,:,/))",  &
        loc(pdata), mod(loc(pdata), 32),  &
        loc(pdata%r_x), mod(loc(pdata%r_x), 32),  &
        loc(pdata%r_y), mod(loc(pdata%r_y), 32),  &
        loc(pr_x), mod(loc(pr_x), 32)
    print "(*(i0,t10,i0,:,/))",  &
        loc(pdata%r_x) - loc(pdata), mod(loc(pdata%r_x) - loc(pdata), 32),  &
        loc(pdata%r_y) - loc(pdata), mod(loc(pdata%r_y) - loc(pdata), 32)
    
    ! assign pretend values to the components of the position vector of the particles

    !dir$ assume_aligned pr_x: 32
    pr_x = 0.0d+0
    !dir$ assume_aligned pr_y: 32
    pr_y = 0.0d+0
    !dir$ assume_aligned pr_z: 32
    pr_z = 0.0d+0


    ! free structure from memory
    deallocate( pdata, stat=alloc_stat )
    if ( alloc_stat /= 0 ) then
        stop "unexpected error, failed deallocate the data structure..."
    end if

    print *, 'done'
end program

For assumed_aligned, the docs say "If address ... has the POINTER attribute, it is the POINTER and not the pointee or the TARGET that is assumed aligned."  The runtime message that I get when the target is not 32 byte aligned isn't consistent with that sentence.

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,028 Views

Use:

!DIR$ ATTRIBUTES ALIGN : alignment :: variableList

    type data
        sequence
        ! position vector
        real(kind = real64), allocatable :: r_x(:);
        real(kind = real64), allocatable :: r_y(:);
        real(kind = real64), allocatable :: r_z(:);
 !dir$ attributes align : 32 :: r_x, r_y, r_z

        ! orientation vector (director)
        real(kind = real64), allocatable :: d_x(:);
        real(kind = real64), allocatable :: d_y(:);
        real(kind = real64), allocatable :: d_z(:);
 !dir$ attributes align : 32 :: d_x, d_y, d_z

        ! force vector
        real(kind = real64), allocatable :: F_x(:);
        real(kind = real64), allocatable :: F_y(:);
        real(kind = real64), allocatable :: F_z(:);
 !dir$ attributes align : 32 :: F_x, F_y, F_z

        ! torque vector
        real(kind = real64), allocatable :: T_x(:);
        real(kind = real64), allocatable :: T_y(:);
        real(kind = real64), allocatable :: T_z(:);
 !dir$ attributes align : 32 :: T_x, T_y, T_z

        ! displacement vector (the prefix `d' stands for delta or difference)
        real(kind = real64), allocatable :: dr_x(:);
        real(kind = real64), allocatable :: dr_y(:);
        real(kind = real64), allocatable :: dr_z(:);
 !dir$ attributes align : 32 :: dr_x, dr_y, dr_z

        ! particle ID array
        integer(kind = int32), allocatable :: ID(:);
 !dir$ attributes align : 32 :: ID
    end type data

**** Note,  the alignment will work on the allocatables. However, if those arrays were of fixed size then the alignment is not (necessarily) attainable because the placement of the object of type data is unknown. You could attain alignment in this case (fixed arrays) by attributing each instance of variables of type data for desired alignment, and then using SEQUENCE and internal padding.

Note 2, please consider using alignment to cache line size (64 bytes), as later generation processors will be using 512-bit vectors.

Jim Dempsey

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,027 Views

BTW

If you directly use pdata%r_x the compiler will know r_x is aligned. However, if you pass pdata%r_x to a subroutine as a rank 1 array, the subroutine will not necessarily know the dummy argument is aligned (it will if it gets inlined). Using a pointer (pr_x=>pdata%r_x) is not necessarily going to know that pr_x points to aligned data (this may depend on if the compiler can see all possible ways of association).

Jim Dempsey

0 Kudos
Misael_D_
Beginner
1,027 Views

Thank you all for your instructive remarks.

I have removed the sequence attribute from the derived-type, I figured that the compiler can still align the arrays inside the structure even when the length of each array is not a factor of 32 by means of automatic padding, I suppose.  Have also added the compiler directives to the code for the sake of being more explicit, which might be useful for the next developer that will work on it. 

The current version of the code still contains the padding array to demonstrate that even though the size of pdata is 1224 bytes (not divisible by 32), all the arrays are still aligned to 32-bytes. Each array passed the runtime checks for assume aligned. If I interpret things correctly, c_sizeof(pdata) is only reporting the total byte count of the descriptors, not the actual arrays, and therefore, it cannot be used as a measure to determine proper alignment. Nevertheless, thanks mecej4 for bringing this to my attention since it allowed me to fully understand Lorri's remarks.    

I tried lanH suggestion, since it looks more elegant to deal with derived-types this way, however, the arrays did not passed the runtime checks for alignments. I think that this finding is in agreement with Jim's remarks on the alignment of fixed-size arrays (placement of the data structure being unknown). Thanks Jim for the suggestion to increase the alignment to 64 bytes. 

Jim, regarding your comment on passing the array pdata %r_x to a subroutine, would the !dir$ assume_aligned r_x: 32 would not help in this case (being r_x the corresponding argument of pdata %r_x in the subroutine)?

I tend to use pointers to access slices of the arrays because the compiler sometimes reports that it cannot vectorize it, on the other hand it works as expected with pointers (but I guess this is more suitable for a new thread).

I am posting the final version of the code here, since it might be of interest to someone else:

module dynamics
    ! Description:
    ! Defines the principal data structure that stores information about the particles
    ! in the system. To take advantage of auto-vectorization the data is organized
    ! as a Structure of Arrays (SoA).
    use, intrinsic :: iso_fortran_env
    implicit none
    type data
        ! position vector
        real(kind = real64), allocatable :: r_x(:);
        real(kind = real64), allocatable :: r_y(:);
        real(kind = real64), allocatable :: r_z(:);
        !dir$ attributes align: 32 :: r_x, r_y, r_z

        ! orientation vector (director)
        real(kind = real64), allocatable :: d_x(:);
        real(kind = real64), allocatable :: d_y(:);
        real(kind = real64), allocatable :: d_z(:);
        !dir$ attributes align: 32 :: d_x, d_y, d_z

        ! force vector
        real(kind = real64), allocatable :: F_x(:);
        real(kind = real64), allocatable :: F_y(:);
        real(kind = real64), allocatable :: F_z(:);
        !dir$ attributes align: 32 :: F_x, F_y, F_z

        ! torque vector
        real(kind = real64), allocatable :: T_x(:);
        real(kind = real64), allocatable :: T_y(:);
        real(kind = real64), allocatable :: T_z(:);
        !dir$ attributes align: 32 :: T_x, T_y, T_z

        ! displacement vector (the prefix `d' stands for delta or difference)
        real(kind = real64), allocatable :: dr_x(:);
        real(kind = real64), allocatable :: dr_y(:);
        real(kind = real64), allocatable :: dr_z(:);
        !dir$ attributes align: 32 :: dr_x, dr_y, dr_z

        ! particle ID array
        integer(kind = int32), allocatable :: ID(:);
        integer(kind = int32), allocatable :: padding(:);
        !dir$ attributes align: 32 :: ID, padding
    end type data

    private
    public data

end module dynamics

program alignment_test
    ! Minimalistic program to test the alignment of derived-types.
    use, intrinsic :: iso_fortran_env
    use :: iso_c_binding
    use dynamics
    implicit none

    ! particle data structure
    type(data), target :: pdata;

    ! number of particles in the system
    integer(kind = int32), parameter :: n_part = 64;

    ! captures the status returned by allocate/deallocate functions
    integer(kind = int32) :: alloc_stat;

    ! pointer to access the components of the data structure
    real(kind = real64), pointer, contiguous :: pr_x(:);
    real(kind = real64), pointer, contiguous :: pr_y(:);
    real(kind = real64), pointer, contiguous :: pr_z(:);

    real(kind = real64), pointer, contiguous :: pd_x(:);
    real(kind = real64), pointer, contiguous :: pd_y(:);
    real(kind = real64), pointer, contiguous :: pd_z(:);

    real(kind = real64), pointer, contiguous :: pf_x(:);
    real(kind = real64), pointer, contiguous :: pf_y(:);
    real(kind = real64), pointer, contiguous :: pf_z(:);

    real(kind = real64), pointer, contiguous :: pt_x(:);
    real(kind = real64), pointer, contiguous :: pt_y(:);
    real(kind = real64), pointer, contiguous :: pt_z(:);

    real(kind = real64), pointer, contiguous :: pdr_x(:);
    real(kind = real64), pointer, contiguous :: pdr_y(:);
    real(kind = real64), pointer, contiguous :: pdr_z(:);

    integer(kind = int32), pointer, contiguous :: pID(:);
    integer(kind = int32), pointer, contiguous :: ppadding(:);

    allocate( pdata %r_x(n_part), pdata %r_y(n_part), pdata %r_z(n_part),&
        pdata %d_x(n_part), pdata %d_y(n_part), pdata %d_z(n_part),&
        pdata %F_x(n_part), pdata %F_y(n_part), pdata %F_z(n_part),&
        pdata %T_x(n_part), pdata %T_y(n_part), pdata %T_z(n_part),&
        pdata %dr_x(n_part), pdata %dr_y(n_part), pdata %dr_z(n_part),&
        pdata %ID(n_part), pdata %padding(n_part), stat=alloc_stat );

    if ( alloc_stat /= 0 ) then
        stop "insufficient memory to allocate the data structure, program stopped"
    end if

    print *, 'sizeof(pdata) = ', c_sizeof(pdata);

    pr_x => pdata %r_x;
    pr_y => pdata %r_y;
    pr_z => pdata %r_z;

    pd_x => pdata %d_x;
    pd_y => pdata %d_y;
    pd_z => pdata %d_z;

    pf_x => pdata %f_x;
    pf_y => pdata %f_y;
    pf_z => pdata %f_z;

    pt_x => pdata %t_x;
    pt_y => pdata %t_y;
    pt_z => pdata %t_z;

    pdr_x => pdata %dr_x;
    pdr_y => pdata %dr_y;
    pdr_z => pdata %dr_z;

    pID => pdata %ID;

    ppadding => pdata %padding;

    ! assign pretend values to the components of the position vector of the particles

    !dir$ assume_aligned pr_x: 32
    pr_x = 0.0d+0;
    !dir$ assume_aligned pr_y: 32
    pr_y = 0.0d+0;
    !dir$ assume_aligned pr_z: 32
    pr_z = 0.0d+0;

    !dir$ assume_aligned pd_x: 32
    pd_x = 0.0d+0;
    !dir$ assume_aligned pd_y: 32
    pd_y = 0.0d+0;
    !dir$ assume_aligned pd_z: 32
    pd_z = 0.0d+0;

    !dir$ assume_aligned pf_x: 32
    pf_x = 0.0d+0;
    !dir$ assume_aligned pf_y: 32
    pf_y = 0.0d+0;
    !dir$ assume_aligned pf_z: 32
    pf_z = 0.0d+0;

    !dir$ assume_aligned pt_x: 32
    pf_x = 0.0d+0;
    !dir$ assume_aligned pt_y: 32
    pf_y = 0.0d+0;
    !dir$ assume_aligned pt_z: 32
    pf_z = 0.0d+0;

    !dir$ assume_aligned pdr_x: 32
    pdr_x = 0.0d+0;
    !dir$ assume_aligned pdr_y: 32
    pdr_y = 0.0d+0;
    !dir$ assume_aligned pdr_z: 32
    pdr_z = 0.0d+0;

    !dir$ assume_aligned pID: 32
    pID = 0;

    !dir$ assume_aligned ppadding: 32
    ppadding = 0;

    ! free structure from memory
    deallocate( pdata %r_x, pdata %r_y, pdata %r_z,&
        pdata %d_x, pdata %d_y, pdata %d_z,&
        pdata %F_x, pdata %F_y, pdata %F_z,&
        pdata %T_x, pdata %T_y, pdata %T_z,&
        pdata %dr_x, pdata %dr_y, pdata %dr_z,&
        pdata %ID, pdata %padding, stat=alloc_stat );
    if ( alloc_stat /= 0 ) then
        stop "unexpected error, failed deallocate the data structure..."
    end if

end program

Thanks!

Misael

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,027 Views

>> I figured that the compiler can still align the arrays inside the structure...

You have a basic misconception here that you need to correct. The arrays are NOT inside the structure. The allocatable array descriptors are inside the structure. The arrays themselves do not exist until they are allocated (some time after the program starts). Note, fixed arrays need no descriptor.

>>The current version of the code still contains the padding array

This is pointless due to the fact that you do not know the size of the array descriptors. These descriptors will change size between 32-bit and 64-bit builds, as well as are subject to change at anytime with compiler version changes/requirements. Getting the size of the structure at one point in time, then assuming it is the same as compiler versions change is walking on thin ice.

>>...verify alignment...

test the alignment against LOC(pdata%r_x(1)) or better LOC(pdata%r(lbound(pdata%r))) in the event that the lower bound of the array is not 1.

>>regarding your comment on passing the array pdata %r_x to a subroutine, would the !dir$ assume_aligned r_x: 32 would not help in this case (being r_x the corresponding argument of pdata %r_x in the subroutine)?

The assume_aligned is appropriate in the subroutine... it is your responsibility to assure that you pass in aligned data.

>>I tend to use pointers to access slices of the arrays

Then with respect to the prior >> note, it is your responsibility to assure the pointer points to aligned slice. With array of REAL(8), lbound of 1 and 32-byte alignment requirement this would have to be a multiple of 4+1, 64-byte alignment would be a multiple of 8+1. Formally:

(multiple of ((alignment size) / sizeof(array(lbound(array)))) + lbound(array)

Jim Dempsey

 

0 Kudos
Misael_D_
Beginner
1,027 Views

Thank Jim for your instructive remarks.  It is clearer to me now, thanks. 

 

0 Kudos
Reply