- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Greetings,
I am developing a program to study the dynamics of a system of particles in a cubic box with periodic boundary conditions. I would like to take advantage of auto-vectorization to expedite computations. To that end, I have created a data structure for storing the position, orientation, force, and torque vectors (among quantities of interest).
If I understood correctly the documentation on auto-vectorization, I can request the compiler to align the data structure (or derived-type) to a 32-bit boundary and use the sequence attribute to pack the components of the structure (no-padding). I am interested in aligning data to 32-bit boundaries since I am targeting the program to run in AVX processors. The problem that I am facing is that none of the components becomes aligned despite the fact that the data structure is requested to be aligned to a 32-bit boundary.
To ease the troubleshooting, I am posting a minimalistic version of the code here (with comments for clarity of the reader):
module dynamics ! Description: ! Defines the principal data structure that stores information about the particles ! in the system. To take advantage of auto-vectorization the data is organized ! as a Structure of Arrays (SoA). use, intrinsic :: iso_fortran_env implicit none type data sequence ! position vector real(kind = real64), allocatable :: r_x(:); real(kind = real64), allocatable :: r_y(:); real(kind = real64), allocatable :: r_z(:); ! orientation vector (director) real(kind = real64), allocatable :: d_x(:); real(kind = real64), allocatable :: d_y(:); real(kind = real64), allocatable :: d_z(:); ! force vector real(kind = real64), allocatable :: F_x(:); real(kind = real64), allocatable :: F_y(:); real(kind = real64), allocatable :: F_z(:); ! torque vector real(kind = real64), allocatable :: T_x(:); real(kind = real64), allocatable :: T_y(:); real(kind = real64), allocatable :: T_z(:); ! displacement vector (the prefix `d' stands for delta or difference) real(kind = real64), allocatable :: dr_x(:); real(kind = real64), allocatable :: dr_y(:); real(kind = real64), allocatable :: dr_z(:); ! particle ID array integer(kind = int32), allocatable :: ID(:); ! padding array, such that we have an equivalent of 16 arrays of 64 bits each integer(kind = int32), allocatable :: padding(:); end type data private public data end module dynamics program alignment_test ! Minimalistic program to test the alignment of derived-types. use, intrinsic :: iso_fortran_env use dynamics implicit none ! particle data structure, aligned to 32-bits type(data), target :: pdata; !dir$ attributes align: 32 :: pdata ! number of particles in the system integer(kind = int32), parameter :: n_pdata = 64; ! captures the status returned by allocate/deallocate functions integer(kind = int32) :: alloc_stat; ! pointer to access the components of the data structure real(kind = real64), pointer, contiguous :: pr_x(:); real(kind = real64), pointer, contiguous :: pr_y(:); real(kind = real64), pointer, contiguous :: pr_z(:); allocate( pdata %r_x(n_pdata), pdata %r_y(n_pdata), pdata %r_z(n_pdata),& pdata %d_x(n_pdata), pdata %d_y(n_pdata), pdata %d_z(n_pdata),& pdata %F_x(n_pdata), pdata %F_y(n_pdata), pdata %F_z(n_pdata),& pdata %T_x(n_pdata), pdata %T_y(n_pdata), pdata %T_z(n_pdata),& pdata %dr_x(n_pdata), pdata %dr_y(n_pdata), pdata %dr_z(n_pdata),& pdata %ID(n_pdata), pdata %padding(n_pdata), stat=alloc_stat ); if ( alloc_stat /= 0 ) then stop "insufficient memory to allocate the data structure, program stopped" end if pr_x => pdata %r_x; pr_y => pdata %r_y; pr_z => pdata %r_z; ! assign pretend values to the components of the position vector of the particles !dir$ assume_aligned pr_x: 32 pr_x = 0.0d+0; !dir$ assume_aligned pr_y: 32 pr_y = 0.0d+0; !dir$ assume_aligned pr_z: 32 pr_z = 0.0d+0; ! free structure from memory deallocate( pdata %r_x, pdata %r_y, pdata %r_z,& pdata %d_x, pdata %d_y, pdata %d_z,& pdata %F_x, pdata %F_y, pdata %F_z,& pdata %T_x, pdata %T_y, pdata %T_z,& pdata %dr_x, pdata %dr_y, pdata %dr_z,& pdata %ID, pdata %padding, stat=alloc_stat ); if ( alloc_stat /= 0 ) then stop "unexpected error, failed deallocate the data structure..." end if end program
The program was compiled in the following manner:
ifort -g -traceback -check all -align nosequence -O0 alignment_test.f90
Here is the output generated at runtime:
forrtl: severe (408): fort: (28): Check for ASSUME_ALIGNED fails for 'PR_X' in routine 'ALIGNMENT_TEST' at line 77.
Image PC Routine Line Source
a.out 0000000000407786 Unknown Unknown Unknown
a.out 000000000040454F MAIN__ 77 alignment_test.f90
a.out 0000000000402F1E Unknown Unknown Unknown
libc.so.6 0000003B1F81ED5D Unknown Unknown Unknown
a.out 0000000000402E29 Unknown Unknown Unknown
and the version of the Fortran compiler is the following:
ifort --version
ifort (IFORT) 16.0.3 20160415
Copyright (C) 1985-2016 Intel Corporation. All rights reserved.
Thanks in advance for your help,
Misael
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
All of the fields in your structure DATA are descriptors, not the actual allocated data.
When you declare "pdata" to be declared on a 32-byte boundary, again, it does not affect the actual allocated data, just where the DATA structure itself is aligned.
To do what you want, please use the -align:array32byte command line switch.
--Lorri
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Lorri for your reply, I have recompiled the code (unchanged) with the following options:
ifort -g -traceback -check all -align array32byte alignment_test.f90
however, the problem persists:
forrtl: severe (408): fort: (28): Check for ASSUME_ALIGNED fails for 'PR_X' in routine 'ALIGNMENT_TEST' at line 77.
Image PC Routine Line Source
a.out 0000000000407786 Unknown Unknown Unknown
a.out 000000000040454F MAIN__ 77 alignment_test.f90
a.out 0000000000402F1E Unknown Unknown Unknown
libc.so.6 0000003B1F81ED5D Unknown Unknown Unknown
a.out 0000000000402E29 Unknown Unknown Unknown
I still cannot figure out why I don't get the expected alignment.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Lorri's perceptive remark suggests to me that you may wish to reconsider the design of type(data).
In particular, the padding is not working as you intended. To see this, print c_sizeof(pdata) with and without the last item, "padding". You will observe that without the padding you have a structure containing 16 descriptors (and occupying 576 or 1152 bytes, depending on whether you are targeting IA32 or X64), whereas with the padding you have a structure containing 17 descriptors and occupying 612 or 1224 bytes.
Without the padding, the byte counts above, 576 or 1152, are divisible by 32, your preferred alignment. With the padding, the counts, 612 or 1224, are not divisible by 32.
See the following article for more details of the Fortran array descriptor: https://software.intel.com/en-us/node/525356 .
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The specification of the alignment for the allocatable array components could also go inside the type definition itself.
Here's a variation that uses parameterised derived types, where the componenent layout is perhaps consistent with what the OP originally had in mind, and which lets you get away with a much shorter allocate statement. Yay for type parameters.
But running this on x64 windows the attributes align:32 specification for the pdata object doesn't appear to work - it appears to only be 16 byte aligned. Is this as expected?
module dynamics ! Description: ! Defines the principal data structure that stores information about the particles ! in the system. To take advantage of auto-vectorization the data is organized ! as a Structure of Arrays (SoA). use, intrinsic :: iso_fortran_env implicit none type data(l) integer, len :: l ! position vector real(kind = real64) :: r_x(l) real(kind = real64) :: r_y(l) real(kind = real64) :: r_z(l) ! orientation vector (director) real(kind = real64) :: d_x(l) real(kind = real64) :: d_y(l) real(kind = real64) :: d_z(l) ! force vector real(kind = real64) :: F_x(l) real(kind = real64) :: F_y(l) real(kind = real64) :: F_z(l) ! torque vector real(kind = real64) :: T_x(l) real(kind = real64) :: T_y(l) real(kind = real64) :: T_z(l) ! displacement vector (the prefix `d' stands for delta or difference) real(kind = real64) :: dr_x(l) real(kind = real64) :: dr_y(l) real(kind = real64) :: dr_z(l) ! particle ID array integer(kind = int32) :: ID(l) ! padding array, such that we have an equivalent of 16 arrays of 64 bits each integer(kind = int32) :: padding(l) end type data private public data end module dynamics program alignment_test ! Minimalistic program to test the alignment of derived-types. use, intrinsic :: iso_fortran_env use dynamics implicit none ! particle data structure, aligned to 32-bits (I think you mean bytes). type(data(:)), allocatable, target :: pdata !dir$ attributes align: 32 :: pdata ! number of particles in the system integer(kind = int32), parameter :: n_pdata = 64 ! captures the status returned by allocate/deallocate functions integer(kind = int32) :: alloc_stat ! pointer to access the components of the data structure real(kind = real64), pointer, contiguous :: pr_x(:) real(kind = real64), pointer, contiguous :: pr_y(:) real(kind = real64), pointer, contiguous :: pr_z(:) allocate( data(n_pdata) :: pdata, stat=alloc_stat ) if ( alloc_stat /= 0 ) then stop "insufficient memory to allocate the data structure, program stopped" end if pr_x => pdata %r_x pr_y => pdata %r_y pr_z => pdata %r_z print "(*(z16.16,1x,i0,:,/))", & loc(pdata), mod(loc(pdata), 32), & loc(pdata%r_x), mod(loc(pdata%r_x), 32), & loc(pdata%r_y), mod(loc(pdata%r_y), 32), & loc(pr_x), mod(loc(pr_x), 32) print "(*(i0,t10,i0,:,/))", & loc(pdata%r_x) - loc(pdata), mod(loc(pdata%r_x) - loc(pdata), 32), & loc(pdata%r_y) - loc(pdata), mod(loc(pdata%r_y) - loc(pdata), 32) ! assign pretend values to the components of the position vector of the particles !dir$ assume_aligned pr_x: 32 pr_x = 0.0d+0 !dir$ assume_aligned pr_y: 32 pr_y = 0.0d+0 !dir$ assume_aligned pr_z: 32 pr_z = 0.0d+0 ! free structure from memory deallocate( pdata, stat=alloc_stat ) if ( alloc_stat /= 0 ) then stop "unexpected error, failed deallocate the data structure..." end if print *, 'done' end program
For assumed_aligned, the docs say "If address ... has the POINTER attribute, it is the POINTER and not the pointee or the TARGET that is assumed aligned." The runtime message that I get when the target is not 32 byte aligned isn't consistent with that sentence.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Use:
!DIR$ ATTRIBUTES ALIGN : alignment :: variableList
type data sequence ! position vector real(kind = real64), allocatable :: r_x(:); real(kind = real64), allocatable :: r_y(:); real(kind = real64), allocatable :: r_z(:); !dir$ attributes align : 32 :: r_x, r_y, r_z ! orientation vector (director) real(kind = real64), allocatable :: d_x(:); real(kind = real64), allocatable :: d_y(:); real(kind = real64), allocatable :: d_z(:); !dir$ attributes align : 32 :: d_x, d_y, d_z ! force vector real(kind = real64), allocatable :: F_x(:); real(kind = real64), allocatable :: F_y(:); real(kind = real64), allocatable :: F_z(:); !dir$ attributes align : 32 :: F_x, F_y, F_z ! torque vector real(kind = real64), allocatable :: T_x(:); real(kind = real64), allocatable :: T_y(:); real(kind = real64), allocatable :: T_z(:); !dir$ attributes align : 32 :: T_x, T_y, T_z ! displacement vector (the prefix `d' stands for delta or difference) real(kind = real64), allocatable :: dr_x(:); real(kind = real64), allocatable :: dr_y(:); real(kind = real64), allocatable :: dr_z(:); !dir$ attributes align : 32 :: dr_x, dr_y, dr_z ! particle ID array integer(kind = int32), allocatable :: ID(:); !dir$ attributes align : 32 :: ID end type data
**** Note, the alignment will work on the allocatables. However, if those arrays were of fixed size then the alignment is not (necessarily) attainable because the placement of the object of type data is unknown. You could attain alignment in this case (fixed arrays) by attributing each instance of variables of type data for desired alignment, and then using SEQUENCE and internal padding.
Note 2, please consider using alignment to cache line size (64 bytes), as later generation processors will be using 512-bit vectors.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
BTW
If you directly use pdata%r_x the compiler will know r_x is aligned. However, if you pass pdata%r_x to a subroutine as a rank 1 array, the subroutine will not necessarily know the dummy argument is aligned (it will if it gets inlined). Using a pointer (pr_x=>pdata%r_x) is not necessarily going to know that pr_x points to aligned data (this may depend on if the compiler can see all possible ways of association).
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you all for your instructive remarks.
I have removed the sequence attribute from the derived-type, I figured that the compiler can still align the arrays inside the structure even when the length of each array is not a factor of 32 by means of automatic padding, I suppose. Have also added the compiler directives to the code for the sake of being more explicit, which might be useful for the next developer that will work on it.
The current version of the code still contains the padding array to demonstrate that even though the size of pdata is 1224 bytes (not divisible by 32), all the arrays are still aligned to 32-bytes. Each array passed the runtime checks for assume aligned. If I interpret things correctly, c_sizeof(pdata) is only reporting the total byte count of the descriptors, not the actual arrays, and therefore, it cannot be used as a measure to determine proper alignment. Nevertheless, thanks mecej4 for bringing this to my attention since it allowed me to fully understand Lorri's remarks.
I tried lanH suggestion, since it looks more elegant to deal with derived-types this way, however, the arrays did not passed the runtime checks for alignments. I think that this finding is in agreement with Jim's remarks on the alignment of fixed-size arrays (placement of the data structure being unknown). Thanks Jim for the suggestion to increase the alignment to 64 bytes.
Jim, regarding your comment on passing the array pdata %r_x to a subroutine, would the !dir$ assume_aligned r_x: 32 would not help in this case (being r_x the corresponding argument of pdata %r_x in the subroutine)?
I tend to use pointers to access slices of the arrays because the compiler sometimes reports that it cannot vectorize it, on the other hand it works as expected with pointers (but I guess this is more suitable for a new thread).
I am posting the final version of the code here, since it might be of interest to someone else:
module dynamics ! Description: ! Defines the principal data structure that stores information about the particles ! in the system. To take advantage of auto-vectorization the data is organized ! as a Structure of Arrays (SoA). use, intrinsic :: iso_fortran_env implicit none type data ! position vector real(kind = real64), allocatable :: r_x(:); real(kind = real64), allocatable :: r_y(:); real(kind = real64), allocatable :: r_z(:); !dir$ attributes align: 32 :: r_x, r_y, r_z ! orientation vector (director) real(kind = real64), allocatable :: d_x(:); real(kind = real64), allocatable :: d_y(:); real(kind = real64), allocatable :: d_z(:); !dir$ attributes align: 32 :: d_x, d_y, d_z ! force vector real(kind = real64), allocatable :: F_x(:); real(kind = real64), allocatable :: F_y(:); real(kind = real64), allocatable :: F_z(:); !dir$ attributes align: 32 :: F_x, F_y, F_z ! torque vector real(kind = real64), allocatable :: T_x(:); real(kind = real64), allocatable :: T_y(:); real(kind = real64), allocatable :: T_z(:); !dir$ attributes align: 32 :: T_x, T_y, T_z ! displacement vector (the prefix `d' stands for delta or difference) real(kind = real64), allocatable :: dr_x(:); real(kind = real64), allocatable :: dr_y(:); real(kind = real64), allocatable :: dr_z(:); !dir$ attributes align: 32 :: dr_x, dr_y, dr_z ! particle ID array integer(kind = int32), allocatable :: ID(:); integer(kind = int32), allocatable :: padding(:); !dir$ attributes align: 32 :: ID, padding end type data private public data end module dynamics program alignment_test ! Minimalistic program to test the alignment of derived-types. use, intrinsic :: iso_fortran_env use :: iso_c_binding use dynamics implicit none ! particle data structure type(data), target :: pdata; ! number of particles in the system integer(kind = int32), parameter :: n_part = 64; ! captures the status returned by allocate/deallocate functions integer(kind = int32) :: alloc_stat; ! pointer to access the components of the data structure real(kind = real64), pointer, contiguous :: pr_x(:); real(kind = real64), pointer, contiguous :: pr_y(:); real(kind = real64), pointer, contiguous :: pr_z(:); real(kind = real64), pointer, contiguous :: pd_x(:); real(kind = real64), pointer, contiguous :: pd_y(:); real(kind = real64), pointer, contiguous :: pd_z(:); real(kind = real64), pointer, contiguous :: pf_x(:); real(kind = real64), pointer, contiguous :: pf_y(:); real(kind = real64), pointer, contiguous :: pf_z(:); real(kind = real64), pointer, contiguous :: pt_x(:); real(kind = real64), pointer, contiguous :: pt_y(:); real(kind = real64), pointer, contiguous :: pt_z(:); real(kind = real64), pointer, contiguous :: pdr_x(:); real(kind = real64), pointer, contiguous :: pdr_y(:); real(kind = real64), pointer, contiguous :: pdr_z(:); integer(kind = int32), pointer, contiguous :: pID(:); integer(kind = int32), pointer, contiguous :: ppadding(:); allocate( pdata %r_x(n_part), pdata %r_y(n_part), pdata %r_z(n_part),& pdata %d_x(n_part), pdata %d_y(n_part), pdata %d_z(n_part),& pdata %F_x(n_part), pdata %F_y(n_part), pdata %F_z(n_part),& pdata %T_x(n_part), pdata %T_y(n_part), pdata %T_z(n_part),& pdata %dr_x(n_part), pdata %dr_y(n_part), pdata %dr_z(n_part),& pdata %ID(n_part), pdata %padding(n_part), stat=alloc_stat ); if ( alloc_stat /= 0 ) then stop "insufficient memory to allocate the data structure, program stopped" end if print *, 'sizeof(pdata) = ', c_sizeof(pdata); pr_x => pdata %r_x; pr_y => pdata %r_y; pr_z => pdata %r_z; pd_x => pdata %d_x; pd_y => pdata %d_y; pd_z => pdata %d_z; pf_x => pdata %f_x; pf_y => pdata %f_y; pf_z => pdata %f_z; pt_x => pdata %t_x; pt_y => pdata %t_y; pt_z => pdata %t_z; pdr_x => pdata %dr_x; pdr_y => pdata %dr_y; pdr_z => pdata %dr_z; pID => pdata %ID; ppadding => pdata %padding; ! assign pretend values to the components of the position vector of the particles !dir$ assume_aligned pr_x: 32 pr_x = 0.0d+0; !dir$ assume_aligned pr_y: 32 pr_y = 0.0d+0; !dir$ assume_aligned pr_z: 32 pr_z = 0.0d+0; !dir$ assume_aligned pd_x: 32 pd_x = 0.0d+0; !dir$ assume_aligned pd_y: 32 pd_y = 0.0d+0; !dir$ assume_aligned pd_z: 32 pd_z = 0.0d+0; !dir$ assume_aligned pf_x: 32 pf_x = 0.0d+0; !dir$ assume_aligned pf_y: 32 pf_y = 0.0d+0; !dir$ assume_aligned pf_z: 32 pf_z = 0.0d+0; !dir$ assume_aligned pt_x: 32 pf_x = 0.0d+0; !dir$ assume_aligned pt_y: 32 pf_y = 0.0d+0; !dir$ assume_aligned pt_z: 32 pf_z = 0.0d+0; !dir$ assume_aligned pdr_x: 32 pdr_x = 0.0d+0; !dir$ assume_aligned pdr_y: 32 pdr_y = 0.0d+0; !dir$ assume_aligned pdr_z: 32 pdr_z = 0.0d+0; !dir$ assume_aligned pID: 32 pID = 0; !dir$ assume_aligned ppadding: 32 ppadding = 0; ! free structure from memory deallocate( pdata %r_x, pdata %r_y, pdata %r_z,& pdata %d_x, pdata %d_y, pdata %d_z,& pdata %F_x, pdata %F_y, pdata %F_z,& pdata %T_x, pdata %T_y, pdata %T_z,& pdata %dr_x, pdata %dr_y, pdata %dr_z,& pdata %ID, pdata %padding, stat=alloc_stat ); if ( alloc_stat /= 0 ) then stop "unexpected error, failed deallocate the data structure..." end if end program
Thanks!
Misael
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>> I figured that the compiler can still align the arrays inside the structure...
You have a basic misconception here that you need to correct. The arrays are NOT inside the structure. The allocatable array descriptors are inside the structure. The arrays themselves do not exist until they are allocated (some time after the program starts). Note, fixed arrays need no descriptor.
>>The current version of the code still contains the padding array
This is pointless due to the fact that you do not know the size of the array descriptors. These descriptors will change size between 32-bit and 64-bit builds, as well as are subject to change at anytime with compiler version changes/requirements. Getting the size of the structure at one point in time, then assuming it is the same as compiler versions change is walking on thin ice.
>>...verify alignment...
test the alignment against LOC(pdata%r_x(1)) or better LOC(pdata%r(lbound(pdata%r))) in the event that the lower bound of the array is not 1.
>>regarding your comment on passing the array pdata %r_x to a subroutine, would the !dir$ assume_aligned r_x: 32 would not help in this case (being r_x the corresponding argument of pdata %r_x in the subroutine)?
The assume_aligned is appropriate in the subroutine... it is your responsibility to assure that you pass in aligned data.
>>I tend to use pointers to access slices of the arrays
Then with respect to the prior >> note, it is your responsibility to assure the pointer points to aligned slice. With array of REAL(8), lbound of 1 and 32-byte alignment requirement this would have to be a multiple of 4+1, 64-byte alignment would be a multiple of 8+1. Formally:
(multiple of ((alignment size) / sizeof(array(lbound(array)))) + lbound(array)
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank Jim for your instructive remarks. It is clearer to me now, thanks.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page