I'm having difficulty getting the compiler to recognise, during vectorisation, the alignment of a 2D array that is shared between Fortran modules.
In module neighbours I declare the following array:
    !> Linked cell list
    Integer( Kind = wi ), Allocatable, Public :: list(:,:)
    !DIR$ ATTRIBUTES ALIGN:64 :: list
and it is allocated with:
    Subroutine init_list(neigh,mxatdm)
      Class( neighbours_type ) :: neigh
      Integer( Kind = wi ), Intent( In ) :: mxatdm
      Integer :: size_list

      size_list = neigh%max_list + 16-Mod(neigh%max_list,16)
      Allocate (neigh%list(-15:neigh%max_list,1:mxatdm))
      !DIR$ ASSUME_ALIGNED neigh%list(-15,1) : 64
The code uses some of the negative indices, and the main indices start at 1 (for the vectorised section of the loop). Since Kind=wi gives 4-byte integers, the 16 elements at indices -15 to 0 occupy exactly 64 bytes, so I think list(1,1) should be 64-byte aligned as well. With -check assume there is no error here.
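The offset arithmetic behind that reasoning can be checked with a small standalone sketch. It assumes 4-byte integers, a 64-byte aligned start of the allocation, and the lower bound of -15 from the code above; the value max_list = 100 and the program name are made up for illustration. It computes the byte offset of list(1,col) from the start of the allocation for the first few columns, using Fortran's column-major layout.

```fortran
program column_offsets
  implicit none
  ! Assumptions for illustration only: 4-byte integers, 64-byte alignment,
  ! lower bound -15 as in the post. max_list = 100 is a made-up value.
  integer, parameter :: elem_bytes = 4, align_bytes = 64, lower = -15
  integer :: max_list, extent, col, offset

  max_list = 100
  extent = max_list - lower + 1   ! elements per column for bounds -15:max_list

  do col = 1, 3
    ! Byte offset of list(1,col) from the start of the allocation
    ! (column-major storage: whole columns are contiguous).
    offset = ((col - 1) * extent + (1 - lower)) * elem_bytes
    print '(a,i0,a,i0,a,i0)', 'column ', col, ': offset ', offset, &
          ' bytes, offset mod 64 = ', mod(offset, align_bytes)
  end do
end program column_offsets
```

With these assumed values only column 1 lands on a 64-byte boundary. Later columns are aligned only when the column extent is a multiple of 16 elements.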
In another module, this data is then used in a strip-mined loop:
    Do m=1,loop,STRIP_WIDTH
      !DIR$ NOFUSION
      Do n=0,STRIP_WIDTH-1,1
        t_ll(n) = -1
        t_kk(n) = -1
        ! atomic and type indices
        !DIR$ ASSUME_ALIGNED neigh%list(1,iatm) : 64
        jatm=neigh%list(m+n,iatm)
        aj=ltype(jatm)
Since it's a 2D array, I believe I need to tell the compiler that each column (each list(:,iatm) slice) is aligned. I hoped I could do it this way, but I'm not sure I can. Either way, this ASSUME_ALIGNED fails with -check assume:
forrtl: severe (408): fort: (28): Check for ASSUME_ALIGNED fails for 'NEIGH' in routine 'RDF_COLLECT' at line 188.
Without this ASSUME_ALIGNED, the optimisation report says neigh%list is not aligned (although it gets confused and reports the variable name as loop, which is a separate issue). With the ASSUME_ALIGNED, the compiler treats neigh%list as aligned, but the program then fails at runtime. This is compiled with Intel 18 update 3 with flags -g -O3 -qopt-report=5 -xCOMMON-AVX512
What is the correct way to tell the compiler about the alignment of a 2D array that lives inside another module? Inside the neighbours module, should I do
    !DIR$ ASSUME_ALIGNED neigh%list(-15,1) : 64
    !DIR$ ASSUME Mod(size_list,16) == 0
or will that not be passed between modules?
    size_list = neigh%max_list + 16-Mod(neigh%max_list,16)
    Allocate (neigh%list(-15:size_list,1:mxatdm))
    !                        ^^^^^^^^^
Does that help?
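In other words, you wanted the extent of the first dimension to be a multiple of 16, which requires using size_list rather than neigh%max_list as the upper bound.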
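A minimal sketch of that padded allocation, using a plain standalone array instead of the neigh type, is given here. The values of max_list and mxatdm and the program name are hypothetical, and the sketch only checks the extent; it does not verify the actual base address alignment, which still depends on the compiler directive or option.

```fortran
program padded_allocation
  implicit none
  integer, allocatable :: list(:,:)
  integer :: max_list, mxatdm, size_list

  ! Hypothetical sizes, chosen only for illustration.
  max_list = 100
  mxatdm   = 8

  ! Round the upper bound up to the next multiple of 16, as in the post.
  size_list = max_list + 16 - mod(max_list, 16)

  ! Use the padded bound so every column holds a multiple of 16 elements.
  allocate (list(-15:size_list, 1:mxatdm))

  print '(a,i0)', 'padded upper bound:     ', size_list
  print '(a,i0)', 'first-dimension extent: ', size(list, 1)
  print '(a,i0)', 'extent mod 16:          ', mod(size(list, 1), 16)

  deallocate (list)
end program padded_allocation
```

With 4-byte integers, an extent that is a multiple of 16 makes each column a multiple of 64 bytes long, so if list(-15,1) is 64-byte aligned then list(-15,iatm) and list(1,iatm) are aligned for every iatm.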
Normally, of course, you would set the compile option -align array32byte (or larger, such as array64byte; the Windows spelling is /align:array32byte) to tell the compiler to align as many arrays as possible, although that shouldn't be necessary for allocatable arrays.