      DO i = N1-3, N2+3
         R   = Q1(i,5)
         RRq = one / Q1(i,5)
         Ux  = Q1(i,1) * RRq
         Uy  = Q1(i,2) * RRq
         Uz  = Q1(i,3) * RRq
         Un  = Ux * riX + Uy * riY + Uz * riZ
         P   = Gm1 * (Q1(i,4) - half * R * (Ux*Ux + Uy*Uy + Uz*Uz) )
         Fc(i,1) = R * Ux * Un + P * riX
         Fc(i,2) = R * Uy * Un + P * riY
         Fc(i,3) = R * Uz * Un + P * riZ
         Fc(i,4) = (Q1(i,4) + P) * Un
         Fc(i,5) = R * Un
      END DO

Arrays Q1 and Fc are dynamically allocated at runtime. They are declared in modules, like this:

!dir$ attributes align:64 :: Fc, Q1
      Real(8), Dimension(:,:), Allocatable :: Fc
      Real(8), Dimension(:,:), Allocatable :: Q1

The allocation has the form:

      ALLOCATE (Fc(-2:Nmax+3,5), Q1(-2:Nmax+3,5), ...)

The compilation sequence is:

mpiifort -xHost -O3 -inline-forceinline -pad -opt-prefetch -mp1 -ftz -unroll-aggressive -132 -module ./obj_O3 -I./obj_O3 -I. -implicitnone -traceback -g -sox -fpp -vec-report6 -c ./SRC/flux.f -o./obj_O3/flux.o

I know that data alignment is important for vectorization, so I'd like to know whether something could be done to improve this.
Thank you for your advice.
Regards,
Guy.
Your allocation places Fc(-2, 1) at an address that is a multiple of 64 bytes.
Your DO loop, however, begins at N1-3.
How is the compiler to know that N1-3 == -2 (always)?
Jim Dempsey
Hello Jim,
The compiler can't know this, although in practice N1-3 is always equal to -2.
Could a directive like
!DIR$ VECTOR ALIGNED
be useful in this case?
Regards,
Guy.
The last line in the report output you show states "LOOP WAS VECTORIZED".
For loops like this, the compiler can generate both a vectorized and a non-vectorized version of the loop. For loops that are promising candidates for vectorization, it can also generate a section of code called the "peel", which processes iterations in scalar mode until the data is known to be aligned. Execution then continues in the vectorized loop, and finally, when fewer iterations remain than fill a vector, remainder code processes them in scalar mode.
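To make the peel / vector body / remainder split concrete, here is a hand-written sketch of what the compiler does for you automatically. It is an illustration only, not the code the compiler actually emits. The names (vlen, first_aligned, last_full) and the bounds are hypothetical, and it assumes 8 double-precision elements per 64-byte vector and that a(lo) sits on a 64-byte boundary.

```fortran
program peel_sketch
  implicit none
  integer, parameter :: dp   = kind(1.0d0)
  integer, parameter :: vlen = 8          ! assumption: 8 doubles per 64-byte vector
  integer, parameter :: lo = -2, hi = 100 ! hypothetical array bounds
  real(dp) :: a(lo:hi), b(lo:hi)
  integer  :: i, i0, n1, n2, first_aligned, last_full

  do i = lo, hi
     a(i) = real(i, dp)
  end do
  b = 0.0_dp

  n1 = 3        ! hypothetical loop bounds that do not start
  n2 = 90       ! on a vector boundary

  ! Assuming a(lo) is 64-byte aligned, a(i) is aligned when mod(i-lo, vlen) == 0.
  first_aligned = n1 + mod(vlen - mod(n1 - lo, vlen), vlen)
  first_aligned = min(first_aligned, n2 + 1)
  last_full     = first_aligned + ((n2 - first_aligned + 1) / vlen) * vlen - 1

  ! Peel: scalar iterations up to the first aligned element.
  do i = n1, first_aligned - 1
     b(i) = 2.0_dp * a(i)
  end do

  ! Main body: whole vectors, each starting on an aligned element.
  do i0 = first_aligned, last_full, vlen
     b(i0:i0+vlen-1) = 2.0_dp * a(i0:i0+vlen-1)
  end do

  ! Remainder: leftover iterations that do not fill a vector.
  do i = last_full + 1, n2
     b(i) = 2.0_dp * a(i)
  end do

  print *, 'peel iterations:     ', first_aligned - n1
  print *, 'vector iterations:   ', (last_full - first_aligned + 1) / vlen
  print *, 'remainder iterations:', n2 - last_full
  print *, 'matches plain loop:  ', all(b(n1:n2) == 2.0_dp * a(n1:n2))
end program peel_sketch
```

The peel and remainder each cost at most vlen-1 scalar iterations, which is why their overhead tends to matter less as the trip count grows.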
In looking at your allocation you have
ALLOCATE (Fc(-2:Nmax+3,5), Q1(-2:Nmax+3,5), ...)
When you allocate multi-dimensional arrays, it is advantageous to make the extent of the first dimension (in Fortran) a multiple of the number of elements of that type that fit within a vector. This may require adding pad elements to that dimension. Before you do this everywhere in your programs, it would be worth experimenting with a loop such as the one above, with and without the pad. The peel code is fairly efficient, so you may find that for large Nmax the peel overhead is not noticeable.
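A minimal sketch of the padding idea, assuming 64-byte vectors holding 8 Real(8) elements. The variable names (n_rows, n_pad, upper) and the value of Nmax are hypothetical. The first extent is rounded up to a multiple of 8 so that, if Fc(-2,1) is 64-byte aligned, every column Fc(-2,j) starts on a 64-byte boundary as well.

```fortran
program pad_sketch
  implicit none
  integer, parameter :: dp   = kind(1.0d0)
  integer, parameter :: vlen = 8          ! assumption: 8 doubles per 64-byte vector
  integer :: nmax, n_rows, n_pad, upper
  real(dp), allocatable :: fc(:,:), q1(:,:)
  ! Intel-specific alignment request; other compilers treat this line as a comment.
  !dir$ attributes align:64 :: fc, q1

  nmax   = 1000                           ! hypothetical problem size
  n_rows = (nmax + 3) - (-2) + 1          ! extent of the original -2:Nmax+3
  n_pad  = ((n_rows + vlen - 1) / vlen) * vlen   ! round up to a multiple of vlen
  upper  = -2 + n_pad - 1                 ! new upper bound, including pad rows

  allocate (fc(-2:upper, 5), q1(-2:upper, 5))

  print *, 'original extent:', n_rows
  print *, 'padded extent:  ', size(fc, 1)
  print *, 'pad elements:   ', n_pad - n_rows
  print *, 'multiple of 8:  ', mod(size(fc, 1), vlen) == 0
end program pad_sketch
```

The pad rows are never touched by loops that stop at Nmax+3; they only shift where each following column begins in memory.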
Jim Dempsey
The VECTOR ALIGNED directive asserts that each rank of a multi-rank array is aligned, which tells the compiler it need not handle the case where it is not. If the loop is already reported as vectorized, you may not see an advantage from it.
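As a hedged illustration of where the directive would go, here is a small sketch. It assumes a leading extent padded to a multiple of 8 and a loop that starts at the first (aligned) row. The program name, the extent ld, and the simplified loop body are hypothetical and are not the code from the original post. If the alignment assertion is false at run time, aligned vector loads and stores can fault, so the directive is only safe when both conditions really hold.

```fortran
program vector_aligned_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: ld = 64           ! hypothetical padded extent, a multiple of 8
  real(dp), allocatable :: q1(:,:), fc(:,:)
  ! Intel-specific alignment request; other compilers treat this line as a comment.
  !dir$ attributes align:64 :: q1, fc
  integer :: i

  allocate (q1(-2:ld-3, 5), fc(-2:ld-3, 5))   ! first extent is exactly ld
  call random_number(q1)
  q1(:,5) = q1(:,5) + 1.0_dp              ! keep the divisor away from zero
  fc = 0.0_dp

  ! Assertion to the compiler: every array reference in the next loop is
  ! aligned on its first iteration. That holds here only because the loop
  ! starts at row -2 and each column is a multiple of 64 bytes long.
  !dir$ vector aligned
  do i = -2, ld - 3
     fc(i,1) = q1(i,1) / q1(i,5)
     fc(i,5) = q1(i,5) * fc(i,1)
  end do

  print *, 'max |fc(:,5) - q1(:,1)| =', maxval(abs(fc(:,5) - q1(:,1)))
end program vector_aligned_sketch
```

With a start index such as N1-3 that the compiler cannot prove equals -2, the directive shifts the responsibility for that guarantee to the programmer.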
Hello,
Thank you for all your advice.
I have written several versions of the application that differ in the order of the dimensions of the most important arrays. I have also changed how their first dimension is declared, so that its extent is a multiple of 8 elements (double-precision floats).
I have now chosen the ordering that gives me the best performance, and everything is fine.
Regards,
Guy.