Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Restrict equivalent in Fortran

gert_massa
Beginner

Dear all,

I have rewritten a lot of our code from Fortran 77 style to Fortran 2003. I'm making quite a lot of use of array assignments with the "Array1(1:n) = Array2(1:n)" notation, but apparently this creates a temporary array on the stack, which causes stack overflows on bigger test cases. The compiler fails to see that no temporary array is needed for this simple statement, probably because Array1 and Array2 are pointers. Basically I need to tell the compiler that these two pointers are non-overlapping (like restrict does in C). Is this possible in Fortran?

P.S. I use pointers everywhere instead of allocatable arrays because I need to be fully compatible with some legacy code, and I need to keep track of the memory consumption using a special allocator object that keeps a list of allocations and deallocates them in the allocator object's destructor (a final subroutine).
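A minimal sketch of why the compiler is cautious here (the module and routine names are hypothetical): two pointers may legally refer to overlapping parts of the same array, and Fortran requires the assignment to give the same result as if the whole right-hand side were evaluated before any element is stored. When the compiler cannot rule out overlap at compile time, it may therefore evaluate the right-hand side into a temporary.

```fortran
module alias_demo
  implicit none
contains
  ! Hypothetical illustration: two pointers into the same array, mirroring
  ! Array1(1:n) = Array2(1:n) where the targets happen to overlap.
  subroutine overlapping_assign(x)
    real(8), intent(inout), target :: x(:)   ! assumes size(x) >= 7
    real(8), pointer :: array1(:), array2(:)

    array1 => x(3:7)   ! destination
    array2 => x(1:5)   ! source; shares x(3:5) with the destination

    ! The standard requires array1 to end up holding the OLD values of
    ! x(1:5). A naive forward element-by-element copy would read x(3:5)
    ! after overwriting them, so the compiler must either prove there is
    ! no overlap, check at run time, or copy through a temporary.
    array1(1:5) = array2(1:5)
  end subroutine overlapping_assign
end module alias_demo
```

With x = 1, 2, ..., 10 on entry, x ends up as 1, 2, 1, 2, 3, 4, 5, 8, 9, 10.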

1 Solution
IanH
Honored Contributor II

Perhaps this is daft for other reasons, but [I think] you get the effect of restrict if the pointer things are passed as arguments to a procedure with non-target, non-pointer dummy arguments that does the assignment.

That is, an approach would be to run around and do all the fancy pointer assignments in a top level procedure, and then pass the allocated things to worker procedures that do all the real work with non-pointers.
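A minimal sketch of that approach, with hypothetical module and routine names. The top-level code keeps its pointers, but the copy happens inside a procedure whose dummy arguments have neither the POINTER nor the TARGET attribute; the standard's rules for such dummies let the compiler assume the two arrays do not alias, much as restrict does in C. The caller becomes responsible for never passing overlapping arrays.

```fortran
module copy_workers
  implicit none
contains
  ! Worker with plain assumed-shape dummies (no POINTER, no TARGET).
  ! Within this routine the compiler may assume dst and src are disjoint.
  subroutine copy_values(dst, src)
    real(8), intent(inout) :: dst(:)
    real(8), intent(in)    :: src(:)
    dst = src              ! size(dst) must equal size(src)
  end subroutine copy_values
end module copy_workers
```

The direct assignment then becomes `call copy_values(Array1(1:n), Array2(1:n))`. Because the dummies are assumed-shape, the caller passes array descriptors, so even non-contiguous sections can usually be passed without a copy-in/copy-out temporary.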

Steven_L_Intel1
Employee

There is nothing exactly like "restrict" but the compiler does support the option /Qansi-alias (the name really comes from the C side) that can allow some additional optimization on pointers. I am not sure if it will help here.

gert_massa
Beginner

/Qansi-alias is enabled by default. So basically I have to replace all these array assignments with DO loops?

Steven_L_Intel1
Employee

I see that the documentation says it is the default, but I don't think that's true. But I did an experiment and it did not seem to help in this case. Too bad you can't use allocatable.

But try this - add the CONTIGUOUS attribute to the POINTER declaration - that improves the code in my testing.
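A minimal sketch of that suggestion, assuming a compiler that supports the Fortran 2008 CONTIGUOUS attribute (the module and routine names are hypothetical). CONTIGUOUS tells the compiler the pointer targets have unit stride, which is what allows it to substitute a block copy; it does not promise that the two targets are disjoint.

```fortran
module contiguous_demo
  implicit none
contains
  ! Copies the first n elements of src into dst. Both pointer dummies are
  ! CONTIGUOUS, so the actual arguments must be contiguous pointers too.
  subroutine copy_prefix(dst, src, n)
    ! intent(in) fixes the pointer association; target values may change
    real(8), pointer, contiguous, intent(in) :: dst(:), src(:)
    integer, intent(in) :: n
    dst(1:n) = src(1:n)
  end subroutine copy_prefix
end module contiguous_demo
```

In the calling code the declarations would read `real(8), pointer, contiguous :: Array1(:), Array2(:)`.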

jimdempseyatthecove
Honored Contributor III

Steve,

CONTIGUOUS would not preclude overlapping regions, thus may affect optimizations, although a runtime test could fix this.

Any thoughts on adding a !DIR$ directive to declare that the following statement or loop has no aliasing issues?

!DIR$ CONTIGUOUS NOALIAS
Array1(1:n) = Array2(1:n) ! arrays are pointers

Jim Dempsey

Steven_L_Intel1
Employee

When you add CONTIGUOUS, we call a "fast_memcpy" routine to do the assignment rather than copying element by element. This routine can check for overlap and do the move in the necessary direction. I am not 100% certain it avoids the temp, but it might. Is it possible to remove the (1:N) from the subscript? If you're moving the whole array, you don't need it.

I will pass on your suggestion for a NOALIAS attribute for pointers.

TimP
Honored Contributor III

What reason did vec-report or opt-report give for not vectorizing?  How did it compare with use of !dir$ simd (which prevents the compiler from substituting fast_memcpy)?

simd implies ignoring possibility of overlap.  !dir$ ivdep is a less aggressive way to assert no overlap.
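A sketch of the explicit-loop form asked about earlier, with the `!DIR$ IVDEP` directive placed before the loop (module and routine names are hypothetical). IVDEP tells ifort to ignore assumed, unproven dependencies between dst and src, so it is only safe when the caller guarantees the two do not overlap; compilers that don't recognize the directive generally treat the line as a comment.

```fortran
module ivdep_demo
  implicit none
contains
  ! Element-by-element copy of src(1:n) into dst(1:n) through pointers.
  subroutine loop_copy(dst, src, n)
    real(8), pointer, intent(in) :: dst(:), src(:)
    integer, intent(in) :: n
    integer :: i
    ! Assumption: dst and src never overlap, so ignoring assumed
    ! dependencies cannot change the result.
!DIR$ IVDEP
    do i = 1, n
      dst(i) = src(i)
    end do
  end subroutine loop_copy
end module ivdep_demo
```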

gert_massa
Beginner

I can not remove the (1:N) because Array1 is bigger than Array2: basically my code is increasing the size of my arrays. Neither the CONTIGUOUS attribute nor the CONTIGUOUS NOALIAS directive seems to work in my case.

vec-report:2 gives me the following output for this single line:

 remark: LOOP WAS VECTORIZED
 remark: loop was not vectorized: vectorization possible but seems inefficient
 remark: LOOP WAS VECTORIZED
 remark: loop was not vectorized: not inner loop
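A minimal sketch of the growing pattern described above, using plain ALLOCATE/DEALLOCATE in place of the custom allocator object (the module and routine names are hypothetical). The new block is larger than the old one, so only the first n elements are copied and the (1:n) section cannot simply be dropped. Both sides of that assignment are pointers, so the compiler may not be able to prove they are disjoint even though `bigger` was just allocated.

```fortran
module grow_demo
  implicit none
contains
  ! Enlarges the pointer array p to new_size elements, keeping its old
  ! contents and zero-filling the new tail. Assumes p is associated and
  ! new_size >= size(p).
  subroutine grow(p, new_size)
    real(8), pointer, intent(inout) :: p(:)
    integer, intent(in) :: new_size
    real(8), pointer :: bigger(:)
    integer :: n

    n = size(p)
    allocate(bigger(new_size))
    bigger(1:n) = p(1:n)      ! the Array1(1:n) = Array2(1:n) step
    bigger(n+1:) = 0d0        ! zero-size section if new_size == n
    deallocate(p)
    p => bigger
  end subroutine grow
end module grow_demo
```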

gert_massa
Beginner

Creating a subroutine VectorCopy does solve my problem. I can even pass non-contiguous subarrays to the subroutine without any problem. So no copy is made on the stack for the subroutine call; I hope it won't create a copy on the heap instead. Is this correct?

call VectorCopy(Array1(1:n,1:m), Array2(1:n,1:m))   ! with n < size(Array1,1)

P.S. Is it possible to override the intrinsic assignment operator with my own procedures?
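One possible shape for the VectorCopy routine mentioned above; its actual implementation isn't shown in the thread, so the argument order (destination first) and the body are assumptions. Assumed-shape dummies receive array descriptors, so a non-contiguous section such as Array1(1:n,1:m) with n < size(Array1,1) can normally be passed without a copy on either the stack or the heap.

```fortran
module vector_copy_mod
  implicit none
contains
  ! Copies src into dst. Both are assumed-shape, non-pointer dummies, so
  ! the compiler may assume they do not overlap. Shapes must conform.
  subroutine VectorCopy(dst, src)
    real(8), intent(inout) :: dst(:,:)
    real(8), intent(in)    :: src(:,:)
    dst = src
  end subroutine VectorCopy
end module vector_copy_mod
```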

John_Campbell
New Contributor II

Could you try replacing "Array1(1:n) = Array2(1:n)" with "call move_vector (Array1(1), Array2(1), n)", where

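[fortran]   subroutine move_vector (b,a,n)
  ! Copies a(1:n) into b(1:n); called as move_vector (Array1(1), Array2(1), n)
  ! it performs Array1(1:n) = Array2(1:n).
  implicit none
  integer n
  real*8 a(n), b(n)
  b = a
end subroutine move_vector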
[/fortran]

It is Fortran 77 style and should not use the stack. I'm not sure whether the subroutine would be optimised.
I'm assuming A and B are typical contiguous arrays.
If Array1 and Array2 might overlap in memory, you could test the start addresses of A and B and use a reverse DO loop when LOC(A) < LOC(B), i.e. when the destination starts after the source.

John
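A sketch of the direction test described above. To stay within standard Fortran it works on a single array and compares start indices instead of LOC addresses; the module and routine names are hypothetical. If the destination starts after the source, copying backwards reads every source element before it can be overwritten.

```fortran
module shift_copy_mod
  implicit none
contains
  ! Copies x(src:src+n-1) to x(dst:dst+n-1); the two ranges may overlap.
  subroutine shift_copy(x, dst, src, n)
    real(8), intent(inout) :: x(:)
    integer, intent(in) :: dst, src, n
    integer :: i
    if (dst > src) then
      ! Destination starts after source: copy backwards (reverse DO loop).
      do i = n - 1, 0, -1
        x(dst + i) = x(src + i)
      end do
    else
      ! Destination starts at or before source: forward copy is safe.
      do i = 0, n - 1
        x(dst + i) = x(src + i)
      end do
    end if
  end subroutine shift_copy
end module shift_copy_mod
```

With x = 1, 2, ..., 10, `call shift_copy(x, 3, 1, 5)` leaves x = 1, 2, 1, 2, 3, 4, 5, 8, 9, 10, the same result as the intrinsic assignment x(3:7) = x(1:5).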

jimdempseyatthecove
Honored Contributor III

Gert,

For statements like your assignment, when a call to Intel's "fast_memcpy" is substituted, the compiler will not report that the code was vectorized.

In other words, do not assume that a lack of reported vectorization means a lack of optimization.

Jim Dempsey 

TimP
Honored Contributor III

John Campbell wrote:

Could you try something like "Array1(1:n) = Array2(1:n)" becomes "call move_vector (Array1(1), Array2(1), n)" and

   subroutine move_vector (b,a,n)
     integer n
     real*8 a(n), b(n)
     b = a
   end subroutine move_vector

It is Fortran 77 style and should not use the stack. I'm not sure if the subroutine would be optimised.
I'm assuming A and B are typical contiguous arrays.
If Array1 and Array2 overlap in memory, then you could test the start address of A and B and use a reverse DO loop if LOC(A) < LOC(B)

John

I'm missing part of your point, since whole-array assignment surely can't be called "Fortran 77 style." Without a POINTER (or TARGET) declaration, Fortran requires that subroutine arguments don't overlap if either of them is modified. ifort has an option to discard that provision of the standard (assume:dummy_aliases).

ifort and gfortran don't support reversing loops for vectorization in the presence of data overlap. Oracle Fortran, and perhaps Open64, do (probably requiring some effort with c_loc, as you suggested). ifort treats LOC and c_loc as nearly interchangeable, but as you seem to refer to capabilities of other compilers, you should adhere to f2003. memmove() is defined by C to handle overlapping source and destination, and is usually built to choose the copy direction accordingly. memmove() can be called from Fortran using iso_C_binding. That won't get you the Intel library version (unless you call it by the internal Intel name).
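A minimal sketch of calling the C library's memmove() through iso_c_binding. The interface name c_memmove and the wrapper move_within are hypothetical, and c_sizeof is a Fortran 2008 addition. To keep the overlapping case standard-conforming, the wrapper moves elements within one array rather than between two dummy arguments.

```fortran
module memmove_mod
  use, intrinsic :: iso_c_binding, only: c_ptr, c_size_t, c_double, c_loc, c_sizeof
  implicit none
  interface
    ! C prototype: void *memmove(void *dest, const void *src, size_t n);
    function c_memmove(dest, src, nbytes) bind(C, name="memmove") result(r)
      import :: c_ptr, c_size_t
      type(c_ptr), value :: dest, src
      integer(c_size_t), value :: nbytes
      type(c_ptr) :: r
    end function c_memmove
  end interface
contains
  ! Moves n elements from x(src:src+n-1) to x(dst:dst+n-1); the ranges may
  ! overlap because memmove is defined to handle that case.
  subroutine move_within(x, dst, src, n)
    real(c_double), intent(inout), target :: x(*)
    integer, intent(in) :: dst, src, n
    type(c_ptr) :: ignored   ! memmove returns dest; not needed here
    if (n <= 0) return
    ignored = c_memmove(c_loc(x(dst)), c_loc(x(src)), int(n, c_size_t) * c_sizeof(x(1)))
  end subroutine move_within
end module memmove_mod
```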

John_Campbell
New Contributor II

Tim,

I thought the problem that was identified was that a temporary copy of the array sections was being placed on the stack, causing a stack overflow. The point of my suggestion was that:
1) the call move_vector (Array1(1), Array2(1), n) should not create a temporary copy, while
2) the subroutine, using the F77 approach of passing the array dimensions, should be sufficient to allow ifort to treat arrays a and b as sized arrays and use instructions suited to efficient transfers.

My comment about using LOC was in response to the original post's reference to non-overlapping arrays: if the arrays could overlap, a LOC test would easily cover that possibility and make the solution more robust.

In the past, I have found that temporary copies of array sections, created where I expected them not to be required, cause this problem and also slow down execution.

John
