Array vs. pointer and loop parameter optimization

Patrice_l_ · 05-26-2015

Hi,

I have an optimization question about the following code, which does a 'findloc'. With ifort, it appears that the DO loop parameters change the optimization, and that passing a pointer instead of the array also leads to better optimization. With ifort, run-time loop parameters (st:end:step) degrade performance; with gfortran they do not. Both compilers perform better with a pointer than with an array that has the TARGET attribute. When the TARGET attribute is removed, gfortran gives the same results for the array as for the pointer, still with no effect from the loop parameters, whereas ifort is unaffected by the TARGET attribute. I don't know whether there is an optimization issue here, but I'd appreciate a discussion on this. The results were obtained with -O3 -xHost.

Thanks.

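| Time (s) | array, 1:size | array, st:end:step | pointer, 1:size | pointer, st:end:step |
|----------|---------------|--------------------|-----------------|----------------------|
| ifort    | 12.56         | 20.52              | 4.26            | 18.36                |
| gfortran | 20.34         | 20.56              | 13.7            | 13.7                 |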


Without the TARGET attribute on the array:

| Time (s) | array, 1:size | array, st:end:step |
|----------|---------------|--------------------|
| ifort    | 12.64         | 20.59              |
| gfortran | 13.74         | 13.74              |

```fortran
program foo
  implicit none
  integer, dimension(:), allocatable, target :: array
  integer :: i, val, k, tmp(1:2)
  integer, dimension(:), pointer :: pt_marker
  double precision :: t0, t1

  allocate(array(20000000))
  do i = 1, 20000000
    array(i) = i
  end do

  ! Pointer to the whole array, built from run-time bounds
  tmp = [1, 20000000]
  pt_marker => array(tmp(1):tmp(2))

  ! Timed search, passing the pointer
  call cpu_time(t0)
  do i = 1, 100000
    val = 9*i
    call t(pt_marker, val, k, .false.)
  end do
  call cpu_time(t1)
  print *, 'time pt', t1 - t0

  ! Same search, passing the array itself
  call cpu_time(t0)
  do i = 1, 100000
    val = 9*i
    call t(array, val, k, .false.)
  end do
  call cpu_time(t1)
  print *, 'time array', t1 - t0
  print *, val, k
  deallocate(array)

contains

  ! Linear search ('findloc'): j receives the first index (or, if bac2
  ! is true, the last index) at which array(i) == value.
  subroutine t(array, value, j, bac2)   ! result(j)
    integer(kind=4), dimension(:) :: array
    logical, optional :: bac2
    logical :: bac
    integer(kind=4) :: i, value, j, st, en, step

    bac = .false.
    if (present(bac2)) bac = bac2
    if (bac) then
      st = size(array, 1)
      en = 1
      step = -1
    else
      st = 1
      en = size(array, 1)
      step = 1
    end if
    j = 0                          ! 0 means "not found"
    do i = st, en, step            ! run-time loop parameters
    !do i = 1, size(array, 1)      ! alternative with fixed bounds
      if (array(i) .eq. value) then
        j = i
        exit
      end if
    end do
  end subroutine t

end program foo
```

jimdempseyatthecove · 05-26-2015

When performing timing experiments, be sure that all timed sections have "even footing".

In the program above, the first loop, which initializes the data, may have used streaming stores. As a result, the array data (or the final last-level-cache-sized portion of it) would not be resident in cache.

This, in turn, means the first timed loop does not have the advantage of data already held in cache, while the second loop benefits from the cache loading done by the first.

To correct for this, put an additional loop around your two timed sections. Run this loop for at least two iterations (a few more is better) and discard the timing information from the first iteration.

With more than two iterations, after discarding the first run of each timed section, take either the average or the best time (your preference).
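A minimal sketch of such a harness, assuming the timed section can be wrapped in an argument-less subroutine. The module and procedure names (`timing_harness`, `best_time`, `workload`) are hypothetical, not from the thread. It runs the workload at least twice, discards the first run as warm-up, and reports the best of the remaining CPU times:

```fortran
module timing_harness
  implicit none
  private
  public :: best_time, workload

  ! Hypothetical interface: the timed code wrapped in a subroutine with no arguments
  abstract interface
    subroutine workload()
    end subroutine workload
  end interface

contains

  ! Run WORK max(NREP, 2) times, ignore the first (warm-up) run, and
  ! return the best CPU time of the remaining runs, in seconds.
  function best_time(work, nrep) result(tbest)
    procedure(workload) :: work
    integer, intent(in) :: nrep
    double precision :: tbest
    double precision :: t0, t1
    integer :: r

    tbest = huge(tbest)
    do r = 1, max(nrep, 2)
      call cpu_time(t0)
      call work()
      call cpu_time(t1)
      if (r > 1) tbest = min(tbest, t1 - t0)   ! r == 1 is the warm-up run
    end do
  end function best_time

end module timing_harness
```

Each timed section of the original program could be moved into its own internal subroutine and passed to `best_time`. Taking the minimum rather than the mean is just one of the two options mentioned above.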

Jim Dempsey

Patrice_l_ · 05-26-2015

Thanks for the comment; in this case it does not change anything. The array and the loop count are large enough (I guess).

Martyn_C_Intel · 05-27-2015

In general, you'd do better to code two separate loops, one forward (unit stride) and one backward (stride -1), rather than a single loop with an unknown stride.
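A sketch of that split, written as a function rather than the original subroutine `t`. The names `find_split` and `find_value` are hypothetical. Each direction gets its own DO loop with a stride fixed at compile time, and 0 is returned when the value is absent:

```fortran
module find_split
  implicit none
  private
  public :: find_value

contains

  ! Index of the first element of ARR equal to VAL, or of the last one
  ! if BACK is present and true; 0 if VAL does not occur.
  function find_value(arr, val, back) result(j)
    integer, intent(in) :: arr(:)
    integer, intent(in) :: val
    logical, intent(in), optional :: back
    integer :: j
    integer :: i
    logical :: from_end

    from_end = .false.
    if (present(back)) from_end = back

    j = 0
    if (from_end) then
      do i = size(arr), 1, -1     ! backward loop, constant stride -1
        if (arr(i) == val) then
          j = i
          return
        end if
      end do
    else
      do i = 1, size(arr)         ! forward loop, unit stride
        if (arr(i) == val) then
          j = i
          return
        end if
      end do
    end if
  end function find_value

end module find_split
```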

The loop isn't directly vectorizable, since it contains an EXIT statement.
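One common workaround, not proposed in this thread and sketched here only under stated assumptions, is to search in fixed-size blocks: each block is tested by an inner loop with no EXIT, which a compiler may be able to vectorize, and only the block containing a match is rescanned element by element. The block length of 64 and the names `find_blocked` and `find_first_blocked` are arbitrary, hypothetical choices; whether the inner loop actually vectorizes depends on the compiler and flags, and is best checked in its vectorization report.

```fortran
module find_blocked
  implicit none
  private
  public :: find_first_blocked

  integer, parameter :: blk = 64   ! arbitrary block length, to be tuned

contains

  ! Forward search for VAL in ARR; returns the first matching index or 0.
  function find_first_blocked(arr, val) result(j)
    integer, intent(in) :: arr(:)
    integer, intent(in) :: val
    integer :: j
    integer :: lo, hi, i, hits

    j = 0
    do lo = 1, size(arr), blk
      hi = min(lo + blk - 1, size(arr))
      hits = 0
      do i = lo, hi                    ! inner loop without EXIT
        if (arr(i) == val) hits = hits + 1
      end do
      if (hits > 0) then
        do i = lo, hi                  ! scalar rescan of this block only
          if (arr(i) == val) then
            j = i
            return
          end if
        end do
      end if
    end do
  end function find_first_blocked

end module find_blocked
```

The trade-off is that up to `blk - 1` elements past the match are examined, and the matching block is read twice.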

I don't understand the observation for pointers compared to arrays. I'll note, though, that the compiler doesn't know whether an assumed-shape array argument will be contiguous, and must allow for that.
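For comparison, the Fortran 2008 CONTIGUOUS attribute lets the dummy argument itself carry that promise. A minimal sketch with hypothetical module and function names; whether it changes the generated code for this particular loop is compiler-dependent and is not measured here:

```fortran
module find_contig
  implicit none
  private
  public :: find_first_contig

contains

  ! Forward search for VAL in ARR; returns the first matching index or 0.
  ! CONTIGUOUS tells the compiler ARR occupies consecutive memory, so it
  ! need not generate code for an arbitrary stride.
  function find_first_contig(arr, val) result(j)
    integer, contiguous, intent(in) :: arr(:)
    integer, intent(in) :: val
    integer :: j
    integer :: i

    j = 0
    do i = 1, size(arr)
      if (arr(i) == val) then
        j = i
        return
      end if
    end do
  end function find_first_contig

end module find_contig
```

Passing a strided section such as `a(2::2)` is still legal; the compiler then typically passes a contiguous temporary copy, which has its own cost.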

Fortran 2008 contains a FINDLOC intrinsic, but I'm not sure whether it's implemented in ifort yet; I couldn't spot it in the documentation.
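Where a compiler does implement it (gfortran added FINDLOC in version 9), the search reduces to a single call. A sketch wrapping it to mirror the optional backward flag of subroutine `t`; the module and wrapper names are hypothetical. FINDLOC returns 0 when the value is not found:

```fortran
module find_intrinsic
  implicit none
  private
  public :: find_with_findloc

contains

  ! Index of the first (or, with BACK true, the last) element of ARR
  ! equal to VAL, or 0 if none, using the Fortran 2008 FINDLOC intrinsic.
  function find_with_findloc(arr, val, back) result(j)
    integer, intent(in) :: arr(:)
    integer, intent(in) :: val
    logical, intent(in), optional :: back
    integer :: j
    logical :: from_end

    from_end = .false.
    if (present(back)) from_end = back
    ! With DIM=1 on a rank-1 array, FINDLOC returns a scalar index.
    j = findloc(arr, val, dim=1, back=from_end)
  end function find_with_findloc

end module find_intrinsic
```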

Patrice_l_ · 05-27-2015

FINDLOC is not yet implemented in either compiler I have.

I came to the same conclusion regarding writing two loops, but gfortran does not seem to be affected by the loop parameters. I don't recall the CONTIGUOUS keyword changing the results, and I would expect the compiler to be even more careful with a pointer than with an assumed-shape array, so I wonder which optimization is possible with a pointer but not with an assumed-shape array.