Hi,
I have an optimization question about the following code, which performs a 'findloc'. With ifort, the form of the do-loop parameters changes the optimization: using variable start, end, and step values (st:end:step) instead of 1:size degrades performance. gfortran shows no such effect. Both compilers are faster when the routine is passed a pointer than when it is passed an array with the target attribute. If I remove the target attribute, gfortran gives the same time as with the pointer and still shows no effect of the loop parameters, whereas ifort is unaffected by the target attribute. I don't know whether there is an optimization issue here, but I'd appreciate a discussion of this. The results were obtained with -O3 -xHost.
Thanks.
Time | array, 1:size | array, st:end:step | pointer, 1:size | pointer, st:end:step
ifort | 12.56 | 20.52 | 4.26 | 18.36
gfortran | 20.34 | 20.56 | 13.7 | 13.7

Without the target attribute on the array:

Time | array, 1:size | array, st:end:step
ifort | 12.64 | 20.59
gfortran | 13.74 | 13.74
program foo
  implicit none
  integer,dimension(:),allocatable,target :: array
  integer :: i,val,k,tmp(1:2)
  integer,dimension(:),pointer :: pt_marker
  double precision :: t0,t1

  allocate(array(20000000))
  do i=1,20000000
    array(i)=i
  end do
  tmp=[1,20000000]
  pt_marker=>array(tmp(1):tmp(2))

  call cpu_time(t0)
  do i=1,100000
    val=9*i
    call t(pt_marker,val,k,.false.)
  end do
  call cpu_time(t1)
  print *,'time pt',t1-t0

  call cpu_time(t0)
  do i=1,100000
    val=9*i
    call t(array,val,k,.false.)
  end do
  call cpu_time(t1)
  print *,'time array',t1-t0
  print *,val,k
  deallocate(array)

contains

  subroutine t(array,value,j,bac2) ! result(j)
    integer(kind=4),dimension(:) :: array
    logical,optional :: bac2
    logical :: bac
    integer(kind=4) :: i,value,j,st,en,step

    bac=.false.
    if(present(bac2)) bac=bac2
    if(bac) then
      en=1
      st=size(array,1)
      step=-1
    else
      st=1
      en=size(array,1)
      step=1
    end if
    do i=st,en,step !1,size(array,1)
    !do i=1,size(array,1)
      if(array(i) .eq. value) then
        j=i
        exit
      end if
    end do
  end subroutine

end program
When performing timing experiments, be sure that all timed sections have "even footing".
In the program you listed above, the first loop, which initializes the data, may have used streaming stores. This leaves the array data (in particular the final portion that would otherwise fit in the last-level cache) not resident in cache.
This, in turn, means the first timed loop does not have the advantage of data held in cache, while the second loop benefits from the first loop's overhead of loading the cache.
To correct for this, add an outer loop around your two timed sections. Run this loop for at least two iterations; a few more is better. Discard the timing information from the first iteration.
You will want to use more than two iterations. Then, after discarding the first run of each timed section, take either the average or the best time (your preference).
Jim Dempsey
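The outer-loop suggestion above could be sketched roughly as follows. This is only an illustration, not the original benchmark: the names find_first, nrep and ncall, the smaller array size, and the choice to report the best time are all assumptions made for the example.

```fortran
program timing_sketch
  implicit none
  integer, parameter :: n = 2000000   ! assumed size, smaller than in the post
  integer, parameter :: nrep = 5      ! assumed number of outer passes
  integer, parameter :: ncall = 1000  ! assumed number of searches per pass
  integer, allocatable, target :: array(:)
  integer, pointer :: pt(:)
  integer :: i, rep, k
  double precision :: t0, t1, best_pt, best_arr

  allocate(array(n))
  do i = 1, n
    array(i) = i
  end do
  pt => array

  best_pt = huge(1.0d0)
  best_arr = huge(1.0d0)
  k = 0

  do rep = 1, nrep
    ! Timed section 1: search through the pointer
    call cpu_time(t0)
    do i = 1, ncall
      call find_first(pt, 9*i, k)
    end do
    call cpu_time(t1)
    ! Discard the first pass, which pays for loading the cache
    if (rep > 1) best_pt = min(best_pt, t1 - t0)

    ! Timed section 2: search through the array itself
    call cpu_time(t0)
    do i = 1, ncall
      call find_first(array, 9*i, k)
    end do
    call cpu_time(t1)
    if (rep > 1) best_arr = min(best_arr, t1 - t0)
  end do

  print *, 'best time pt   ', best_pt
  print *, 'best time array', best_arr
  print *, k   ! use the result so the searches are not optimized away
  deallocate(array)

contains

  ! Hypothetical forward-only search, standing in for the routine t
  subroutine find_first(a, value, j)
    integer, intent(in)    :: a(:)
    integer, intent(in)    :: value
    integer, intent(inout) :: j
    integer :: i
    do i = 1, size(a)
      if (a(i) == value) then
        j = i
        exit
      end if
    end do
  end subroutine find_first

end program timing_sketch
```

Taking the minimum instead of the average is just one of the two options mentioned; an average over passes 2..nrep would work the same way.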
Thanks for the comment. In this case it does not change anything. The array and the loop count are big enough (I guess).
In general, you'd do better to code two separate loops, one forward (stride 1) and one backward (stride -1), rather than a single loop with an unknown stride.
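A minimal sketch of that two-loop layout, assuming a hypothetical routine find_value in place of t. Returning 0 when the value is not found is an assumption added for the example; the original routine leaves j untouched in that case.

```fortran
module find_sketch
  implicit none
contains
  ! Hypothetical variant of the routine t: the direction is tested once,
  ! so each loop has a step known at compile time.
  subroutine find_value(a, value, j, back)
    integer, intent(in)           :: a(:)
    integer, intent(in)           :: value
    integer, intent(out)          :: j
    logical, intent(in), optional :: back
    logical :: backward
    integer :: i

    j = 0   ! assumption: 0 means "not found"
    backward = .false.
    if (present(back)) backward = back

    if (backward) then
      ! Backward search, constant stride -1
      do i = size(a), 1, -1
        if (a(i) == value) then
          j = i
          exit
        end if
      end do
    else
      ! Forward search, constant stride 1
      do i = 1, size(a)
        if (a(i) == value) then
          j = i
          exit
        end if
      end do
    end if
  end subroutine find_value
end module find_sketch

program demo_two_loops
  use find_sketch
  implicit none
  integer :: a(5), k
  a = [3, 7, 7, 1, 9]
  call find_value(a, 7, k)
  print *, k   ! first match: index 2
  call find_value(a, 7, k, back=.true.)
  print *, k   ! last match: index 3
end program demo_two_loops
```

Whether this removes the ifort slowdown reported in the table would have to be measured; the sketch only shows the code shape.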
The loop isn't directly vectorizable, since it contains an EXIT statement.
I don't understand the observation for pointers compared to arrays. I'll note, though, that the compiler doesn't know whether an assumed-shape array argument will be contiguous, and must allow for the case where it is not.
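One way to state contiguity explicitly is the Fortran 2008 CONTIGUOUS attribute on the dummy argument. The sketch below assumes a compiler that supports it; the names find_contig and contiguous_sketch are made up for the example, and no claim is made about its effect on the timings discussed here.

```fortran
module contiguous_sketch
  implicit none
contains
  ! Hypothetical search routine whose dummy array is declared contiguous,
  ! so the compiler may assume unit stride inside the routine.
  subroutine find_contig(a, value, j)
    integer, intent(in), contiguous :: a(:)
    integer, intent(in)             :: value
    integer, intent(out)            :: j
    integer :: i
    j = 0   ! assumption: 0 means "not found"
    do i = 1, size(a)
      if (a(i) == value) then
        j = i
        exit
      end if
    end do
  end subroutine find_contig
end module contiguous_sketch

program demo_contiguous
  use contiguous_sketch
  implicit none
  integer :: a(10), i, k
  a = [(i, i = 1, 10)]
  ! Whole array: already contiguous
  call find_contig(a, 7, k)
  print *, k   ! 7
  ! Strided section: the compiler has to pass a contiguous temporary copy
  call find_contig(a(1:9:2), 7, k)
  print *, k   ! 4, since the section holds 1, 3, 5, 7, 9
end program demo_contiguous
```

The copy made for the strided call has its own cost, so the attribute is a trade-off rather than a free gain.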
Fortran 2008 contains a FINDLOC intrinsic, but I'm not sure whether it's implemented yet in ifort; I couldn't spot it in the documentation.
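For reference, a short sketch of how FINDLOC is used, assuming a compiler version that implements it. With dim=1 on a rank-1 array the result is a scalar index, and 0 is returned when the value is absent.

```fortran
program findloc_sketch
  implicit none
  integer :: a(5)
  a = [3, 7, 7, 1, 9]
  print *, findloc(a, 7, dim=1)                ! first match: 2
  print *, findloc(a, 7, dim=1, back=.true.)   ! last match: 3
  print *, findloc(a, 4, dim=1)                ! not found: 0
end program findloc_sketch
```

The back argument plays the same role as the bac2 flag in the hand-written routine.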
FINDLOC is not yet implemented in either of the compilers I have.
I came to the same conclusion about writing two loops, but gfortran does not seem to be affected by the loop parameters. As far as I remember, the contiguous keyword did not change the results, and I would expect the compiler to be even more careful with a pointer than with an assumed-shape array. In particular, which optimization is possible with a pointer but not with an assumed-shape array?