Fast way to clear a multidimensional array

andyb123 · 10-28-2011

I have some code that does something similar to this:

real, allocatable, dimension(:, :) :: a

allocate(a(10, 10000))
a = 0.0

The sizes vary a bit, but the first dimension is always quite small. I know this isn't ideal, but it is from a large legacy code. This makes the array assignment quite slow, since it seems to resolve to 10000 calls to intel_fast_memset, one for each set of ten elements. What I really want is to clear the whole thing with just one memset call. I am currently trying this instead:

call fastclear(a, size(a))
...

subroutine fastclear(a, n)
   integer, intent(in) :: n
   real, dimension(n), intent(inout) :: a
   a = 0.0
end subroutine

This seems to do the job fine, but I'm concerned that there might be padding put into the array. So I wonder whether anyone with a good understanding of Fortran arrays knows if the compiler is allowed to pad between array rows, or whether my solution is valid standard Fortran?
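
For anyone wanting to try this, here is a minimal sketch of the approach above as a complete program. It assumes a compiler that supports the Fortran 2008 is_contiguous intrinsic and error stop; the program name and the two checks are illustrative additions, not part of the original code. As far as I understand the standard, an array allocated by an ALLOCATE statement is contiguous, and passing it to an explicit-shape dummy relies on sequence association, so the ten-element columns follow one another in memory.

```fortran
program fastclear_demo
  implicit none
  real, allocatable, dimension(:, :) :: a

  allocate(a(10, 10000))
  a = 1.0                       ! nonzero fill so the clear is observable

  ! Illustrative check: an allocated allocatable array should be contiguous
  if (.not. is_contiguous(a)) error stop "array is not contiguous"

  ! Whole 2-D array passed to a rank-1 explicit-shape dummy
  call fastclear(a, size(a))

  ! Illustrative check: every element must now be zero
  if (any(a /= 0.0)) error stop "array was not fully cleared"
  print *, "cleared", size(a), "elements"

contains

  subroutine fastclear(a, n)
    integer, intent(in) :: n
    real, dimension(n), intent(inout) :: a
    a = 0.0                     ! single rank-1 assignment over n elements
  end subroutine fastclear

end program fastclear_demo
```

Whether the rank-1 assignment ends up as a single memset call depends on the compiler and its optimization settings.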

As an enhancement, if the compiler could spot whole allocatable array operations and treat them as one-dimensional, that would be good too.

Thanks,
Andy.

mecej4 · 10-28-2011

You focused on the call overhead, and ignored the cache access overhead. Consider just-in-time clearing; that is, something along the lines of

do j=1,10000
   a(:,j) = 0.0
   do i=1,10
      a(i,j) = a(i,j) + ...
   end do
end do

If you can arrange your calculation in this fashion, the array section that is set in the 'action' line, Line-4 above, will access memory that is already in the cache and has been set to zero.
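
A runnable sketch of the just-in-time clearing pattern above might look like the following. The right-hand side was elided in the snippet, so the real(i) term here is only a hypothetical stand-in. The program name, the parameters m and n, and the final check are illustrative as well.

```fortran
program jit_clear_demo
  implicit none
  integer, parameter :: m = 10, n = 10000
  real, allocatable :: a(:, :)
  integer :: i, j

  allocate(a(m, n))

  do j = 1, n
    a(:, j) = 0.0                  ! clear one column just before it is used
    do i = 1, m
      ! Hypothetical stand-in for the elided right-hand side
      a(i, j) = a(i, j) + real(i)
    end do
  end do

  ! With the stand-in update, each column should hold 1, 2, ..., m
  if (any(a(:, n) /= [(real(i), i = 1, m)])) error stop "unexpected column contents"
  print *, "sum of last column:", sum(a(:, n))
end program jit_clear_demo
```

Each column is zeroed immediately before the inner loop touches it, so the ten elements being updated should still be in cache when the accumulation runs.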

TimP · 10-28-2011

If the array had the sequence or target attribute, this would be valid. In principle, it would also be a stronger suggestion to the compiler to make the replacement automatic.
As the other response hinted, your fastclear method would result in fast_memset evicting as much as possible of the array from cache, leaving as much as possible of the original cache content in place, while the sequence of 10000 short memset calls would fill the cache with the zeroed array elements.
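
On the target attribute mentioned above, one way to get a one-dimensional view without a helper subroutine is pointer bounds remapping. This is only a sketch, assuming a compiler with Fortran 2008 support for remapping a rank-1 pointer onto a contiguous rank-2 target; the names flat and flat_view_demo are made up for illustration.

```fortran
program flat_view_demo
  implicit none
  real, allocatable, target :: a(:, :)
  real, pointer :: flat(:)

  allocate(a(10, 10000))
  a = 1.0                    ! nonzero fill so the clear is observable

  flat(1:size(a)) => a       ! rank-1 view of the same storage (bounds remapping)
  flat = 0.0                 ! one rank-1 assignment over all elements

  ! Illustrative check: the 2-D array must see the zeros written through flat
  if (any(a /= 0.0)) error stop "array was not fully cleared"
  print *, "cleared", size(flat), "elements"
end program flat_view_demo
```

Whether this is any faster than the original assignment is compiler-dependent, and the cache behaviour described above applies to it in the same way.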