I have some code that does something similar to this:
real, allocatable, dimension(:, :) :: a
allocate(a(10, 10000))
a = 0.0
The sizes vary a bit, but the first dimension is always quite small. I know this isn't ideal, but it comes from a large legacy code. The array assignment is quite slow because it seems to resolve to 10000 calls to intel_fast_memset, one for each set of ten elements. What I really want is to clear the whole thing with a single memset call. I am currently trying this instead:
call fastclear(a, size(a))
...
subroutine fastclear(a, n)
integer, intent(in):: n
real, dimension(n), intent(inout) :: a
a = 0.0
end subroutine
This seems to do the job fine, but I'm concerned that there might be padding put into the array. So I wonder if anyone with a good understanding of Fortran arrays knows whether the compiler is allowed to pad between array rows, or whether my solution is valid standard Fortran?
As an enhancement, it would also be good if the compiler could spot whole-array operations on allocatable arrays and treat them as one-dimensional.
Thanks,
Andy.
2 Replies
You focused on the call overhead, and ignored the cache access overhead. Consider just-in-time clearing; that is, something along the lines of
do j=1,10000
a(:,j) = 0.0
do i=1,10
a(i,j) = a(i,j) + ...
end do
end do
If you can arrange your calculation in this fashion, the array section that is set in the 'action' line, Line-4 above, will access memory that is already in the cache and has been set to zero.
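For anyone who wants to try this, here is a minimal self-contained sketch of the loop above. The array sizes are taken from the original question; the contribution `real(i + j)` is purely a made-up stand-in for the elided "..." term, and the final check is only there to confirm that each column was cleared before it was accumulated into.

```fortran
program jit_clear
  implicit none
  integer, parameter :: m = 10, n = 10000
  real, allocatable :: a(:, :)
  integer :: i, j

  allocate(a(m, n))

  do j = 1, n
    ! Clear one column immediately before it is used.
    a(:, j) = 0.0
    do i = 1, m
      ! Hypothetical contribution; the original post elides the real term.
      a(i, j) = a(i, j) + real(i + j)
    end do
  end do

  ! Self-check: each element should hold exactly the one value added to zero.
  do j = 1, n
    do i = 1, m
      if (a(i, j) /= real(i + j)) error stop "unexpected value"
    end do
  end do
  print *, "ok"
end program jit_clear
```

Whether this is actually faster than a single up-front clear depends on how the real calculation walks the array, so it is worth timing both variants.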
If the array had the sequence or target attribute, this would be valid. In principle, it would also be a stronger suggestion to the compiler to make the replacement automatic.
As the other response hinted, your fastclear method would result in fast_memset evicting as much as possible of the array from cache, leaving as much as possible of the original cache content in place, while the sequence of 10000 short memset calls would fill the cache with the zeroed array elements.
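As a rough sketch of the fastclear variant discussed here: the version below adds the target attribute as suggested, puts the routine in a module (the module and program names are just placeholders), and uses the Fortran 2008 intrinsic `is_contiguous` to check at run time that the array has no gaps before it is passed as a one-dimensional dummy. It assumes a compiler with Fortran 2008 support.

```fortran
module fastclear_mod
  implicit none
contains
  ! Treats the actual argument as a flat run of n reals and zeroes it.
  subroutine fastclear(a, n)
    integer, intent(in) :: n
    real, dimension(n), intent(inout) :: a
    a = 0.0
  end subroutine fastclear
end module fastclear_mod

program fastclear_demo
  use fastclear_mod
  implicit none
  real, allocatable, target :: a(:, :)

  allocate(a(10, 10000))
  a = 1.0

  ! Guard: only take the flat path if the storage is contiguous.
  if (.not. is_contiguous(a)) error stop "array is not contiguous"

  call fastclear(a, size(a))

  ! Self-check: every element must now be zero.
  if (any(a /= 0.0)) error stop "array was not fully cleared"
  print *, "ok"
end program fastclear_demo
```

This only shows that the flat clear reaches every element; it says nothing about the cache behaviour described above, which would need to be measured in the real code.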