Hi~,
I have a question about the ifort option "heap-arrays" in Intel 64 mode (64-bit).
I compiled a computation-heavy program in IA-32 mode (32-bit) without the "heap-arrays" option, and the computation time is about 3 seconds.
In Intel 64 mode (64-bit), I compiled the same program with the "heap-arrays" option, and in this case the computation time is about 100 seconds.
Could anyone explain why this happens, and how I can get the IA-32 performance in Intel 64 mode?
I think we'd need to see an example.
Thanks for that.
First of all, I can see the difference in 32-bit. The problem appears to be inside the memory allocator - the pattern of allocations is causing it to spend a lot of time working with its free lists. The bulk of the time is taken up at the entry to NESTED_DPOL_2D where the automatic arrays B and C are declared.
This is the first program I have seen where /heap-arrays makes such a big difference. We'll investigate this some more. Did you have a need to use /heap-arrays? You could turn it on for some sources and not all if need be.
It's just malloc and free taking all that time - I was distracted by the additional debug library stuff that malloc/free does. The routines taking most of the time are small and don't do much work, so the allocate/free swamps the actual work. NESTEDMUL_DPOL is another one.
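The pattern described here can be reproduced in miniature. The following is only a sketch under stated assumptions: the source of NESTED_DPOL_2D is not shown in this thread, so `small_work`, the array size `n`, and the call count `ncalls` are all hypothetical stand-ins for "a small routine with automatic arrays that is called very often". Building it once with and once without /heap-arrays should give a rough feel for the allocation overhead, although an optimizer that inlines the routine may hide the effect.

```fortran
program automatic_array_overhead
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 8              ! hypothetical small problem size
  integer, parameter :: ncalls = 2000000   ! hypothetical number of calls
  real(dp) :: x(n), total
  real :: t0, t1
  integer :: i

  x = 1.0_dp
  total = 0.0_dp

  call cpu_time(t0)
  do i = 1, ncalls
    total = total + small_work(n, x)
  end do
  call cpu_time(t1)

  ! Printing the result keeps the optimizer from discarding the loop.
  print *, 'checksum =', total
  print *, 'seconds  =', t1 - t0

  ! Each call returns 2*n, so the exact total is 2*n*ncalls.
  if (abs(total - 2.0_dp*real(n, dp)*real(ncalls, dp)) > 1.0e-6_dp) then
    error stop 'unexpected checksum'
  end if

contains

  ! Hypothetical stand-in for a small routine with automatic arrays.
  function small_work(m, v) result(s)
    integer, intent(in) :: m
    real(dp), intent(in) :: v(m)
    real(dp) :: s
    real(dp) :: b(m), c(m)   ! automatic arrays: storage is obtained on every entry

    b = v
    c = b + v
    s = sum(c)
  end function small_work

end program automatic_array_overhead
```

The routine does very little arithmetic per call, so whatever it costs to obtain and release storage for `b` and `c` makes up a large share of the run time.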
Oh, and I saw about an 8X change from 3 seconds to 24. I could never get it to 100 seconds. Be sure you're not building with debug libraries, which makes it worse.
Steve Lionel (Intel) wrote:
It's just malloc and free taking all that time - I was distracted by the additional debug library stuff that malloc/free does. The routines taking most of the time are small and don't do much work, so the allocate/free swamps the actual work. NESTEDMUL_DPOL is another one.
Dear Steve,
Thank you very much for your explanations.
Do computation on, and allocation of, variables in heap memory take much more time than for variables in stack memory?
Actually, the main program that uses the routines I listed above needs lots of memory, so it needs to be compiled with "heap-arrays".
Another question:
In which memory region are assumed-shape arrays allocated? Stack or heap?
Additionally, I found a difference in computation speed depending on how the dummy arrays are declared.
Here are two examples:
------------------------------------------------------------------------------------------------
case. 1
PROGRAM TEST
  IMPLICIT NONE
  real(4) :: time_begin, time_end
  integer(4), parameter :: nn = 1000
  real(8) :: aa(nn,nn), bb(nn,nn), cc(nn,nn)
  aa = 1.0_8
  bb = 1.0_8
  call cpu_time(time_begin)
  call foo(nn,aa,bb,cc)
  call cpu_time(time_end)
  print *, time_end - time_begin
contains
  subroutine foo(n, a, b, c)
    integer(4), intent(in) :: n
    real(8), intent(in) :: a(:,:), b(:,:)
    real(8), intent(out) :: c(:,:)
    integer(4) :: i, j, k
    do i = 1, n
      do j = 1, n
        c(i,j) = 0.0_8
        do k = 1, n
          c(i,j) = c(i,j) + a(i,k)*b(k,j)
        end do
      end do
    end do
  end subroutine foo
end program
==========================================
case. 2
PROGRAM TEST
  IMPLICIT NONE
  real(4) :: time_begin, time_end
  integer(4), parameter :: nn = 1000
  real(8) :: aa(nn,nn), bb(nn,nn), cc(nn,nn)
  aa = 1.0_8
  bb = 1.0_8
  call cpu_time(time_begin)
  call foo(nn,aa,bb,cc)
  call cpu_time(time_end)
  print *, time_end - time_begin
contains
  subroutine foo(n, a, b, c)
    integer(4), intent(in) :: n
    real(8), intent(in) :: a(n,n), b(n,n)
    real(8), intent(out) :: c(n,n)
    integer(4) :: i, j, k
    do i = 1, n
      do j = 1, n
        c(i,j) = 0.0_8
        do k = 1, n
          c(i,j) = c(i,j) + a(i,k)*b(k,j)
        end do
      end do
    end do
  end subroutine foo
end program
------------------------------------------------------------
Both cases were compiled for IA-32 without the 'heap-arrays' option.
The second case is much faster.
Does this mean it is better to declare arrays as automatic arrays than as assumed-shape arrays?
Assumed-shape arrays don't imply any particular allocation. If they also have the ALLOCATABLE attribute then they are always heap allocated. If POINTER, they're heap-allocated if ALLOCATE is used, otherwise they're whatever the target was when pointer assignment was done.
The computation aspect when using /heap-arrays isn't the issue - there is no difference. But there is a cost to heap allocation and deallocation, whereas stack allocation is a single subtract instruction.
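If /heap-arrays has to stay on, one possible way to avoid paying for an allocation on every call is to let the caller allocate a workspace once and pass it in. This is not something proposed in the thread; it is a sketch of a common workaround, and the names `small_work_ws`, `work`, `n`, and `ncalls` are hypothetical.

```fortran
program reuse_workspace
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 8              ! hypothetical small problem size
  integer, parameter :: ncalls = 2000000   ! hypothetical number of calls
  real(dp) :: x(n), total
  real(dp), allocatable :: work(:,:)
  integer :: i

  x = 1.0_dp
  allocate(work(n, 2))   ! a single heap allocation, made once by the caller

  total = 0.0_dp
  do i = 1, ncalls
    total = total + small_work_ws(n, x, work)
  end do

  print *, 'checksum =', total

  ! Each call returns 2*n, so the exact total is 2*n*ncalls.
  if (abs(total - 2.0_dp*real(n, dp)*real(ncalls, dp)) > 1.0e-6_dp) then
    error stop 'unexpected checksum'
  end if

  deallocate(work)

contains

  ! Hypothetical routine: the scratch space is a dummy argument,
  ! so entering the routine allocates nothing.
  function small_work_ws(m, v, w) result(s)
    integer, intent(in) :: m
    real(dp), intent(in) :: v(m)
    real(dp), intent(inout) :: w(m, 2)
    real(dp) :: s

    w(:,1) = v
    w(:,2) = w(:,1) + v
    s = sum(w(:,2))
  end function small_work_ws

end program reuse_workspace
```

The trade-off is a wider argument list and a workspace whose size the caller has to get right, in exchange for moving the allocation out of the hot loop.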
Your two examples in the last post are something else entirely - the allocation is done in the main program and the arrays are all dummy arguments, not automatic arrays. The only difference is where the bounds are passed. In the second example, the compiler has more information about the bounds than it does in the first, and this can improve optimization. Most tests I have seen don't show significant differences here, though. When constructing such tests, make sure that the optimizer hasn't thrown away computational work because it sees the results were never used, which is exactly what happened here. When I add a use of C after the timing, I get identical times for the two programs.
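A version of the test that uses the result after the timing might look like the sketch below. It puts both declarations into one program so they can be compared in a single run. The routine names `foo_assumed` and `foo_explicit`, the ALLOCATABLE arrays in the main program, and the final equality check are assumptions added for illustration; the loops themselves are taken from the two cases above.

```fortran
program test_both
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nn = 1000
  real(dp), allocatable :: aa(:,:), bb(:,:), c1(:,:), c2(:,:)
  real :: t0, t1, t2

  allocate(aa(nn,nn), bb(nn,nn), c1(nn,nn), c2(nn,nn))
  aa = 1.0_dp
  bb = 1.0_dp

  call cpu_time(t0)
  call foo_assumed(nn, aa, bb, c1)
  call cpu_time(t1)
  call foo_explicit(nn, aa, bb, c2)
  call cpu_time(t2)

  ! Using c1 and c2 after the timing keeps the optimizer from
  ! discarding the work inside the timed calls.
  print *, 'assumed-shape  seconds:', t1 - t0, ' sum =', sum(c1)
  print *, 'explicit-shape seconds:', t2 - t1, ' sum =', sum(c2)
  if (any(c1 /= c2)) error stop 'results differ'

contains

  ! Case 1: assumed-shape dummy arguments.
  subroutine foo_assumed(n, a, b, c)
    integer, intent(in) :: n
    real(dp), intent(in) :: a(:,:), b(:,:)
    real(dp), intent(out) :: c(:,:)
    integer :: i, j, k
    do i = 1, n
      do j = 1, n
        c(i,j) = 0.0_dp
        do k = 1, n
          c(i,j) = c(i,j) + a(i,k)*b(k,j)
        end do
      end do
    end do
  end subroutine foo_assumed

  ! Case 2: explicit-shape dummy arguments with bounds passed in n.
  subroutine foo_explicit(n, a, b, c)
    integer, intent(in) :: n
    real(dp), intent(in) :: a(n,n), b(n,n)
    real(dp), intent(out) :: c(n,n)
    integer :: i, j, k
    do i = 1, n
      do j = 1, n
        c(i,j) = 0.0_dp
        do k = 1, n
          c(i,j) = c(i,j) + a(i,k)*b(k,j)
        end do
      end do
    end do
  end subroutine foo_explicit

end program test_both
```

Because the first call also warms the cache for `aa` and `bb`, swapping the order of the two calls is a reasonable sanity check before reading much into a small timing difference.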
