Is there an inherent penalty associated with using allocatable arrays? I have two examples below. The one with allocatable arrays in a module takes 33% more CPU time than the one with static arrays. If I change allocatable to pointer in the array declarations, the CPU time jumps another 68%.
Example 1: Static arrays accessed through common blocks.
program main
parameter (na = 1000000)
common /A_MOD/ dt,A(na),dAdt(na)
real t(2)
dt = 0.001
do n=1,1000
call SUB
enddo
t2 = DTIME(t)
print *, t
stop
end
subroutine SUB
parameter (na = 1000000)
common /A_MOD/ dt,A(na),dAdt(na)
do n=1,na
A(n) = A(n) + dAdt(n)*dt
enddo
return
end
Example 2: Allocatable arrays accessed through module
module A_MOD
integer :: na
real :: dt = 0.001
real, allocatable, dimension(:) :: A, dAdt
end module A_MOD
program main
use A_MOD
real t(2)
na = 1000000
ALLOCATE(A(na))
ALLOCATE(dAdt(na))
do n=1,1000
call SUB
enddo
t2 = DTIME(t)
print *, t
stop
end
subroutine SUB
use A_MOD
do n=1,na
A(n) = A(n) + dAdt(n)*dt
enddo
return
end
5 Replies
Apples and oranges. You're comparing compile-time known array bounds with those that have to be computed and fetched at run-time, not to mention accessed through a pointer as compared to link-time static addressing in the "static" case.
Steve
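To illustrate the distinction drawn above, here is a minimal sketch (not taken from the original posts) that contrasts an explicit-shape array, whose bounds are constants known to the compiler, with an allocatable array, whose bounds are only fixed when ALLOCATE runs. The names bounds_demo, a_static, a_dyn, and lo are hypothetical and chosen for illustration only.

```fortran
program bounds_demo
  implicit none
  integer, parameter :: na = 10
  real :: a_static(na)              ! bounds 1..na are compile-time constants
  real, allocatable :: a_dyn(:)     ! bounds are stored at run time
  integer :: lo

  lo = 0
  allocate(a_dyn(lo:lo+na-1))       ! lower bound is chosen at run time
  ! The static bounds could be folded into the generated code;
  ! the allocatable bounds must be read back from the array's run-time data.
  print *, 'static bounds:     ', lbound(a_static, 1), ubound(a_static, 1)
  print *, 'allocatable bounds:', lbound(a_dyn, 1), ubound(a_dyn, 1)
  deallocate(a_dyn)
end program bounds_demo
```

Whether this extra indirection shows up as measurable CPU time depends on the compiler and the optimization level, which is what the rest of the thread explores.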
If I want to use deferred-shape arrays, is there anything I can do (compiler settings) to keep the run time down? I have run the same example on several UNIX platforms and find the CPU time is only slightly worse (~2%) or sometimes better when the allocatable array is used.
Does the computation of array bounds and access of memory take that much time compared to the arithmetic operations?
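One source-level approach that is sometimes tried, independent of compiler settings, is to keep the allocatable arrays in the module but pass them to a kernel routine with explicit-shape dummy arguments, so that inside the loop the compiler sees plain arrays with bounds 1..n. The sketch below is an assumption about what might help, not something confirmed in this thread; sub_kernel and its argument list are hypothetical names.

```fortran
module a_mod
  implicit none
  integer :: na
  real :: dt = 0.001
  real, allocatable, dimension(:) :: a, dadt
end module a_mod

! Hypothetical kernel: the dummies are explicit-shape, a(n) and dadt(n),
! so no allocatable attribute is visible inside the loop.
subroutine sub_kernel(n, a, dadt, dt)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: a(n)
  real, intent(in) :: dadt(n), dt
  integer :: i
  do i = 1, n
    a(i) = a(i) + dadt(i)*dt
  end do
end subroutine sub_kernel

program main
  use a_mod
  implicit none
  integer :: k
  na = 1000000
  allocate(a(na), dadt(na))
  a = 0.0
  dadt = 1.0
  do k = 1, 1000
    call sub_kernel(na, a, dadt, dt)   ! allocatable actuals, explicit-shape dummies
  end do
  print *, a(1), a(na)
end program main
```

Timing this variant against the two original examples, at the same optimization level, would show whether the difference comes from how the arrays are addressed or from somewhere else.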
Let me suggest you try something else first. Compile your programs with "maximum optimizations". See what times you get.
Steve
Increasing the optimization level from 4 to 5 gets me about 15% better CPU time when using the allocatable arrays. The improvement with the explicit arrays is 85%, though.
I modified my example a little, so the subroutine is exactly the same and the only difference in the module is that the arrays are either explicit or allocatable.
subroutine SUB
use A_MOD
do n=1,na
A(n) = A(n) + dAdt(n)*dt
enddo
return
end
I have combined a portion of the two assembly listings below, with spaces added to align things. It looks to me like everything inside the loop is the same and the difference is in the addressing, which looks like it is done once per call. Shouldn't this mean that as I increase the size of the array (and therefore the number of times through the loop), the difference in CPU time should decrease? It doesn't. Doubling the size doubles the CPU time for both cases.
PUBLIC _SUB@0                           PUBLIC _SUB@0
_SUB@0 PROC                             _SUB@0 PROC
sub esp, 8                              sub esp, 8
; 30 use A_MOD
; 31
; 32 do n=1,na
mov eax, 1000000                        mov eax, 1000000
lea edx, dword ptr A_MOD_mp_DADTps_$
lea ecx, dword ptr A_MOD_mp_Aps_$
push ebx                                push ebx
; 33 A(n) = A(n) + dAdt(n)*dt
mov ecx, dword ptr .data$+36
fld dword ptr .data$                    fld dword ptr .data$
mov edx, dword ptr .data$+8
mov ebx, dword ptr .data$+68
fstp st(1)                              fstp st(1)
shl ecx, 2
sub edx, ecx
mov ecx, dword ptr .data$+40
add edx, 4
shl ebx, 2
sub ecx, ebx
lea ecx, dword ptr 4[ecx]
add eax, 0
mov eax, eax                            mov eax, eax
nop
lab$0044:                               lab$0040:
fld st(0)                               fld st(0)
fmul dword ptr [edx]                    fmul dword ptr [edx]
fadd dword ptr [ecx]                    fadd dword ptr [ecx]
fstp dword ptr [ecx]                    fstp dword ptr [ecx]
fld st(0)                               fld st(0)
fmul dword ptr 4[edx]                   fmul dword ptr 4[edx]
fadd dword ptr 4[ecx]                   fadd dword ptr 4[ecx]
fstp dword ptr 4[ecx]                   fstp dword ptr 4[ecx]
fld st(0)                               fld st(0)
fmul dword ptr 8[edx]                   fmul dword ptr 8[edx]
fadd dword ptr 8[ecx]                   fadd dword ptr 8[ecx]
fstp dword ptr 8[ecx]                   fstp dword ptr 8[ecx]
fld st(0)                               fld st(0)
fmul dword ptr 12[edx]                  fmul dword ptr 12[edx]
fadd dword ptr 12[ecx]                  fadd dword ptr 12[ecx]
fstp dword ptr 12[ecx]                  fstp dword ptr 12[ecx]
fld st(0)                               fld st(0)
fmul dword ptr 16[edx]                  fmul dword ptr 16[edx]
fadd dword ptr 16[ecx]                  fadd dword ptr 16[ecx]
fstp dword ptr 16[ecx]                  fstp dword ptr 16[ecx]
fld st(0)                               fld st(0)
fmul dword ptr 20[edx]                  fmul dword ptr 20[edx]
fadd dword ptr 20[ecx]                  fadd dword ptr 20[ecx]
fstp dword ptr 20[ecx]                  fstp dword ptr 20[ecx]
fld st(0)                               fld st(0)
fmul dword ptr 24[edx]                  fmul dword ptr 24[edx]
fadd dword ptr 24[ecx]                  fadd dword ptr 24[ecx]
fstp dword ptr 24[ecx]                  fstp dword ptr 24[ecx]
prefetch qword ptr 284[edx]             prefetch qword ptr 284[edx]
fld st(0)                               fld st(0)
fmul dword ptr 28[edx]                  fmul dword ptr 28[edx]
prefetchw qword ptr 284[ecx]            prefetchw qword ptr 284[ecx]
; 34 enddo
add edx, 32                             add edx, 32
fadd dword ptr 28[ecx]                  fadd dword ptr 28[ecx]
fstp dword ptr 28[ecx]                  fstp dword ptr 28[ecx]
add ecx, 32                             add ecx, 32
sub eax, 8                              sub eax, 8
cmp eax, 0                              cmp eax, 0
jg lab$0044                             jg lab$0040
; 35                                    ; 35
; 36 return                             ; 36 return
; 37 end                                ; 37 end
ffree st(0)                             ffree st(0)
pop ebx                                 pop ebx
add esp, 8                              add esp, 8
ret                                     ret
_SUB@0 ENDP                             _SUB@0 ENDP
END                                     END
Doubling the array size will double the number of memory accesses, which may be a significant factor.
Steve
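The point about memory accesses can be made concrete with a rough sketch. Each loop iteration reads A(n) and dAdt(n) and writes A(n), so, assuming the default real is 4 bytes, one call touches about 12 bytes per element, or roughly 12 MB for na = 1000000. The program below is an illustrative timing harness, not the original poster's code; traffic_demo, sizes, and ncalls are hypothetical names, and it uses the standard CPU_TIME intrinsic rather than DTIME.

```fortran
program traffic_demo
  implicit none
  integer, parameter :: ncalls = 100
  integer :: sizes(3) = (/ 250000, 500000, 1000000 /)
  real, allocatable :: a(:), dadt(:)
  real :: dt, t0, t1
  integer :: s, k, n, na

  dt = 0.001
  do s = 1, size(sizes)
    na = sizes(s)
    allocate(a(na), dadt(na))
    a = 0.0
    dadt = 1.0
    call cpu_time(t0)
    do k = 1, ncalls
      do n = 1, na
        a(n) = a(n) + dadt(n)*dt
      end do
    end do
    call cpu_time(t1)
    ! Two reads and one write per element; 4 bytes each is an assumption
    ! about the default real kind.
    print *, 'na =', na, ' MB touched per call =', 12.0*real(na)/1.0e6, &
             ' cpu s =', t1 - t0, ' checksum =', a(na)
    deallocate(a, dadt)
  end do
end program traffic_demo
```

If the reported CPU time grows in proportion to the megabytes touched, the loop is likely limited by memory traffic, in which case the one-time addressing setup per call would be too small to explain the gap between the two versions.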