I have talked to a few F90 experts but have not found a good answer to my problem. Why does the code below run twice as slow when the arrays are dynamically allocated rather than statically declared? I tried a few compiler options (-fno-alias, -static) and tried
using VECTOR NONTEMPORAL, but nothing helps.
In this case on a 2.8GHz Nehalem (not shared), the performance difference is 2x.
Is using allocatable arrays such as this (typical use in numerical models) just slower? If so, why?
I compiled the code with ifort -O3 -xSSE4.2 -fno-alias -fpp relax.f -o relax, and added -DDYNAMIC to enable
allocatable memory. The compiler version is 11.1.038.
Thanks,
Craig
program relax
integer :: mbyte=262144
integer :: nb=100
#if defined(DYNAMIC)
real,allocatable :: a(:,:),b(:,:)
allocate(a(0:mbyte+1,nb))
allocate(b(0:mbyte+1,nb))
#else
real :: a(0,262145),b(0,262145)
#endif
a=100
b=100
do iter=1,100
!$OMP parallel do
do n=1,nb
do k=1,mbyte
b(k,n)=(a(k+1,n)+a(k-1,n))/2.
end do
end do
!$OMP parallel do
do n=1,nb
do k=1,mbyte
a(k,n)=b(k,n)
end do
end do
call sub(a)
print *,iter
end do
stop
end
subroutine sub()
return
end
3 Replies
Is the program correct? The !$OMP PARALLEL is opened but not closed... and I don't understand the static array declaration. Shouldn't it read:
real :: a(0:262145,100),b(0:262145,100) ?
Why do the static and dynamically allocated arrays have different sizes and shapes?
The allocatable version has shape (262146,100), since its first dimension runs from 0 to 262145. The static version has shape (0,262145), so its data size is 0. The static version will therefore go out of bounds in the loops that follow. If it is compiled with runtime checking enabled by the "-C" option, there is a runtime error:
forrtl: severe (408): fort: (2): Subscript #1 of the array A has value 2 which is greater than the upper bound of 0
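The difference between the two declarations can be checked directly with the standard inquiry intrinsics. The following is a minimal sketch; the program name `shape_check` and the array names `bad` and `good` are made up for illustration and do not appear in the original post.

```fortran
! Minimal sketch: compare the posted static declaration with the suggested one.
! Names (shape_check, bad, good) are hypothetical, chosen for this example only.
program shape_check
  implicit none
  real :: bad(0,262145)        ! as posted: bounds 1:0 by 1:262145, so no elements
  real :: good(0:262145,100)   ! as suggested: bounds 0:262145 by 1:100

  ! shape() returns the extent of each dimension, size() the total element count
  print *, 'bad:  shape =', shape(bad),  ' size =', size(bad)
  print *, 'good: shape =', shape(good), ' size =', size(good)

  ! lbound/ubound show the index range the stencil loop may legally touch
  print *, 'good: dim 1 runs from', lbound(good, 1), 'to', ubound(good, 1)
end program shape_check
```

A dimension declared as `0` means "lower bound 1, upper bound 0", which is a zero-extent dimension, not an index starting at 0. That is why the first array holds no data at all.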
Ahhh, I wrote bad code. It was a sanity check that went bad. Yes, the code had bugs. After fixing them,
the two cases run pretty much the same.
Sorry for the post.
Craig
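The fixed code was not posted. For readers who want to repeat the comparison, here is a sketch of what a corrected version might look like, assuming the fix was the declaration suggested earlier in the thread plus a `sub` that actually accepts its argument. The `implicit none`, the `parameter` and `save` attributes, the dummy argument `x`, and the explicit `end parallel do` lines are additions made for this sketch, not taken from the author's fix. It is written in free form and still needs the preprocessor (for example `-fpp` with ifort).

```fortran
! Hypothetical corrected version of relax; not the author's actual fix.
! Free-form source; requires preprocessing. Define DYNAMIC for allocatable arrays.
program relax
  implicit none
  integer, parameter :: mbyte = 262144
  integer, parameter :: nb = 100
  integer :: iter, n, k
#if defined(DYNAMIC)
  real, allocatable :: a(:,:), b(:,:)
#else
  ! Same bounds as the allocatable case: 0..mbyte+1 by 1..nb
  real, save :: a(0:mbyte+1, nb), b(0:mbyte+1, nb)
#endif

#if defined(DYNAMIC)
  allocate(a(0:mbyte+1, nb))
  allocate(b(0:mbyte+1, nb))
#endif

  a = 100
  b = 100
  do iter = 1, 100
    ! Relaxation step: average the two neighbours along the first dimension
    !$OMP parallel do private(k)
    do n = 1, nb
      do k = 1, mbyte
        b(k,n) = (a(k+1,n) + a(k-1,n)) / 2.
      end do
    end do
    !$OMP end parallel do

    ! Copy the interior points back into a
    !$OMP parallel do private(k)
    do n = 1, nb
      do k = 1, mbyte
        a(k,n) = b(k,n)
      end do
    end do
    !$OMP end parallel do

    call sub(a)
    print *, iter
  end do
end program relax

! Empty external routine, kept from the original post; it now takes the array
! it is called with (assumed-size dummy, so no explicit interface is needed).
subroutine sub(x)
  implicit none
  real :: x(*)
  return
end subroutine sub
```

With both variants using identical bounds, the static and allocatable builds do the same amount of work, so a timing comparison between them is meaningful.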