Is it normal to observe such a big difference in execution time when allocatable arrays are used instead of fixed-size arrays?
An example code produces:
ifort pexcample.f90 -O3 ; time ./a.out
(a, fixed size) Time = 0.000004 seconds.
real 0m0.041s
user 0m0.002s
sys 0m0.004s
(b, allocatable array) Time = 14.428799 seconds.
real 0m19.532s
user 0m4.383s
sys 0m12.056s
program pexample
  implicit none
  integer(kind=8), parameter :: n = 1000, m = 1024, p = 1024, np = n*m*p
  ! real(kind=8) :: x(3,n,m,p)
  real(kind=8), allocatable :: x(:,:,:,:)
  real(kind=8) :: tic, toc
  integer(kind=8) :: i, j, k, l, id
  allocate( x(1:3,1:n,1:m,1:p) )
  call cpu_time(tic)
  do k = 1, p
    do j = 1, m
      do i = 1, n
        ! id = I + N*(J-1) + N*M*(K-1)
        x(1,i,j,k) = 0.d0 !(i-1)*1.d0/(n-1)
        x(2,i,j,k) = 0.d0 !(j-1)*1.d0/(m-1)
        x(3,i,j,k) = 0.d0 !(k-1)*1.d0/(p-1)
      end do
    end do
  end do
  call cpu_time(toc)
  deallocate( x )
  print '("Time = ",f10.6," seconds.")', toc-tic
end program pexample
Scenario: statically defined array
The data of the empty array is loaded into memory at program load time, prior to cpu_time. In other words, the CPU time is measured after the array addresses are loaded in RAM.
Scenario: allocated array
In the specific code above, allocate(x(...)) obtains a node from the heap, but the memory representing this node has never been touched (except possibly the page where the header of the node resides). The cpu_time is taken prior to first touch. Subsequently, as the loop walks onto pages of the array that have not been used before (first touch), each such access causes a page fault. The O/S then obtains a page from the page file (assuming one is available), maps it to the virtual address space of the process (pexample), possibly wipes the page (or reads the page, then possibly wipes it), and then returns to your code, which continues the loop until the next untouched page of the array is reached. This repeats until the loop finishes.
Your loop above is designed to measure array access time. Therefore the appropriate action would be to insert
x = 0.0
between the allocate and the call to cpu_time.
Before you do that, as a learning experience, add another integer variable iRep, then construct a DO iRep=1,3 loop from before the allocate to after the print. I also suggest adding a print 'array located at', LOC(x) after the allocate.
If the same memory space gets reallocated, then the 2nd and later runs will be fast. If not, they will be slow until the heap allocations cycle back to memory locations that have already been touched. This behavior depends on the CRTL (the C Run Time Library used by Fortran).
Jim Dempsey
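To make the first-touch effect described above easier to see, here is a minimal sketch. It assumes a much smaller array (about 96 MiB, sizes chosen only for illustration) and uses the standard system_clock intrinsic for wall-clock timing. The program name and the two-pass structure are illustrative and not part of the original code. Pass 1 writes to pages that have typically never been touched; pass 2 writes to the same, already mapped pages.

```fortran
program first_touch_sketch
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  ! Hypothetical sizes: 3*64*256*256*8 bytes is about 96 MiB, not ~25 GB.
  integer(int64), parameter :: n = 64, m = 256, p = 256
  real(real64), allocatable :: x(:,:,:,:)
  integer(int64) :: t0, t1, rate
  integer(int64) :: i, j, k
  integer :: pass

  allocate( x(3, n, m, p) )   ! the pages are typically not touched yet

  do pass = 1, 2
    call system_clock(t0, rate)
    do k = 1, p
      do j = 1, m
        do i = 1, n
          x(1,i,j,k) = real(pass, real64)
          x(2,i,j,k) = real(pass, real64)
          x(3,i,j,k) = real(pass, real64)
        end do
      end do
    end do
    call system_clock(t1)
    ! Reading the array back makes it harder for the compiler to drop the stores.
    print '("pass ",i0,": ",f10.6," s, checksum = ",es12.4)', pass, &
          real(t1 - t0, real64) / real(rate, real64), sum(x)
  end do

  deallocate( x )
end program first_touch_sketch
```

If the explanation above applies on a given system, pass 1 should be noticeably slower than pass 2. The actual ratio depends on the operating system, the runtime library and the optimization level, so treat any numbers as indicative only.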
Thank you Jim for the detailed explanation.
Indeed, the same memory space gets reallocated, but there is no improvement:
array located at 4475424768
Time = 14.375916 seconds.
array located at 4475424768
Time = 14.443573 seconds.
array located at 4475424768
Time = 14.414449 seconds.
real 0m56.574s
user 0m13.171s
sys 0m36.430s
When x is initialised after allocation (x = 0.0):
array located at 4401082368
Time = 10.560929 seconds.
array located at 4401082368
Time = 10.038265 seconds.
array located at 4401082368
Time = 10.134284 seconds.
real 1m38.054s
user 0m25.221s
sys 0m55.270s
When I use a static array, LOC(x) causes the elapsed time to increase at each step of the loop. I did not experience this before on Linux. Could it be related to OS X? I am using Xcode 5.1, which is not supported yet.
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.2.139 Build 20140121
Copyright (C) 1985-2014 Intel Corporation. All rights reserved
do iRep = 1, 3
  print*,'array located at', LOC(x)
  x = 0
  call cpu_time(tic)
  do k = 1, p
    do j = 1, m
      do i = 1, n
        x(1,i,j,k) = 0.d0 !(i-1)*1.d0/(n-1)
        x(2,i,j,k) = 0.d0 !(j-1)*1.d0/(m-1)
        x(3,i,j,k) = 0.d0 !(k-1)*1.d0/(p-1)
      end do
    end do
  end do
  call cpu_time(toc)
  print '("Time = ",f10.6," seconds.")', toc-tic
end do
array located at 4320210208
Time = 9.153378 seconds.
array located at 4320210208
Time = 18.616963 seconds.
array located at 4320210208
Time = 23.337855 seconds.
You may also want to compile the code with the vec-report and opt-report options turned on. -O3 is a relatively aggressive optimization flag, and it is possible that the compiler is finding optimizations for one case and not for the other. At the very least, the reports might provide some insight into what the compiler is doing behind the scenes.
Try using omp_get_wtime() for timing.
I cannot run this program. It requires .gt. 25GB, I only have 16GB
Jim
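The point of switching timers is that cpu_time reports processor time, while omp_get_wtime reports elapsed wall-clock time, and the two can differ when a lot of time is spent outside the user code. The sketch below illustrates the difference under one assumption: it uses the standard system_clock intrinsic as a stand-in for omp_get_wtime so that it builds without OpenMP. The workload and all names are hypothetical.

```fortran
program timer_sketch
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  integer, parameter :: nvals = 5000000   ! hypothetical workload size (~40 MB)
  real(real64), allocatable :: a(:)
  real(real64) :: cpu0, cpu1
  integer(int64) :: w0, w1, rate
  integer :: i

  call cpu_time(cpu0)            ! processor time in seconds
  call system_clock(w0, rate)    ! wall-clock ticks and ticks per second

  allocate( a(nvals) )
  do i = 1, nvals
    a(i) = sqrt(real(i, real64))
  end do

  call system_clock(w1)
  call cpu_time(cpu1)

  print '("CPU time  = ",f10.6," s")', cpu1 - cpu0
  print '("Wall time = ",f10.6," s")', real(w1 - w0, real64) / real(rate, real64)
  print '("a(nvals)  = ",f12.4)', a(nvals)   ! use the result so the loop is kept
  deallocate( a )
end program timer_sketch
```

Reporting both values side by side, as the shell's time command does with real, user and sys, makes it easier to tell whether the time is going into the loop itself or into the operating system.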
I don't really know how to report this issue, since the problem seems to appear at random. I've tested the same code on different Macs (same OS and compiler, but different amounts of RAM).
When I use omp_get_wtime, the code compiles only some of the time. If it compiles, "Illegal instruction: 4" is issued at run time, independent of array size.
If it does not compile, the message is: error #6930: The size of the array dimension is too large, and overflow occurred when computing the array size.
Yet another ridiculous observation is that the timing for the loop is the same regardless of the array dimensions (this also occurs at random).
Time = 0.00000400 seconds for (n = 600, m = 1024, p = 1024)
Time = 0.00000400 seconds for (n = 600, m = 124, p = 124)
program pexample
#ifdef WTIME
  USE omp_lib
#endif
  implicit none
  integer(kind=4), parameter :: n = 50, m = 1024, p = 1024, np = n*m*p
#ifdef ALOC
  real(kind=8), allocatable :: x(:,:,:,:)
#else
  real(kind=8) :: x(3,n,m,p)
#endif
  real(kind=8) :: tic, toc
  integer(kind=4) :: i, j, k, l, iRep
  do iRep = 1,3
#ifdef ALOC
    allocate( x(1:3,1:n,1:m,1:p) )
#endif
    ! print '("Array located at",I,f)', LOC(x), sizeof(x)*9.3132e-10
#ifdef WTIME
    tic = omp_get_wtime()
#else
    call cpu_time(tic)
#endif
    x = 0.0
    do k = 1, p
      do j = 1, m
        do i = 1, n
          x(1,i,j,k) = 0.d0 !(i-1)*1.d0/(n-1)
          x(2,i,j,k) = 0.d0 !(j-1)*1.d0/(m-1)
          x(3,i,j,k) = 0.d0 !(k-1)*1.d0/(p-1)
        end do
      end do
    end do
#ifdef WTIME
    toc = omp_get_wtime()
#else
    call cpu_time(toc)
#endif
#ifdef ALOC
    deallocate( x )
#endif
    print '("Time = ",f15.8," seconds.")', toc-tic
  end do
end program pexample
! Results
!ifort -fpp pexample.f90 -o pexample
!Time = 0.00000400 seconds.
!Time = 0.00000100 seconds.
!Time = 0.00000100 seconds
!ifort -fpp pexample.f90 -DALOC -o pexampleALOC
!Time = 10.44109600 seconds.
!Time = 10.46074400 seconds.
!Time = 10.52917000 seconds.
!ifort -fpp pexample.f90 -DWTIME -openmp -o pexample_WTIME
! random result
! 1. Illegal instruction: 4
! 2. error #6930: The size of the array dimension is too large, and overflow occurred when computing the array size.
! real(kind=8) :: x(3,n,m,p)
!ifort -fpp pexample.f90 -DALOC -DWTIME -openmp -o pexampleALOC_WTIME
!Time = 10.51393390 seconds.
!Time = 10.62356901 seconds.
!Time = 10.63541198 seconds.
I seem to recall an allocation issue that occurs when the size of the allocation is over 2GB/4GB and the indexes used in the allocation are integer(4).
Try changing allocate( x(1:3,1:n,1:m,1:p) ) to
allocate( x(1_8:3_8,1_8:INT8(n),1_8:INT8(m),1_8:INT8(p)) )
or allocate( x(:3_8, INT8(n), INT8(m), INT8(p)) ).
You may also need to experiment with changing the (or one of the) loop control variables to integer(8).
Jim Dempsey
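To see which of the array sizes tried in this thread cross the integer(4) range, the following sketch does the size arithmetic in 64-bit integers and compares the results with huge() of a 32-bit integer. It only illustrates the arithmetic and does not reproduce the compiler's internal size computation. The program name and the n_values list (the three values of n used in the thread) are chosen here for illustration.

```fortran
program size_check_sketch
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
  integer(int64), parameter :: m = 1024, p = 1024, bytes_per_real = 8
  ! Values of n that appear in the thread (with m = p = 1024).
  integer(int64), parameter :: n_values(3) = [50_int64, 600_int64, 1000_int64]
  integer(int64) :: elems, bytes
  integer :: idx

  do idx = 1, size(n_values)
    elems = 3_int64 * n_values(idx) * m * p   ! number of real(kind=8) elements
    bytes = elems * bytes_per_real            ! total size in bytes
    print '("n = ",i4,": ",i0," elements, ",i0," bytes")', &
          n_values(idx), elems, bytes
    print '("   element count fits in int32: ",l1,", byte count fits in int32: ",l1)', &
          elems <= huge(1_int32), bytes <= huge(1_int32)
  end do
end program size_check_sketch
```

With these inputs, n = 50 gives 1,258,291,200 bytes, which still fits in a 32-bit integer. For n = 600 the element count (1,887,436,800) fits but the byte count (15,099,494,400) does not, and for n = 1000 neither the element count (3,145,728,000) nor the byte count (25,165,824,000) fits.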
Actually, I've been playing with the integer kind, but without any success. I cannot even run a simple code:
program pexample
  USE omp_lib
  use, intrinsic :: ISO_FORTRAN_ENV, only : RP => REAL64, IP => INT64
  implicit none
  integer(kind=IP), parameter :: ub = 3_IP
  integer(kind=IP), parameter :: lb = 1_IP
  integer(kind=IP), parameter :: n = 2_IP
  integer(kind=IP), parameter :: m = 1024_IP
  integer(kind=IP), parameter :: p = 1024_IP
  integer(kind=IP), parameter :: np = n*m*p
  real(kind=RP) :: x(lb:ub,lb:n,lb:m,lb:p)
  real(kind=RP) :: tic, toc
  tic = omp_get_wtime()
  x = 0.0_RP
  toc = omp_get_wtime()
end program pexample
! Results
! x(1:3,1:2,1:1024,1:1024)
! ifort -openmp pexample1.f90 ; ./a.out
! Segmentation fault: 11
! x(1:4,1:2,1:1024,1:1024)
! Illegal instruction: 4
Enable heap arrays (the ifort -heap-arrays option).
Jim Dempsey
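Enabling heap arrays is a compiler setting. A source-level alternative, sketched here as an assumption rather than as the fix Jim had in mind, is to declare the array allocatable, since allocatable data is normally obtained from the heap rather than from the stack. The sketch keeps the sizes from the small test program above. The program name, the stat check and the printed message are illustrative additions.

```fortran
program heap_sketch
  use, intrinsic :: iso_fortran_env, only: RP => real64, IP => int64
  implicit none
  integer(IP), parameter :: n = 2_IP, m = 1024_IP, p = 1024_IP
  real(RP), allocatable :: x(:,:,:,:)
  integer :: stat

  ! Allocatable data normally comes from the heap, not the stack.
  allocate( x(3_IP, n, m, p), stat=stat )
  if (stat /= 0) then
    print '("allocation failed, stat = ",i0)', stat
    stop 1
  end if

  x = 0.0_RP
  ! storage_size returns the element size in bits, so divide by 8 for bytes.
  print '("allocated ",i0," bytes")', size(x, kind=IP) * storage_size(x) / 8
  deallocate( x )
end program heap_sketch
```

Checking stat after allocate also turns a failed allocation into a readable message instead of a crash, which helps when experimenting with larger values of n.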