========================================================
program test
  use omp_lib
  implicit none
  integer i, j, k, n, np, npm
  real(8) t1
  real, allocatable :: a(:,:,:), b(:,:,:), dt(:)
  n = 500
  allocate( a(n,n,n), b(n,n,n) )
  a = 1.0
  b = 2.0
  npm = omp_get_max_threads()
  allocate( dt(npm) )
  do np=1,npm
    call omp_set_num_threads(np)
    t1 = omp_get_wtime()
    call sum_mat( n, a, b )
    dt(np) = omp_get_wtime() - t1
  enddo
  do np=2,npm
    print *, np, dt(1)/dt(np)
  enddo
end
subroutine sum_mat( n, a, b )
  implicit none
  integer n
  real a(n,n,n), b(n,n,n)
  integer i, j, k
  !$omp parallel do
  do k=1,n
    do j=1,n
      do i=1,n
        a(i,j,k) = a(i,j,k) + b(i,j,k)
      enddo
    enddo
  enddo
end
========================================================
I use IFC 10 and build the executable with the following command:
ifort /nologo /O2 /Qfpp2 /Qopenmp s5.f90
When I run this program several times on my dual-core processor, I get the following results:
2 0.9937300
2 0.9861304
2 1.004650
Why can't I get a speedup of about 2? What is wrong with the program?
You may also be experiencing a cache loading problem. Try running the same test several times inside the same application:
do i=1,5
  print *, 'Iteration ',i
  do np=1,npm
    call omp_set_num_threads(np)
    t1 = omp_get_wtime()
    call sum_mat( n, a, b )
    dt(np) = omp_get_wtime() - t1
  enddo
  do np=2,npm
    print *, np, dt(1)/dt(np)
  enddo
end do
The 1st iteration generally is longer as it populates the cache.
If one of the other test runs is longer than the others, this may indicate that the operating system is interfering with the program.
Jim Dempsey
JimDempseyAtTheCove:
>>You also may be experiencing a cache loading problem. Try running the same test inside the same application
>>The 1st iteration generally is longer as it populates the cache.
>>If one of the other test runs is longer than the others this may indicate the operating system interfering with the program.
Thank you for the advice, but I get the same results.
>>In order to see a warm cache effect, where you get parallel speedup on the 2nd repetition, the big arrays would have to be made small enough to fit in cache.
That is correct (array sizes were not listed in the user's original post).
On the flip side, if you know your arrays are larger than the cache, break the problem up into chunks that fit within the cache; you can then (potentially) work your way through the large array(s) with better cache utilization. There are system calls to get the sizes of the L1 and L2 (and L3, if present) caches. This technique has various names, one of which is striping.
Jim Dempsey
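The chunking idea above might look something like the sketch below. It is only an illustration under stated assumptions, not code from the thread: the chunk size `chunk_elems` is a hypothetical tuning value (not queried from the system), the arrays are treated as flat 1-D vectors (equivalent for contiguous `a(n,n,n)` storage), and a second pass (`a = 0.5*a`) is invented so that there is something to reuse from cache. For a single `a = a + b` pass each element is touched only once, so chunking alone would not be expected to help.

```fortran
program chunked_update
  implicit none
  ! Assumption: chunk_elems is a hypothetical tuning parameter. One chunk of a
  ! plus one chunk of b is 2 * 4 bytes * 65536 = 512 KB, which is meant to fit
  ! in a cache of roughly 1 MB. Adjust it to the real cache size.
  integer, parameter :: chunk_elems = 65536
  integer, parameter :: n = 100
  integer :: ntot, nchunks, c, lo, hi, i
  real, allocatable :: a(:), b(:)

  ntot = n*n*n
  allocate( a(ntot), b(ntot) )
  a = 1.0
  b = 2.0
  nchunks = (ntot + chunk_elems - 1) / chunk_elems   ! round up

  ! Each thread takes whole chunks, so both passes over a chunk
  ! are done by the same thread.
  !$omp parallel do private(lo, hi, i)
  do c = 1, nchunks
    lo = (c - 1)*chunk_elems + 1
    hi = min(c*chunk_elems, ntot)
    ! First pass: brings the chunk of a and b into cache.
    do i = lo, hi
      a(i) = a(i) + b(i)
    enddo
    ! Second pass over the same chunk: ideally served from cache.
    do i = lo, hi
      a(i) = 0.5*a(i)
    enddo
  enddo
  !$omp end parallel do

  ! Every element should now hold (1.0 + 2.0)*0.5 = 1.5.
  print *, 'a(1) =', a(1), '  a(ntot) =', a(ntot)
end program chunked_update
```

If the cores share a cache, the chunk size would presumably need to be divided by the number of threads so that all active chunks fit at once.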
Array sizes were listed in the original post: 500**3 = 125 million 4-byte reals, times 2 arrays, is a solid gigabyte. He's reading half a gig twice and writing it once in almost exactly 1 second, presumably on a fairly recent Intel single-socket desktop machine. Hence, he's within a factor of 2 of the memory controller's maximum sustained bandwidth.
But he ought to be able to get closer to theoretical than that for this operation, even with one core ...
Oh, yeah: he's also allocating and initializing the arrays in that second. If the OS is zero-filling the memory that's two more GB transferred and we're at 3.5 GB/sec. You can't do better than that on his hardware no matter how many threads you use.
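The arithmetic in this estimate can be spelled out in a few lines. The sketch below just reproduces the numbers quoted above; the 1-second elapsed time and the 2 GB of zero-fill traffic are taken from the post as assumptions, not measured, and 1 GB is taken to mean 1e9 bytes.

```fortran
program bandwidth_estimate
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 500
  real(dp), parameter :: bytes_per_real = 4.0_dp
  real(dp), parameter :: elapsed = 1.0_dp        ! seconds, assumed from the post
  real(dp), parameter :: zero_fill = 2.0e9_dp    ! bytes, assumed from the post
  real(dp) :: array_bytes, sum_traffic, total_traffic

  ! One array: 500**3 elements * 4 bytes = 5.0e8 bytes (0.5 GB).
  array_bytes = real(n, dp)**3 * bytes_per_real

  ! a = a + b reads a, reads b, and writes a: three array-sized transfers.
  sum_traffic = 3.0_dp * array_bytes

  ! Add the traffic attributed to allocation and initialization.
  total_traffic = sum_traffic + zero_fill

  print *, 'GB per array:          ', array_bytes / 1.0e9_dp
  print *, 'GB/s for the sum alone:', sum_traffic / 1.0e9_dp / elapsed
  print *, 'GB/s with zero fill:   ', total_traffic / 1.0e9_dp / elapsed
end program bandwidth_estimate
```

With these inputs the sum alone comes to 1.5 GB/s and the total to 3.5 GB/s, which matches the figure in the post.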
