izryu
Beginner
50 Views

OpenMP slows down performance

I am just starting to learn OpenMP and have run into the following problem.
The program below (which does nothing useful) runs about 6 times slower on a dual-core Pentium D when compiled with /Qopenmp than without it.
program Cpu
  implicit none
  integer, parameter :: n=200, nt=20, ni=64
  real(8) :: a(n,n), c(ni), d
  integer :: i, j
  real t0, t1, ti(nt), ta, sd

  !$omp parallel num_threads(2),shared(c,ti)
  !$omp do schedule(static),private(a,i,j,t1,t0)
  do i=1,nt
    call cpu_time(t0)
    do j=1,ni
      call random_number(a)
      c(j)=sum(matmul(a,a))
    end do
    call cpu_time(t1)
    ti(i)=t1-t0
  end do
  !$omp end do
  !$omp end parallel
  ta=sum(ti)/nt ! average run time
  sd=dot_product(ti-ta,ti-ta)/(nt-1)
  sd=sqrt(sd) ! standard deviation
  print *,"ave=",ta," sdev=",sd," max=",maxval(ti)," min=",minval(ti)
end program Cpu
Both processors are fully loaded, and yet instead of the expected twofold speedup I get a substantial slowdown. It looks like I am missing some crucial point, but what?
Yuri.
3 Replies
Henry_G_Intel
Employee

Hello Yuri,

There's a storage conflict on array c inside the parallel region. Each thread has a private copy of the loop index j, but the threads can still access the same elements of c. This could be causing cache conflicts.

I suggest analyzing this code with the Intel Threading Tools. Thread Checker will help you find race conditions. Thread Profiler will help you tune threaded performance.

Best regards,

Henry

Message Edited by hagabb on 12-22-2005 07:28 AM


izryu
Beginner

Hello hagabb,
Thanks for your suggestion. I have moved "c" from the shared clause to the private clause; the values of this array are thrown away in any case. Yet it didn't help: with /Qopenmp this program runs for 60 seconds, while without it only 17 seconds. I am totally baffled.
Yuri.

Message Edited by izryu on 12-22-2005 06:37 PM

jim_dempsey
Beginner

Try placing the results of the MATMUL into an explicitly declared storage area that is private to each thread. I think what is happening is that the compiler is allocating a common static temporary array to hold the results of the MATMUL, and this temporary array is being used by both threads (resulting in cache problems).
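A minimal sketch of that suggestion, assuming the original program's sizes and loop structure. The array name b and the program name are made up for illustration, and whether the compiler still creates its own temporary for MATMUL before assigning to b is compiler-dependent, so this may or may not remove the slowdown:

```fortran
program cpu_private
  implicit none
  integer, parameter :: n=200, nt=20, ni=64
  real(8) :: a(n,n), b(n,n), c(ni)   ! b is a hypothetical declared buffer for the MATMUL result
  integer :: i, j
  real :: t0, t1, ti(nt)

  ! Every work array is private, so each thread gets its own a, b and c;
  ! only the timing array ti is shared, and each iteration writes a distinct element of it.
  !$omp parallel num_threads(2) shared(ti) private(a,b,c,i,j,t0,t1)
  !$omp do schedule(static)
  do i=1,nt
    call cpu_time(t0)
    do j=1,ni
      call random_number(a)
      b=matmul(a,a)      ! result is assigned to the thread's own declared array
      c(j)=sum(b)
    end do
    call cpu_time(t1)
    ti(i)=t1-t0
  end do
  !$omp end do
  !$omp end parallel

  print *,"ave=",sum(ti)/nt," max=",maxval(ti)," min=",minval(ti)
end program cpu_private
```

Compiled without /Qopenmp the directives are treated as comments and the program runs serially, so the same source can be used for both timings.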

Jim Dempsey
