integer, parameter :: n=200, nt=20,ni=64
real(8) :: a(n,n),c(ni),d
integer :: i,j
!$omp parallel num_threads(2),shared(c,ti)
!$omp do schedule(static),private(a,i,j,t1,t0)
!$omp end do
!$omp end parallel
ta=sum(ti)/nt ! average run time
sd=sqrt(sd) ! standard deviation
print *,"ave=",ta," sdev=",sd," max=",maxval(ti)," min=",minval(ti)
end program Cpu
There's a storage conflict on array c inside the parallel region. Each thread has a private copy of the loop index j, but the threads can still access the same elements of the shared array c. This could be causing cache conflicts between the two threads.
I suggest analyzing this code with the Intel Threading Tools. Thread Checker will help you find race conditions. Thread Profiler will help you tune threaded performance.
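To illustrate the point about c, here is a minimal sketch of one way to keep threads off each other's elements. It is not the original program: the loop body is not shown in the post, so the inner loop, the scalar s, and the count nrep are hypothetical stand-ins. The idea is only that each thread accumulates into a private scalar and writes each element of c exactly once.

```fortran
program private_accumulator
  implicit none
  integer, parameter :: ni = 64
  integer, parameter :: nrep = 1000   ! hypothetical amount of work per element
  real(8) :: c(ni), s                 ! s is a hypothetical per-thread accumulator
  integer :: i, k

  ! Each iteration i is handled by exactly one thread, so c(i) has one writer.
  ! All intermediate updates go to the private scalar s, not to shared memory.
  !$omp parallel do num_threads(2) schedule(static) private(k, s) shared(c)
  do i = 1, ni
     s = 0.0d0
     do k = 1, nrep
        s = s + real(k, 8)
     end do
     c(i) = s                         ! single store to the shared array
  end do
  !$omp end parallel do

  print *, "c(1)=", c(1), " sum=", sum(c)
end program private_accumulator
```

With schedule(static) each thread gets a contiguous block of i, so the two threads should only touch a common cache line near the block boundary, if at all.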
Message Edited by hagabb on 12-22-2005 07:28 AM
Message Edited by izryu on 12-22-2005 06:37 PM
Try placing the result of the MATMUL into an explicitly declared array that is private to each thread. I think what is happening is that the compiler allocates a single static temporary array to hold the result of the MATMUL, and both threads end up using this temporary (resulting in cache problems).
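A rough sketch of that suggestion, under assumptions: the array b, the timing with omp_get_wtime, and the random fill of a are not from the original code, which does not show its loop body. Whether the compiler really uses a shared temporary is a guess, so treat this as an experiment to time, not a confirmed fix.

```fortran
program private_matmul
  use omp_lib
  implicit none
  integer, parameter :: n = 200, nt = 20
  real(8) :: a(n,n)
  real(8) :: b(n,n)        ! hypothetical declared array for the MATMUL result
  real(8) :: ti(nt), t0, t1
  integer :: i

  ! a and b are private, so each thread owns its input and its result storage.
  ! ti stays shared; each iteration writes only its own element ti(i).
  !$omp parallel num_threads(2) shared(ti) private(a, b, t0, t1)
  !$omp do schedule(static)
  do i = 1, nt
     call random_number(a)
     t0 = omp_get_wtime()
     b = matmul(a, a)      ! result is assigned into this thread's copy of b
     t1 = omp_get_wtime()
     ti(i) = t1 - t0
  end do
  !$omp end do
  !$omp end parallel

  print *, "ave=", sum(ti)/nt, " max=", maxval(ti), " min=", minval(ti)
end program private_matmul
```

If the timings with private b are no better than before, the shared temporary is probably not the cause, and memory bandwidth or cache capacity would be the next thing to check.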