Dear all,
I am trying to use the OpenMP parallel construct, but the running time is the same as that of the serial code. I checked the number of threads used in the parallel construct and found that only the master thread is used. I have a 6-core machine; shouldn't the number of threads be 6? I tried call OMP_SET_NUM_THREADS(6), but it does not do anything.
Is there anything I should change in order to use OpenMP? If so, would you mind telling me how to do it in Microsoft Visual Studio?
Thanks a lot.
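For later readers with the same symptom: OpenMP directives are ignored unless OpenMP support is enabled at compile time (for Intel Fortran under Visual Studio this is the Process OpenMP Directives setting under Project Properties > Fortran > Language, i.e. the /Qopenmp option; the exact menu path may vary by version). A minimal sketch to verify that multiple threads actually start:

```fortran
! Minimal check that OpenMP is active. Compile with /Qopenmp
! (Windows) or -qopenmp (Linux); without that option the
! directives are treated as comments and one thread is reported.
program check_threads
   use omp_lib
   implicit none
!$omp parallel
!$omp single
   print *, 'number of threads: ', omp_get_num_threads()
!$omp end single
!$omp end parallel
end program check_threads
```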
27 Replies
Thank you for your advice. Before doing what you suggested, I ran a very simple test:
!$OMP PARALLEL private(i)
!$OMP DO
do i = 1, 100000000
   temp = matmul(temp2, temp1)
enddo
!$OMP END DO
!$OMP END PARALLEL
The result is shocking: without parallel it takes 0.14 minutes; with the parallel version it takes 2.9 minutes.
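A sketch of a timing loop written so the compiler cannot discard the work: the results are accumulated and a checksum is printed, so the matmul cannot be optimized away (array names and sizes here are illustrative, not from the original post):

```fortran
! Sketch: keep the result of every iteration live by accumulating
! into c and printing a checksum, so dead-code elimination cannot
! remove the loop body. Timed with omp_get_wtime (wall-clock).
program bench
   use omp_lib
   implicit none
   integer, parameter :: n = 200
   real :: a(n,n), b(n,n), c(n,n)
   real(8) :: t0, t1
   integer :: i
   call random_number(a)
   call random_number(b)
   c = 0.0
   t0 = omp_get_wtime()
!$omp parallel do private(i) reduction(+:c)
   do i = 1, 100
      c = c + matmul(a, b)
   end do
!$omp end parallel do
   t1 = omp_get_wtime()
   print *, 'elapsed (s): ', t1 - t0, ' checksum: ', sum(c)
end program bench
```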
The compiler realized that this program did nothing and eliminated most of it; with the parallel version it could eliminate less. You've made a very common error in writing performance tests: coding them so that the compiler can remove large chunks of the work, since the output is never used.
Dear Jim and all others,
I tried the test you suggested as follows:
integer :: sanity(12)   ! I have 12 threads
!$OMP parallel
sanity = 0
!$omp ATOMIC
sanity(i) = sanity(i) + 1
IF (sanity(i) /= 1) print *, 'wrong', i
!$omp end parallel
I got an error message: "Subscript #1 of the array SANITY is -1, lower than the lower bound of 1".
Can you tell me what is wrong here?
Also, I assume that if I parallel the following
!$omp parallel do private(i,j)
iloop: do i = 1, N
   jloop: do j = 1, Z
      ...
   enddo jloop
enddo iloop
!$omp end parallel do
Then each thread should take some iterations of iloop, and for each i the responsible thread does the entire jloop serially? That is, what the other threads do in jloop should not matter for the current thread's work in jloop?
Am I wrong?
Thank you so much for your advice.
I believe the idea was to add such a test in a parallel loop in your own code, with sanity initialized outside, e.g.
sanity = 0
!$OMP parallel
!$omp do
do i = 1, size(sanity)
!$omp ATOMIC
   sanity(i) = sanity(i) + 1
   IF (sanity(i) /= 1) print *, 'wrong', i
end do
!$omp end parallel
Apparently, you never initialized i, while you asked every thread to zero the entire sanity array.
For your case of nested loops, yes, you want each inner loop to be independent of the others so that threads don't interfere with one another. In Fortran OpenMP the DO counters (j in your example) are automatically private, so each thread has its own copy. Explicit private (or firstprivate/lastprivate) declarations are needed for other variables and arrays which should not be shared.
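To illustrate that last point, a sketch of the nested-loop pattern (the arrays a and b and the scratch variable tmp are made up for illustration): the loop counters are predetermined private in Fortran, but an ordinary temporary is shared by default and must be listed in a private clause to avoid a data race.

```fortran
! Sketch: i (worksharing loop counter) and j (a sequential DO
! counter inside the region) are predetermined private in Fortran;
! tmp is shared by default, so it must be declared private or the
! threads would race on a single copy.
!$omp parallel do private(tmp)
do i = 1, n
   do j = 1, z
      tmp = 2.0 * a(i, j)
      b(i, j) = tmp
   end do
end do
!$omp end parallel do
```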
Dear all,
Thank you all for your helpful advice. I just found out why my parallel program looked "slower" than the serial program: I used call cpu_time instead of omp_get_wtime(), and the former reported the timing in a way I did not expect!
Now my simple program runs faster in parallel; I will try the more complicated ones.
Thanks.
cpu_time is probably designed to total up the times used by all threads, so it would be exceptional for it to decrease with parallelization. omp_get_wtime() should give elapsed (wall-clock) time, as would system_clock(), except that the latter doesn't have as good an implementation on Windows x64.
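A small sketch of the difference: timed over the same parallel loop, cpu_time typically reports roughly the wall-clock time multiplied by the number of busy threads, while omp_get_wtime reports elapsed time (the reduction loop here is illustrative).

```fortran
! Sketch comparing the two timers around one parallel loop.
! cpu_time sums CPU time across threads; omp_get_wtime is
! elapsed wall-clock time, which is what a speedup test needs.
program timing_demo
   use omp_lib
   implicit none
   real :: c0, c1
   real(8) :: w0, w1, s
   integer :: i
   s = 0.0d0
   call cpu_time(c0)
   w0 = omp_get_wtime()
!$omp parallel do reduction(+:s)
   do i = 1, 100000000
      s = s + sqrt(real(i, 8))
   end do
!$omp end parallel do
   call cpu_time(c1)
   w1 = omp_get_wtime()
   print *, 'cpu_time:      ', c1 - c0
   print *, 'omp_get_wtime: ', w1 - w0, ' (sum = ', s, ')'
end program timing_demo
```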
Now I am running my whole program, but I got this message: "insufficient virtual memory". My machine has 24 GB of RAM and 12 threads. I am not sure whether this is really about memory. Thanks for any hints.
"insufficient virtual memory". My machine has 24GB, 12 threads. I am not sure whether this is really about memory. Thanks for hints.
