I'm trying to parallelize the following loop, but I'm not getting the speedup I was hoping for:
    !$omp parallel do private(i,j) collapse(2)
    do j=1, n_cp
      do i=1, max_p_pairs+max_a_temp
        if (angle2_t(i,j)<=0.0 .or. abs(angle2_t(i,j))<1.0e-6) then
          if (angle1_t(i,j)<0.0) then
            angle1_t(i,j)=angle1_t(i,j)+360
            angle2_t(i,j)=angle2_t(i,j)+360
          endif
        endif
      enddo
    enddo
    !$omp end parallel do
Would anyone kindly correct my code or suggest how to improve it?
Have you compared the -qopt-report=4 output (/Qopt-report:4 on Windows) to see which optimizations are reported in the cases you are comparing? What is your reason for setting collapse?
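To check whether collapse(2) is actually helping, one comparison worth making is a version that parallelizes only the outer j loop and marks the inner i loop with `!$omp simd`. Each thread then walks whole columns contiguously (Fortran is column-major) and the inner loop stays available for vectorization. This is a sketch, not a guaranteed improvement: the extents `n_i` and `n_cp` and the random data are hypothetical stand-ins for max_p_pairs+max_a_temp and the real arrays. Compile with OpenMP enabled (e.g. -qopenmp for ifort, -fopenmp for gfortran).

```fortran
program outer_only
  use omp_lib
  implicit none
  ! Hypothetical extents; n_i stands in for max_p_pairs+max_a_temp
  integer, parameter :: n_i = 1000, n_cp = 200
  real, allocatable :: angle1_t(:,:), angle2_t(:,:)
  integer :: i, j

  allocate(angle1_t(n_i, n_cp), angle2_t(n_i, n_cp))
  ! Hypothetical test data: uniform values in [-360, 360)
  call random_number(angle1_t)
  call random_number(angle2_t)
  angle1_t = 720.0*angle1_t - 360.0
  angle2_t = 720.0*angle2_t - 360.0

  ! Threads split the columns (j); inside a column the i loop is
  ! contiguous in memory and is offered to the compiler for SIMD.
  !$omp parallel do private(i) schedule(static)
  do j = 1, n_cp
     !$omp simd
     do i = 1, n_i
        if (angle2_t(i,j) <= 0.0 .or. abs(angle2_t(i,j)) < 1.0e-6) then
           if (angle1_t(i,j) < 0.0) then
              angle1_t(i,j) = angle1_t(i,j) + 360.0
              angle2_t(i,j) = angle2_t(i,j) + 360.0
           end if
        end if
     end do
  end do
  !$omp end parallel do

  print '(a,i0,a)', 'ran with up to ', omp_get_max_threads(), ' threads'
end program outer_only
```

Timing this against the collapse(2) version on the real extents, and reading the optimization report for both, should show which one the compiler handles better on your machine.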
Wouldn't (angle2_t(i,j)<1.0e-6 .and. angle1_t(i,j)<0.0) have the same meaning and a better chance of optimization?
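A quick way to convince yourself the two forms agree: the original test angle2_t(i,j)<=0.0 .or. abs(angle2_t(i,j))<1.0e-6 is true exactly when angle2_t(i,j)<1.0e-6, because zero and negative values satisfy the first term and small positive values satisfy the second. The sketch below evaluates the original nested test and the single combined test on a handful of hypothetical edge-case values (including -0.0 and exactly 1.0e-6) and counts disagreements.

```fortran
program compare_tests
  implicit none
  ! Hypothetical sample values around the thresholds 0.0 and 1.0e-6
  real, parameter :: a2_vals(7) = [ -10.0, -1.0e-7, -0.0, 0.0, 5.0e-7, 1.0e-6, 2.0 ]
  real, parameter :: a1_vals(3) = [ -5.0, 0.0, 5.0 ]
  logical :: original, combined
  integer :: k, m, mismatches

  mismatches = 0
  do m = 1, size(a1_vals)
     do k = 1, size(a2_vals)
        ! Original form: outer if on angle2, inner if on angle1
        original = (a2_vals(k) <= 0.0 .or. abs(a2_vals(k)) < 1.0e-6) .and. a1_vals(m) < 0.0
        ! Combined form: a single test, no abs()
        combined = a2_vals(k) < 1.0e-6 .and. a1_vals(m) < 0.0
        if (original .neqv. combined) mismatches = mismatches + 1
     end do
  end do

  print '(a,i0)', 'mismatches: ', mismatches
end program compare_tests
```

In the loop body this would read `if (angle2_t(i,j)<1.0e-6 .and. angle1_t(i,j)<0.0) then` with a single endif, dropping the abs call and one level of branching. Whether the compiler then vectorizes the loop is something the optimization report would show.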
Are you setting an appropriate affinity, e.g., for cases where hyperthreading or multiple CPUs are in use?
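On the affinity side, the standard OpenMP environment variables are OMP_PROC_BIND (e.g. close or spread) and OMP_PLACES (e.g. cores or threads). As a sketch, assuming an OpenMP 4.5 or later runtime, a program can report which binding policy and how many places it actually sees, which helps confirm that the settings took effect:

```fortran
program show_binding
  use omp_lib
  implicit none
  integer(kind=omp_proc_bind_kind) :: bind
  character(len=8) :: label

  ! Binding policy the runtime will use for the next parallel region
  bind = omp_get_proc_bind()
  select case (bind)
  case (omp_proc_bind_false)
     label = 'false'
  case (omp_proc_bind_true)
     label = 'true'
  case (omp_proc_bind_close)
     label = 'close'
  case (omp_proc_bind_spread)
     label = 'spread'
  case default
     label = 'other'      ! e.g. master/primary
  end select

  print '(a,a)',  'proc bind policy: ', trim(label)
  print '(a,i0)', 'places available: ', omp_get_num_places()
  print '(a,i0)', 'max threads:      ', omp_get_max_threads()
end program show_binding
```

Running it with and without OMP_PROC_BIND/OMP_PLACES set shows whether the runtime picked them up; with no binding, threads may migrate between cores and hyperthreads, which can blur speedup measurements.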
Which CPU(s) and which instruction set are you using?
What was your speedup?
How many threads were used?
Were affinities used?
If so, how?
What are the extents of each of the loops?
What is the ratio of writes versus no writes?
What is the probability of each of the terms in Tim's suggested if test being true?
We need the answers to these questions to provide useful assistance.
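Several of these numbers can be collected with a small harness. The sketch below is hypothetical throughout (the extents, the random data in [-360, 360), the program name); it uses Tim's combined test, computes the fraction of elements satisfying each term and both terms (the latter is the fraction of elements written), and times one pass with omp_get_wtime. Running it once with OMP_NUM_THREADS=1 and once with your thread count gives a speedup figure.

```fortran
program loop_diagnostics
  use omp_lib
  implicit none
  ! Hypothetical extents and data; substitute the real arrays
  integer, parameter :: n_i = 2000, n_cp = 500
  real, allocatable :: angle1_t(:,:), angle2_t(:,:)
  real :: p_term1, p_term2, write_ratio
  double precision :: t0, elapsed
  integer :: i, j

  allocate(angle1_t(n_i, n_cp), angle2_t(n_i, n_cp))
  call random_number(angle1_t)
  call random_number(angle2_t)
  angle1_t = 720.0*angle1_t - 360.0
  angle2_t = 720.0*angle2_t - 360.0

  ! Fraction of elements for which each term, and both terms, are true
  p_term1     = real(count(angle2_t < 1.0e-6)) / real(size(angle2_t))
  p_term2     = real(count(angle1_t < 0.0))    / real(size(angle1_t))
  write_ratio = real(count(angle2_t < 1.0e-6 .and. angle1_t < 0.0)) / real(size(angle1_t))

  ! Time a single pass of the loop
  t0 = omp_get_wtime()
  !$omp parallel do private(i) schedule(static)
  do j = 1, n_cp
     do i = 1, n_i
        if (angle2_t(i,j) < 1.0e-6 .and. angle1_t(i,j) < 0.0) then
           angle1_t(i,j) = angle1_t(i,j) + 360.0
           angle2_t(i,j) = angle2_t(i,j) + 360.0
        end if
     end do
  end do
  !$omp end parallel do
  elapsed = omp_get_wtime() - t0

  print '(a,i0)',     'threads:          ', omp_get_max_threads()
  print '(a,f8.4)',   'P(angle2 < 1e-6): ', p_term1
  print '(a,f8.4)',   'P(angle1 < 0):    ', p_term2
  print '(a,f8.4)',   'fraction written: ', write_ratio
  print '(a,es10.3)', 'loop time (s):    ', elapsed
end program loop_diagnostics
```

Because the loop modifies the arrays, a second pass would find nothing to update, so repeated timings should start from a fresh copy of the data.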