I am trying to use OpenMP with the DCT to speed up my program. It works fine when I set omp_num_threads=1. When I set it to a value greater than 1, I can see that the CPU load doubles, but the result is wrong. The code snippet is below. Could anyone help me out?
! prepare for DCTs
call d_init_trig_transform(nx-1,MKL_COSINE_TRANSFORM,ipar,dpar,ir)
call d_commit_trig_transform(alpha(:,1),handle,ipar,dpar,ir)

! forward transform
!$OMP PARALLEL DO
do i = 1,ny,1
   call d_forward_trig_transform(alpha(:,i),handle,ipar,dpar,ir)
end do
!$OMP END PARALLEL DO

! SOME PROCESSING IN FREQUENCY DOMAIN

! inverse transform
!$OMP PARALLEL DO
do i = 1,ny,1
   call d_backward_trig_transform(alpha(:,i),handle,ipar,dpar,ir)
end do
!$OMP END PARALLEL DO

! clean up transform
call free_trig_transform(handle,ipar,ir)
After reading the documentation, it seems that I have to set ipar(9) = nthread (I also tried ipar(10) = nthread, since I am using Fortran and its arrays are 1-based). But the problem stays the same.
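For reference, here is a minimal sketch of where that assignment would sit relative to the other calls. It assumes the reading of the documentation described above (the thread count lives in ipar(10) in Fortran indexing) and that the value has to be set after d_init_trig_transform, which fills ipar with defaults, and before d_commit_trig_transform. The program name, the array f, and the size nx = 129 are made up for illustration. The dpar size of 5n/2+2 is taken from the documentation as I understand it. This is not a confirmed fix for the wrong results.

```fortran
! Sketch only: program name, f, and nx are hypothetical placeholders.
program tt_thread_setup
  use omp_lib
  use mkl_dfti
  use mkl_trig_transforms
  implicit none

  integer, parameter :: nx = 129            ! assumed grid size, n = nx-1
  integer :: ipar(128), ir, nthread
  double precision :: dpar(5*(nx-1)/2 + 2)  ! assumed size per the docs
  double precision :: f(nx)                 ! cosine transform uses n+1 values
  type(dfti_descriptor), pointer :: handle

  f = 0.0d0
  nthread = omp_get_max_threads()

  ! init fills ipar and dpar with default values
  call d_init_trig_transform(nx-1, MKL_COSINE_TRANSFORM, ipar, dpar, ir)
  if (ir /= 0) stop 'init failed'

  ! assumption: number of user threads that will call the transforms
  ipar(10) = nthread

  ! commit reads ipar, so the assignment above must come first
  call d_commit_trig_transform(f, handle, ipar, dpar, ir)
  if (ir /= 0) stop 'commit failed'

  ! ... forward and backward transform loops would go here ...

  call free_trig_transform(handle, ipar, ir)
end program tt_thread_setup
```

Checking ir after each call at least rules out a failed init or commit as the source of the wrong output.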
Interestingly, the same program works fine on an i5-3210M CPU but shows the problem described above on a W3680 and an E5-2680 CPU.
We had a similar problem in the past, but we don't see it with the latest versions. Which MKL version are you using? Could you send us a complete test case so that we can check the problem on our side?
I have MKL version 11.0. The Intel software manager in Windows tells me I have the latest version of MKL. My code is a fairly complex 3D simulation. Let me strip out the unnecessary parts and send the code back to you.
I would recommend evaluating either the latest update of 11.1 (Update 4) or the latest version, 11.2, to check whether the problem still exists on your side. If you still see the problem with these versions, please give us a standalone test so that we can investigate further.