Intel® oneAPI Math Kernel Library

MKL: Significantly reduced efficiency in parallel regime

DmitrySkachkov
Beginner

Hello,

 

I am using Intel-oneapi-mkl/2024 to compute the inverse of a double-precision complex matrix with the getrf/getri routines.

On one core, the calculation time is 0.10 s.

However, with OpenMP parallelization the calculation time increases significantly:

OMP=2 calc.time = 0.16s

OMP=32 calc.time = 10.1s

 

The matrix is complex(8) of size 484x484.

The program was compiled in hybrid MPI/OpenMP mode and run with a single MPI process.

 

 

 

 

5 Replies
Ruqiu_C_Intel
Moderator

Thank you for posting your question.

Please provide a simple reproducer, and tell us your test environment (OS, CPU platform, oneAPI version, etc.) and the steps to reproduce the issue. It would also help if you tested again with the latest oneAPI version (2025.0.1).

DmitrySkachkov
Beginner

The following code inverts a complex matrix using the MKL getrf/getri routines. The Intel compiler version is 2024.0.2 20231213.

Calculation time:

OMP_NUM_THREADS=1   22.29s

OMP_NUM_THREADS=2   74.69s

OMP_NUM_THREADS=4   314.15s

The program was compiled with the following options:

 

 

 

ifx -o test.x program.o -qmkl-ilp64=parallel \
  /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_blas95_ilp64.a \
  /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_lapack95_ilp64.a \
  -Wl,--start-group \
  /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_intel_ilp64.a \
  /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_intel_thread.a \
  /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_core.a \
  -Wl,--end-group -liomp5 -lpthread -lm -ldl

  Module MKL95
   include '/opt/shared/intel-oneapi/2024.0.1.46/mkl/2024.0/include/mkl_lapack.fi'
   include '/opt/shared/intel-oneapi/2024.0.1.46/mkl/2024.0/include/mkl_blas.fi'
  end module MKL95


MODULE F95_PRECISION
    INTEGER, PARAMETER :: SP = KIND(1.0E0)
    INTEGER, PARAMETER :: DP = KIND(1.0D0)
END MODULE F95_PRECISION


INCLUDE 'mkl_lapack.f90'
INCLUDE 'mkl_blas.f90'


 module data
   integer, parameter         :: N = 400
   complex(8)                 :: AA(N,N)
   integer                    :: ipvt(N)
 contains
  subroutine set_A
   integer     :: i,j
   real        :: r1(N)
   real        :: r2(N)
   do i=1,N
    call random_number(r1) 
    call random_number(r2) 
    do j=1,N
     AA(i,j)%re = r1(i)
     AA(i,j)%im = r2(j)
    enddo 
   enddo    
   do i=1,N
    AA(i,i) = (1.d0,0.1d0)
   enddo
   do i=1,N
    do j=i+1,N
     AA(i,j) = AA(j,i)
    enddo
   enddo
  end subroutine set_A
 end module data



  Program A
   use F95_PRECISION
   use MKL95
   use LAPACK95
   use BLAS95
   use data
   use omp_lib
   integer                    :: info   
   real(8)                    :: t0,tf

   call cpu_time(t0)
   do i=1,1000
    call set_A
    call getrf(AA,ipvt,info)
    if(info/=0) print *,' getrf: info=',info
    call getri(AA,ipvt,info)
    if(info/=0) print *,' getri: info=',info
   enddo
   call cpu_time(tf)
   print *,'  t_calc=',tf-t0
   
  end program A

 

 

 

 

Aleksandra_K
Moderator

Hi,


There is an issue with the way the timing is being measured. When using CPU_TIME, the time returned is the sum of all active threads' CPU times, not the actual wall time.


To estimate the performance or scaling of multithreaded applications, you should use the intrinsic subroutine SYSTEM_CLOCK or the portability function DCLOCK. Both of these routines return the elapsed time from a single clock.
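
A minimal sketch of the SYSTEM_CLOCK approach, for illustration only: the program name, variable names, and the placeholder loop are assumptions, and the loop stands in for the getrf/getri calls from the reproducer. It records both CPU_TIME and SYSTEM_CLOCK around the same work so the two readings can be compared.

```fortran
program wall_vs_cpu
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: c0, c1, rate   ! clock counts and ticks per second
  real(real64)   :: t0, t1         ! CPU_TIME readings
  real(real64)   :: wall, cpu, s
  integer        :: i

  ! Query the clock resolution once, then take the start readings.
  call system_clock(count_rate=rate)
  call system_clock(count=c0)
  call cpu_time(t0)

  ! Placeholder workload (assumption): replace with the getrf/getri loop.
  s = 0.0_real64
  do i = 1, 20000000
     s = s + sqrt(real(i, real64))
  end do

  call cpu_time(t1)
  call system_clock(count=c1)

  ! Elapsed (wall) time comes from the tick difference of a single clock.
  wall = real(c1 - c0, real64) / real(rate, real64)
  cpu  = t1 - t0

  print *, 'checksum  =', s        ! printed so the loop is not optimized away
  print *, 'wall time =', wall
  print *, 'cpu time  =', cpu
end program wall_vs_cpu
```

With a single-threaded workload like this placeholder, the two numbers should be close. When the timed region runs on several threads, CPU_TIME may report a sum over threads as described above, while the SYSTEM_CLOCK difference remains an elapsed time.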


Also, please consider upgrading to oneAPI version 2025.0, as I won't be able to provide advice for earlier versions.


Regards, 

Aleksandra


Aleksandra_K
Moderator

Hi,

Did you have a chance to try any of the suggested methods for measuring wall time?



Aleksandra_K
Moderator

I am closing this issue due to lack of response and will no longer monitor this thread. If you need further assistance, please start a new thread. 

