topic Re: MKL: Significantly reduced efficiency in parallel regime in Intel® oneAPI Math Kernel Library

MKL: Significantly reduced efficiency in parallel regime

DmitrySkachkov — Sun, 29 Dec 2024 04:25:11 GMT

Hello,

I am using Intel-oneapi-mkl/2024 to compute the inverse of a double-precision complex matrix with the getrf/getri routines.

On one core, the calculation time is 0.10 s

However, when run in parallel with OpenMP, the calculation time increases significantly:

OMP=2 calc.time = 0.16s

OMP=32 calc.time = 10.1s

The size of the complex(8) matrix is 484x484

The program was compiled in hybrid MPI/OpenMP mode and run with a single MPI process.

Re: MKL: Significantly reduced efficiency in parallel regime

Ruqiu_C_Intel — Mon, 30 Dec 2024 01:36:25 GMT

Thank you for posting your question.

Please provide a simple reproducer, along with your test environment (OS, CPU platform, oneAPI version, etc.) and the steps to reproduce the issue. It would also help if you could test again with the latest oneAPI version (2025.0.1).

Re: MKL: Significantly reduced efficiency in parallel regime

DmitrySkachkov — Mon, 30 Dec 2024 21:23:43 GMT

The following code inverts a complex matrix using the MKL getri routine. The Intel compiler version is 2024.0.2 20231213.

Calculation time

OMP_NUM_THREADS=1 22.29s

OMP_NUM_THREADS=2 74.69s

OMP_NUM_THREADS=4 314.15s

The program was compiled with the following options:

ifx -o test.x program.o -qmkl-ilp64=parallel /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_blas95_ilp64.a /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_lapack95_ilp64.a -Wl,--start-group /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_intel_ilp64.a /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_intel_thread.a /opt/shared/intel-oneapi/2024.2.0.634/mkl/2024.2/lib/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm -ldl

Module MKL95
  include '/opt/shared/intel-oneapi/2024.0.1.46/mkl/2024.0/include/mkl_lapack.fi'
  include '/opt/shared/intel-oneapi/2024.0.1.46/mkl/2024.0/include/mkl_blas.fi'
end module MKL95

MODULE F95_PRECISION
  INTEGER, PARAMETER :: SP = KIND(1.0E0)
  INTEGER, PARAMETER :: DP = KIND(1.0D0)
END MODULE F95_PRECISION

INCLUDE 'mkl_lapack.f90'
INCLUDE 'mkl_blas.f90'

module data
  integer, parameter :: N = 400
  complex(8) :: AA(N,N)
  integer :: ipvt(N)
contains
  subroutine set_A
    integer :: i,j
    real :: r1(N)
    real :: r2(N)
    do i=1,N
      call random_number(r1)
      call random_number(r2)
      do j=1,N
        AA(i,j)%re = r1(i)
        AA(i,j)%im = r2(j)
      enddo
    enddo
    do i=1,N
      AA(i,i) = (1.d0,0.1d0)
    enddo
    do i=1,N
      do j=i+1,N
        AA(i,j) = AA(j,i)
      enddo
    enddo
  end subroutine set_A
end module data

Program A
  use F95_PRECISION
  use MKL95
  use LAPACK95
  use BLAS95
  use data
  use omp_lib
  integer :: info
  real(8) :: t0,tf
  call cpu_time(t0)
  do i=1,1000
    call set_A
    call getrf(AA,ipvt,info)
    if(info/=0) print *,' getrf: info=',info
    call getri(AA,ipvt,info)
    if(info/=0) print *,' getri: info=',info
  enddo
  call cpu_time(tf)
  print *,' t_calc=',tf-t0
end program A

Re: MKL: Significantly reduced efficiency in parallel regime

Aleksandra_K — Wed, 08 Jan 2025 12:54:12 GMT

Hi,

There is an issue with the way the timing is being measured. When using CPU_TIME, the time returned is the sum of all active threads' CPU times, not the actual wall time.

To estimate the performance or scaling of multithreaded applications, you should use the intrinsic subroutine SYSTEM_CLOCK or the portability function DCLOCK. Both of these routines return the elapsed time from a single clock.
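The difference between the two clocks can be illustrated with a minimal sketch. It is not the original reproducer: the loop workload, the problem size, and all variable names are illustrative assumptions, and no MKL calls are involved. It brackets the same region with CPU_TIME and with SYSTEM_CLOCK so both figures can be compared.

```fortran
program timing_demo
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n = 2000000      ! hypothetical workload size
  integer(int64) :: c0, c1, rate
  real(real64)   :: cpu0, cpu1, wall, s
  integer        :: i, rep

  ! Ask the system clock how many ticks it counts per second.
  call system_clock(count_rate=rate)

  ! Start both timers around the same region.
  call system_clock(c0)
  call cpu_time(cpu0)

  s = 0.0_real64
  do rep = 1, 50
     ! Without OpenMP enabled these directives are plain comments
     ! and the loop runs serially.
     !$omp parallel do reduction(+:s)
     do i = 1, n
        s = s + sin(real(i, real64))
     end do
     !$omp end parallel do
  end do

  call cpu_time(cpu1)
  call system_clock(c1)

  ! Elapsed (wall-clock) time in seconds from a single clock.
  wall = real(c1 - c0, real64) / real(rate, real64)

  print *, 'checksum  =', s
  print *, 'cpu_time  =', cpu1 - cpu0, ' s (may be summed over threads)'
  print *, 'wall time =', wall, ' s'
end program timing_demo
```

Built with OpenMP enabled (for example `ifx -qopenmp`) and run with OMP_NUM_THREADS greater than 1, the cpu_time figure would be expected to exceed the wall-time figure, roughly in proportion to the number of busy threads. With a single thread the two should be close. In the reproducer above, the same pattern would mean replacing the two cpu_time calls with system_clock calls around the getrf/getri loop.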

Also, please consider upgrading to oneAPI version 2025.0, as I won't be able to provide advice for earlier versions.

Regards,

Aleksandra

Re: MKL: Significantly reduced efficiency in parallel regime

Aleksandra_K — Wed, 15 Jan 2025 12:30:15 GMT

Hi,

Did you have a chance to try any of the suggested methods for measuring wall time?

Re: MKL: Significantly reduced efficiency in parallel regime

Aleksandra_K — Mon, 20 Jan 2025 12:20:45 GMT

I am closing this issue due to lack of response and will no longer monitor this thread. If you need further assistance, please start a new thread.