Solved: cannot reach 100% CPU usage when using OpenMP and LAPACK together

nvh10 · 07-15-2021

Hello everyone,

I am inverting a matrix and noticed that when my program uses both OpenMP and LAPACK, CPU usage only reaches about 10% instead of 100%. I also found that OpenMP and MKL cannot reach 100% CPU usage when they are used together.

I attached all the options I chose. Did I make a mistake somewhere?

Thank you!


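    program testinv
    implicit none
    integer i,j,N
    double precision, allocatable,dimension(:,:):: A,invA1
    ! Two N x N double-precision arrays need 2*8*N**2 bytes,
    ! i.e. about 160 GB for N = 100000.
    N=100000
    allocate(A(N,N),invA1(N,N))

    !$omp parallel do
    do j=1,N
        do i=1,N
            A(i,j)=1d0
        enddo
        ! An all-ones matrix is singular (rank 1), so DGETRF would report
        ! INFO > 0 and DGETRI could not invert it; shift the diagonal.
        A(j,j)=A(j,j)+dble(N)
    enddo
    !$omp end parallel do

    call InverseMatrixD(N, A, invA1)

    contains
    subroutine InverseMatrixD(N, A, invA)
    implicit none
    integer, intent(in) :: N
    double precision, intent(in) :: A(N,N)
    double precision, intent(out) :: invA(N,N)
    integer :: INFO, LWORK
    integer, allocatable :: IPIV(:)
    double precision, allocatable :: WORK(:)
    double precision :: WQUERY(1)

    ! Heap-allocate IPIV and WORK: large automatic arrays can overflow
    ! the default stack (only 1 MB on Windows).
    allocate(IPIV(N))
    invA = A
    call DGETRF(N, N, invA, N, IPIV, INFO)
    if (INFO /= 0) stop 'DGETRF failed'

    ! Workspace query (LWORK = -1). With the minimum LWORK = N, reference
    ! LAPACK's DGETRI falls back to its slower unblocked algorithm.
    call DGETRI(N, invA, N, IPIV, WQUERY, -1, INFO)
    LWORK = int(WQUERY(1))
    allocate(WORK(LWORK))
    call DGETRI(N, invA, N, IPIV, WORK, LWORK, INFO)
    if (INFO /= 0) stop 'DGETRI failed'

    end subroutine InverseMatrixD
    end program testinv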

nvh10 · 07-15-2021

I found my mistake: this option is necessary. Thanks for looking at my silly question.

jimdempseyatthecove · 07-16-2021

There are some issues you should be aware of.

MKL comes as two libraries: sequential (serial) and threaded (parallel).

The threaded (parallel) MKL library uses OpenMP internally for parallelization.

Both MKL libraries are thread-safe (both can be used from threaded and non-threaded applications).

Note that this runs counter to the intuitive rule: link the threaded library with a threaded program and the sequential library with a sequential program.

When a predominantly threaded application calls MKL from within its parallel regions, the sequential MKL library is the better choice. The reason is that if the threaded MKL library is called from within a parallel region (or, more generally, from different threads), it will instantiate a separate OpenMP thread pool for each calling thread. For example, on a system with 16 hardware threads, each of 16 application threads calling into the threaded MKL library could instantiate its own pool of 16 threads, giving 256 threads in total: in other words, gross oversubscription.
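The pattern described above (many application threads, each calling LAPACK from inside the application's own parallel region) might look like the minimal Fortran sketch below. It assumes a sequential LAPACK is linked, for example sequential MKL (with the Intel compilers typically `-qmkl=sequential`, or `/Qmkl:sequential` on Windows) or reference LAPACK via `-llapack -lblas`. The program name, the matrix sizes, and the `invert` helper are hypothetical, chosen only for illustration.

```fortran
! Minimal sketch (hypothetical names and sizes): each OpenMP thread inverts
! its own independent matrix. Assumes a *sequential* LAPACK/MKL is linked,
! so all parallelism comes from this loop, not from inside the library.
program batch_inverse
    implicit none
    integer, parameter :: N = 200, NMAT = 32
    double precision, allocatable :: A(:,:,:)
    integer :: i, k, info, nfail

    allocate(A(N,N,NMAT))
    nfail = 0

    !$omp parallel do private(i, info) reduction(+:nfail)
    do k = 1, NMAT
        ! Strictly diagonally dominant, hence nonsingular: ones plus (N+k)*I.
        A(:,:,k) = 1d0
        do i = 1, N
            A(i,i,k) = A(i,i,k) + dble(N + k)
        end do
        call invert(N, A(:,:,k), info)
        if (info /= 0) nfail = nfail + 1
    end do
    !$omp end parallel do

    print '(a,i0)', 'failed inversions: ', nfail

contains

    ! Inverts M in place with DGETRF + DGETRI. ipiv and work are local
    ! automatic arrays, so every calling thread gets its own copies.
    subroutine invert(n, M, info)
        integer, intent(in) :: n
        double precision, intent(inout) :: M(n,n)
        integer, intent(out) :: info
        integer :: ipiv(n)
        double precision :: work(64*n)   ! assumed generous workspace size

        call dgetrf(n, n, M, n, ipiv, info)
        if (info /= 0) return
        call dgetri(n, M, n, ipiv, work, size(work), info)
    end subroutine invert
end program batch_inverse
```

Each call into LAPACK here is small and runs on the calling thread only, so linking the threaded MKL instead would add nothing but the extra thread pools described above.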

If your application is parallel but calls MKL only from the master thread, you can link the threaded MKL library AND set the environment variable KMP_BLOCKTIME=0 (the value is in milliseconds; use 0 or some small value you determine works best). With this setting there will still be two thread pools, but the spin-wait time at the end of each parallel region (both your application's and MKL's) is 0. This means that at the end of a parallel region, the threads other than the one that started it suspend immediately, making their hardware threads available to the other pool's parallel regions or to other processes on the system.
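A minimal sketch of this master-thread-only pattern, assuming the threaded MKL library is linked and the program is launched with `KMP_BLOCKTIME=0` already in its environment (for example `export KMP_BLOCKTIME=0` in bash, or `set KMP_BLOCKTIME=0` in a Windows command prompt). The standard `get_environment_variable` intrinsic is used only to report whether the variable is set; the program name and sizes are hypothetical.

```fortran
! Minimal sketch: application OpenMP region first, then LAPACK/MKL called
! from the master thread only, outside any parallel region.
! Assumes threaded MKL and KMP_BLOCKTIME=0 set before launch.
program master_only_mkl
    implicit none
    integer, parameter :: N = 500
    double precision, allocatable :: A(:,:), work(:)
    integer, allocatable :: ipiv(:)
    integer :: i, j, info, vlen, vstat
    character(len=16) :: blocktime

    ! Report the spin-wait setting the OpenMP runtime will see.
    call get_environment_variable('KMP_BLOCKTIME', blocktime, vlen, vstat)
    if (vstat == 0) then
        print '(2a)', 'KMP_BLOCKTIME = ', trim(blocktime)
    else
        print '(a)', 'warning: KMP_BLOCKTIME is not set'
    end if

    allocate(A(N,N), ipiv(N), work(64*N))

    ! The application's own parallel region.
    !$omp parallel do private(i)
    do j = 1, N
        do i = 1, N
            A(i,j) = 1d0
        end do
        A(j,j) = A(j,j) + dble(N)   ! keep A nonsingular
    end do
    !$omp end parallel do

    ! Serial section: only the master thread calls into the library,
    ! which may run its own parallel regions internally.
    call dgetrf(N, N, A, N, ipiv, info)
    if (info == 0) call dgetri(N, A, N, ipiv, work, size(work), info)
    print '(a,i0)', 'info = ', info
end program master_only_mkl
```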

In other cases you may want to tune specifically the number of threads used by the main application and by each caller of the threaded MKL library (this gets complicated).
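As a starting point for such tuning, the sketch below gives the application's OpenMP regions and MKL's internal regions separate thread budgets. It assumes the threaded MKL library with the default LP64 interface; `mkl_set_num_threads` and `mkl_get_max_threads` are MKL service routines, called here through implicit interfaces, and the 4 + 4 split is purely illustrative.

```fortran
! Minimal sketch: separate thread budgets for the application's own
! OpenMP regions and for MKL's internal ones. Assumes threaded MKL (LP64);
! the 4 + 4 split is purely illustrative.
program split_threads
    use omp_lib
    implicit none
    integer, parameter :: APP_THREADS = 4, MKL_THREADS = 4
    integer, external :: mkl_get_max_threads   ! MKL service function

    call omp_set_num_threads(APP_THREADS)   ! application's parallel regions
    call mkl_set_num_threads(MKL_THREADS)   ! MKL's internal parallel regions

    print '(a,i0)', 'application threads: ', omp_get_max_threads()
    print '(a,i0)', 'MKL threads: ', mkl_get_max_threads()
end program split_threads
```

This would typically be built with something like `ifx -qopenmp -qmkl=parallel split_threads.f90`. The environment variables `OMP_NUM_THREADS` and `MKL_NUM_THREADS` play the same two roles without recompiling, and MKL also provides `mkl_set_num_threads_local` for a per-calling-thread setting.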

Jim Dempsey
