Calling LAPACK/MKL from parallel OpenMP region

Pelle_R_ · ‎07-14-2015

Dear All,

I often call some BLAS and LAPACK (MKL) routines from my Fortran programs. Typically, I try to place these calls outside of any parallel OpenMP regions while then making use of the parallelism inside the routines.

In a new part of the code, however, I need to call DGESV from a "omp parallel" region (see dummy code below). The below code will crash as all threads call GESV individually. Putting the GESV call into a "omp single" section works, but limits the performance, as it is running in one thread only.

I am aware that I could "easily" end the parallel region, call LAPACK/MKL and start a new parallel region again. Still, this doesn't feel "right" to me, particularly as I pass this part of the code many, many times as I am solving a physical problem iteratively.

Chances are, that there is a smarter way of doing the OpenMP part anyways, but a very similar code using BLAS's GEMV instead of LAPACK's GESV works beautifully and scales perfectly. (Note that in that case I call GEMV from a "omp single" region.)

Any comments are appreciated, also those that suggest other ways of structuring the OpenMP part of below code, or those that give some more general hints on how to call BLAS/LAPACK/MKL from OpenMP programs properly and in general.

Best,

Pelle

Simplified code:

!$omp parallel
timestepping: do while (t < tmax .and. .not. error)
     !$omp do
     do i=1, imax
          ! do stuff, prepare system of equations
     end do
     !$omp end do
     
     ! Need to improve here
     call gesv(lhs, rhs, info=ierror)
     
     ! Other operations, check for convergence/errors etc
     !$omp barrier
end do timestepping
! Do some other stuff in parallel
!$omp end parallel

Ying_H_Intel · ‎07-19-2015

Hi,

I recall a similar discussion for your reference

https://software.intel.com/en-us/forums/topic/550681

You can control the MKL gesv thread in OpenMP parallel region by adding

export KMP_AFFINITY="verbose,compact"
export OMP_ACTIVE_LEVELS="2"
export OMP_NESTED="true"
export MKL_DYNAMIC="false"

call mkl_set_num_threads( nt)

call gesv(lhs, rhs, info=ierror).

Please let us know the result if any experiments

Best Regards,

Ying

TimP · ‎07-20-2015

In the usual sense, time stepping would be inherently sequential and would be outside parallel regions. Of course, we.don't know if that is a red herring, but it also looks like you are setting up for each thread to evaluate termination condition in a racy manner.