Topic in Intel® Moderncode for Parallel Architectures

Multithreading with the MKL nonlinear least squares solver

Nikolay_P_1 — Wed, 11 Dec 2013 04:58:16 GMT

Hello everybody,
I am using the Intel MKL solver for the Nonlinear Least Squares Problem with Linear (Bound) Constraints:
http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-B6BADF1C-F90C-4D30-8B84-CF9A5F970E08.htm#GUID-B6BADF1C-F90C-4D30-8B84-CF9A5F970E08

Question: what do I need to do to run the optimizer in parallel?

A. Consider the Intel example ex_nlsqp_bc_c.c. Suppose I just call omp_set_num_threads(n) before starting the minimization loop:

omp_set_num_threads(n); // no pragmas: I just want to make sure I don't have to put any pragmas in the loop

while (not_converged)
{
    dtrnlspbc_solve(&handle, fvec, fjac, &RCI_Request); // Intel MKL minimizer (reverse communication)

    if (RCI_Request == 1) {
        my_function(&m, &n, x, fvec);                 // user-supplied function
    } else if (RCI_Request == 2) {
        djacobi(my_function, &n, &m, fjac, x, &eps);  // Intel MKL numerical Jacobian; does it call my_function from different threads?
    }
}

In multithreaded mode, what is done in parallel: the Jacobian construction, or only the operations on the Jacobian? I hope the user-supplied function is called by multiple threads with different values of x...

B. To check this, I inserted a call to omp_get_thread_num() in my function:

#include <stdio.h>
#include <omp.h>

void my_function()
{
    int i = omp_get_thread_num();
    printf("%i\n", i); // If this prints different thread numbers, does that mean it is executed from different threads?
}

And thus, is all I need a thread-safe function, plus setting OMP_NUM_THREADS and linking the correct libraries?

I wish there were better documentation on this issue.


jimdempseyatthecove — Tue, 21 Jan 2014 23:06:16 GMT

MKL comes in two versions of the library. Both versions are thread-safe.

One version (OpenMP multithreaded) creates its own OpenMP thread pool. It is intended for use with a single-threaded application.

The second version (sequential) does not create its own OpenMP thread pool. You would typically use it with an OpenMP application.

This may seem counterintuitive until you realize that using the OpenMP version of MKL with an OpenMP application results in (number of application OpenMP threads) × (number of MKL threads) threads. With the defaults, this is the number of logical processors squared, i.e., oversubscription.

That said, there are some cases where you might want both the application and MKL to use their own OpenMP pools (two pools). In doing so, you may have to use:

omp_set_num_threads(o);
kmp_set_num_threads(k);

// o*k == number of logical processors

and/or call MKL only from outside parallel regions AND set the environment variable KMP_BLOCKTIME=0,
and/or use a first-level parallel region with a greatly reduced number of threads in both pools.

Jim Dempsey


pwschuck — Sat, 08 Feb 2014 15:11:40 GMT

Hi,

I have a similar question regarding Fortran.

I'm linking with:

LIBS=-L$(CFITSIO)lib64/ -lcfitsio $(MKLROOT)/lib/intel64/libmkl_lapack95_lp64.a \
     -Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a \
     $(MKLROOT)/lib/intel64/libmkl_core.a $(MKLROOT)/lib/intel64/libmkl_intel_thread.a \
     -Wl,--end-group -lpthread -lm

I use OpenMP explicitly in several regions of the code, and this is working properly.

(1) How do I ensure that a LAPACK call uses the available threads, e.g.,

CALL SYEVR(COVARIANCE,EIGVAL,UPLO,Z=EIGVEC,ABSTOL=ABSTOL,INFO=INFO)

(2) Is there a way to determine the MKL version that the code is linked to at run-time?

Thanks,

-- Pete Schuck


TimP — Sat, 08 Feb 2014 17:14:00 GMT

It looks like threading inside syevr depends on there being significant work done by gemv et al. at a lower level, or, better, on ?latrd being parallelized to use multiple copies of gemv. See:

http://software.intel.com/en-us/articles/intel-mkl-threaded-functions

http://software.intel.com/en-us/forums/topic/292428

(where it is suggested that threading should become useful from size 128)

I don't see a clear indication that threading it at a higher level than gemv has been considered. That is probably difficult because of the varying gemv sizes.

You would either call MKL threaded functions from outside parallel regions, or use OMP_NESTED and OMP_NUM_THREADS to control how many MKL threads are in use and try to increase parallelism by calling LAPACK from multiple threads. If you are trying to run multiple copies, there aren't well-developed facilities for placing adjacent gemv threads on a single cache.

I suppose it would be interesting to get a report of which MKL version is active other than by checking shared-object search paths, but I don't see such a facility in the docs.