Re: Mixing threading layers in MKL

Morris__Stephen · ‎09-10-2020

I have a rather subtle use case where I'd appreciate some advice.

I have an application with a number of math libraries as shown in the illustration attached:

The main application depends on two main math libraries that I've written myself, one in Fortran and one in C++.
The Fortran library is highly parallelised using OpenMP, and often calls MKL functions from within threads; therefore it should ideally use the MKL with a sequential threading layer, to avoid the overheads associated with nested parallelism.
Parts of the C++ library are also highly parallelised using OpenMP, but no MKL functions are ever called from within threads. However, it incorporates OpenCV as a static library, and many of the functions it provides use OpenCV. All the functions that use OpenCV are sequential - care has been taken never to mix OpenCV and OpenMP. Even so, OpenCV itself makes extensive use of the MKL and so needs to be linked to it, and since the OpenCV functions are sequential it would be preferable to use the threaded implementation of MKL.

So my question is this: is it possible to have threaded and non-threaded versions of MKL loaded in this way at the same time? Or am I asking for trouble?

If I'm asking for trouble, would linking MKL statically to each library be an acceptable workaround? (I know this would mean a bigger footprint on disk and possibly more memory usage, but I don't expect that to be a problem)

Or will I have to settle for using a sequential version of the MKL for OpenCV as well (which is probably preferable to risking the threaded version of the MKL for the Fortran application)?

jimdempseyatthecove · ‎09-11-2020

What you could do is make a compromise.

Run OpenMP with fewer than the full complement of threads, .AND.
run parallel MKL with fewer than the full complement of threads.

You may have to experiment to find the best mix.

Also, if the OpenCV portion runs at different times from the OpenMP/MKL portion, consider setting KMP_BLOCKTIME to 0 or some small number. Doing so may reduce the latency of thread wakeup when switching between the different thread pools, at the cost of more (induced) wakeups between parallel regions within each thread pool.

Jim Dempsey
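
As a rough illustration of the compromise suggested above, the sketch below splits the machine's cores between the OpenMP pool and the threaded-MKL pool and expresses the result as environment variables. `OMP_NUM_THREADS`, `MKL_NUM_THREADS` and `KMP_BLOCKTIME` are the usual variables read by the OpenMP runtime and MKL. The helper name `threading_env`, the idea that the two counts should add up to the core count, and the example values are assumptions made for illustration, not something stated in the thread.

```python
import os


def threading_env(total_cores, omp_share=0.5, blocktime_ms=0):
    """Return environment overrides that split cores between OpenMP and MKL.

    Hypothetical helper: the split policy is an assumption, to be tuned by experiment.
    """
    if total_cores < 2:
        raise ValueError("need at least 2 cores to split between two pools")
    if not 0.0 < omp_share < 1.0:
        raise ValueError("omp_share must be strictly between 0 and 1")
    # Give each pool at least one thread; the two counts sum to total_cores.
    omp_threads = min(max(1, round(total_cores * omp_share)), total_cores - 1)
    mkl_threads = total_cores - omp_threads
    return {
        "OMP_NUM_THREADS": str(omp_threads),  # threads for the application's OpenMP regions
        "MKL_NUM_THREADS": str(mkl_threads),  # threads for threaded MKL calls
        "KMP_BLOCKTIME": str(blocktime_ms),   # ms a worker waits before going to sleep
    }


# Merge the overrides into a copy of the current environment.
env = {**os.environ, **threading_env(total_cores=8, omp_share=0.75)}
print(env["OMP_NUM_THREADS"], env["MKL_NUM_THREADS"], env["KMP_BLOCKTIME"])
```

The resulting `env` dictionary could then be passed to `subprocess.run(..., env=env)` when launching the application. The best share has to be found by experiment, as noted above.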

Ron_Green · ‎09-11-2020

MKL has some great API calls and ENV vars to control threading. You should check out this information HERE

Morris__Stephen · ‎09-11-2020

I assume that if I used system calls or environment variables then there'd be some latency, and in reality I'll want to move back and forth between OpenCV and OpenMP quite frequently so I'd rather avoid this.

jimdempseyatthecove · ‎09-11-2020

Try setting the environment variable KMP_BLOCKTIME=0, and run the multi-threaded MKL.

IOW run a test and see what happens.

Jim Dempsey
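
One possible way to "run a test and see what happens" is a small timing harness that launches the same executable under several `KMP_BLOCKTIME` settings and compares wall-clock times. This is only a sketch: the function name `time_with_blocktime`, the list of values tried, and the placeholder command (a no-op Python process standing in for the real application) are all assumptions.

```python
import os
import subprocess
import sys
import time


def time_with_blocktime(cmd, blocktime_ms, repeats=3):
    """Run cmd with KMP_BLOCKTIME set; return the best wall-clock time in seconds."""
    env = {**os.environ, "KMP_BLOCKTIME": str(blocktime_ms)}
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)  # raises if the program fails
        best = min(best, time.perf_counter() - start)
    return best


# Placeholder command so the sketch runs anywhere; replace with the real executable.
cmd = [sys.executable, "-c", "pass"]
for blocktime in (0, 1, 10):  # values chosen only for illustration
    elapsed = time_with_blocktime(cmd, blocktime)
    print(f"KMP_BLOCKTIME={blocktime}: {elapsed:.3f} s")
```

Taking the best of several runs reduces noise from process start-up. A meaningful comparison needs a workload that actually alternates between the OpenCV and OpenMP phases.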

Morris__Stephen · ‎09-11-2020

Can't argue with that!

The only thing is, I'm some way off building and running the code - I'm still trying to sort out the architecture (though the Fortran and C++ libraries already exist and have been tested separately).

The answer does seem to have come through loud and clear that I can't run dynamic instances of the MKL with different threading layers side-by-side. I'm still tempted to try the static linking option.

Morris__Stephen · ‎09-11-2020

Thank you Jim; that does sound a bit of a compromise though; if I really must compromise somewhere, I'd rather put up with the single-threaded MKL for OpenCV, since getting maximum performance from the Fortran library is really critical.