I have a rather subtle use case where I'd appreciate some advice.
I have an application with a number of math libraries as shown in the illustration attached:
- The main application depends on two main math libraries that I've written myself, one in Fortran and one in C++.
- The Fortran library is highly parallelised using OpenMP, and often calls MKL functions from within threads; therefore it should ideally use the MKL with a sequential threading layer, to avoid the overheads associated with nested parallelism.
- Parts of the C++ library are also highly parallelised using OpenMP, but no MKL functions are ever called from within threads. However, it incorporates OpenCV as a static library, and many of the functions it provides use OpenCV. All the functions that use OpenCV are sequential - care has been taken never to mix OpenCV and OpenMP. Even so, OpenCV itself makes extensive use of the MKL and so needs to be linked to it, and since the OpenCV functions are sequential it would be preferable to use the threaded implementation of MKL.
So my question is this: is it possible to have threaded and non-threaded versions of MKL loaded in this way at the same time? Or am I asking for trouble?
If I'm asking for trouble, would linking MKL statically to each library be an acceptable workaround? (I know this would mean a bigger footprint on disk and possibly more memory usage, but I don't expect that to be a problem)
Or will I have to settle for using a sequential version of the MKL for OpenCV as well (which is probably preferable to risking the threaded version of the MKL for the Fortran library)?
What you could do is make a compromise.
Run the OpenMP regions with fewer than the full complement of threads, AND
run the parallel MKL with fewer than the full complement of threads.
You may have to experiment to find the best mix.
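As a rough sketch of what that experiment could look like, the snippet below lists candidate thread splits to time. It assumes each OpenMP thread may call into threaded MKL, so the product of the two counts is kept at or below the core count. `ThreadSplit` and `candidate_splits` are hypothetical names, not part of OpenMP or MKL; the comments only indicate where `omp_set_num_threads()` and `mkl_set_num_threads()` would take the values. Whether MKL actually threads inside an OpenMP region also depends on settings such as `MKL_DYNAMIC`, which this sketch does not address.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical helper type, not part of OpenMP or MKL.
struct ThreadSplit {
    int omp_threads;  // candidate value for omp_set_num_threads()
    int mkl_threads;  // candidate value for mkl_set_num_threads()
};

// Lists every split where both pools get at least 2 threads, neither gets the
// full complement, and omp_threads * mkl_threads never exceeds the core count.
std::vector<ThreadSplit> candidate_splits(int total_cores) {
    if (total_cores < 1) {
        throw std::invalid_argument("total_cores must be positive");
    }
    std::vector<ThreadSplit> splits;
    for (int omp = 2; omp < total_cores; ++omp) {
        int mkl = total_cores / omp;  // floor division avoids oversubscription
        if (mkl < 2) break;           // MKL would be effectively sequential
        assert(omp * mkl <= total_cores);
        splits.push_back({omp, mkl});
    }
    return splits;
}
```

For 8 cores this gives the splits 2x4, 3x2 and 4x2 (OpenMP x MKL); each one would be timed in turn on the real workload.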
Also, if the OpenCV portion runs at different times from the OpenMP/MKL portion, consider setting KMP_BLOCKTIME to 0 or some small number. Doing so may reduce the latency of thread wakeup when switching between the different thread pools, at the cost of more (induced) wakeups between parallel regions within each thread pool.
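One way to set KMP_BLOCKTIME from inside the program, rather than from the shell, is sketched below. `set_blocktime_ms` is a hypothetical helper name. The sketch assumes a POSIX system (`setenv` is not standard C++), that the value is interpreted in milliseconds, and that it runs before the Intel OpenMP runtime initialises, since the variable is normally read at startup; setting it in the launching shell avoids that ordering concern.

```cpp
#include <cassert>
#include <stdlib.h>  // setenv (POSIX)
#include <string>

// Hypothetical helper: sets KMP_BLOCKTIME for the current process.
// Returns true if the environment variable was set successfully.
bool set_blocktime_ms(int milliseconds) {
    assert(milliseconds >= 0);  // negative block times make no sense here
    const std::string value = std::to_string(milliseconds);
    // The third argument (1) overwrites any value already in the environment.
    return setenv("KMP_BLOCKTIME", value.c_str(), 1) == 0;
}
```

Calling `set_blocktime_ms(0)` at the very start of `main` would correspond to the "0" setting mentioned above.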
Can't argue with that!
The only thing is, I'm some way off building and running the code - I'm still trying to sort out the architecture (though the Fortran and C++ libraries already exist and have been tested separately).
The answer does seem to have come through loud and clear that I can't run dynamic instances of the MKL with different threading layers side-by-side. I'm still tempted to try the static linking option.
Thank you, Jim. That does sound like a bit of a compromise, though: if I really must compromise somewhere, I'd rather put up with the single-threaded MKL for OpenCV, since getting maximum performance from the Fortran library is really critical.