OpenMP very slow when run outside of Visual Studio

Isaac_Liu · ‎06-06-2013

Since we are using the Intel MKL library, we have to load Intel's OpenMP library (libiomp5md.dll) at run time and exclude vcomp.lib at link time, but we still have to compile and link with VC++. With my 64-bit release build, if I run it directly, part of my code doesn't fully utilize the cores I specified and it runs very slowly. It appears to use multiple cores but may be even slower than on one core. If I attach the same release build to the Visual Studio debugger, without doing anything else, it fully utilizes the cores I specified. Does anybody have any ideas?

We are using Visual Studio 2010 on Windows 7 Professional. libiomp5md.dll shows a file version of 5.0.2012.803.

Evgueni_P_Intel · ‎06-06-2013

Hi Isaac Liu,

Does the app contain #pragma omp? If yes, does the app call MKL from OMP sections?

Thanks,

Evgueni.

Isaac_Liu · ‎06-06-2013

Hi Evgueni,

This is a very big application. The part with the issue uses OpenMP but not MKL. Other parts of the application use MKL. My code uses a lot of OpenMP. Most of it works great, and the code in trouble is actually very similar to the other parts.

Thanks,

Isaac

Evgueni_P_Intel · ‎06-06-2013

The following links may be of use in your case, since the app mixes two OpenMP runtime libraries.

http://software.intel.com/en-us/forums/topic/293731

http://software.intel.com/en-us/articles/how-to-use-intelr-compiler-openmp-compatibility-libraries-on-windows/

TimP · ‎06-07-2013

As I read the original post, it was recognized that vcomp.lib has to be excluded so that only the single Intel OpenMP instance is active, as that will support the vcomp calls.

This raises the possibility of working with KMP_AFFINITY and number of threads so as to improve the distribution of work across cores.

If Intel Hyper-Threading is active, MKL will use a single thread per core, but you will need to set OMP_NUM_THREADS and KMP_AFFINITY to get a similar effect from the C++ parallel regions, e.g.

KMP_AFFINITY=compact,1,1

to spread threads out 1 per core.

I don't know what effects might be produced by transitioning from 1 thread per core in MKL to something different in the C++ code.

If you have a 2-socket platform, affinity will be particularly important.

Andrey_C_Intel1 · ‎06-07-2013

It is hard to guess what may be happening without knowing the details of the application. For example, does the application create threads of its own (I mean non-OpenMP threads)? If it does, then resource oversubscription is possible. Some applications gain from setting the environment variable KMP_BLOCKTIME=0, especially in case of oversubscription, when idle-spinning OpenMP worker threads slow down active OpenMP threads.

If the problem is different, then you can try to create a small reproducer and submit a support request.

- Andrey
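The settings suggested so far (OMP_NUM_THREADS, KMP_AFFINITY, KMP_BLOCKTIME) are environment variables, normally set before the program is launched. Because the behavior here differed between a direct run and a run under Visual Studio, it can be worth confirming what the process actually sees in each case. The following is only a minimal diagnostic sketch, not something proposed in the thread; the function names `env_or_unset` and `openmp_env_report` are hypothetical helpers, and the code only reads the environment, it does not query the OpenMP runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: returns the value of an environment variable,
// or the literal text "(unset)" when the variable is not defined.
std::string env_or_unset(const char* name) {
    assert(name != 0);
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string("(unset)");
}

// Hypothetical helper: builds a NAME=value report, one line per variable,
// for the three variables mentioned in this thread.
std::string openmp_env_report() {
    const char* names[] = {"OMP_NUM_THREADS", "KMP_AFFINITY", "KMP_BLOCKTIME"};
    std::ostringstream out;
    for (std::size_t i = 0; i < sizeof(names) / sizeof(names[0]); ++i) {
        out << names[i] << '=' << env_or_unset(names[i]) << '\n';
    }
    return out.str();
}
```

Writing `openmp_env_report()` to a log at startup, once when launched directly and once when launched from Visual Studio, shows whether the two runs start from the same settings.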

Isaac_Liu · ‎06-18-2013

After some trial and error, the issue is resolved. Part of my code is called repeatedly, millions of times, and it uses a few local std::vector objects holding some data type, each with a size in the hundreds of bytes. The memory management should be very simple compared to the complexity of the computations involved, but somehow the memory management brings down the whole process.
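The post does not say what change fixed the problem. One common way to cut this kind of allocation traffic, shown here only as a sketch under that assumption, is to let the caller own a scratch vector and reuse it across calls instead of constructing a local std::vector each time. The routine `sum_of_squares_*` and the driver `run_many` are hypothetical stand-ins for the poster's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Allocating version: a local vector is constructed and destroyed on every
// call, so every call goes through the heap allocator.
double sum_of_squares_alloc(const std::vector<double>& x) {
    std::vector<double> tmp(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) tmp[i] = x[i] * x[i];
    double s = 0.0;
    for (std::size_t i = 0; i < tmp.size(); ++i) s += tmp[i];
    return s;
}

// Reusing version: the caller supplies the scratch vector. resize() only
// allocates when the requested size exceeds the current capacity.
double sum_of_squares_scratch(const std::vector<double>& x,
                              std::vector<double>& scratch) {
    assert(&x != &scratch);  // input and scratch must be distinct objects
    scratch.resize(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) scratch[i] = x[i] * x[i];
    double s = 0.0;
    for (std::size_t i = 0; i < scratch.size(); ++i) s += scratch[i];
    return s;
}

// Driver: one scratch buffer serves all calls made by this caller.
double run_many(const std::vector<double>& x, std::size_t calls) {
    std::vector<double> scratch;
    scratch.reserve(x.size());
    double total = 0.0;
    for (std::size_t c = 0; c < calls; ++c) {
        total += sum_of_squares_scratch(x, scratch);
    }
    return total;
}
```

In an OpenMP program, the scratch vector would be declared inside the `#pragma omp parallel` block but outside the `#pragma omp for` loop, so that each thread owns one buffer and no thread allocates inside the hot loop.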

Gennady_F_Intel · ‎06-20-2013

Well, thanks for letting us know about the cause.

SergeyKostrov · ‎06-22-2013

>>...Part of my code is called repeatedly, in the millions, and it uses a few local std::vector of some data type of size about 100s bytes. The memory management should be very simple compared to the complexity of the computations involved. But somehow the memory management brings down the whole process...

It is hard to tell you exactly what could be wrong, but I would assume that there is a problem with heap defragmentation.

amr_o_1 · ‎05-12-2017

What is the easiest way to handle heap defragmentation in C++?

SergeyKostrov · ‎05-15-2017

>>...What is the easiest way to handle heap defragmentation in c++...

Don't use STL if you can.