Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

OpenMP very slow when run outside of Visual Studio

Isaac_Liu
Beginner
1,141 Views

Because we use the Intel MKL library, we have to load Intel's OpenMP library (libiomp5md.dll) at run time and exclude vcomp.lib at link time, but we still have to compile and link with VC++. With my 64-bit release build, if I run it directly, part of my code does not fully utilize the cores I specified and runs very slowly. It seems to use multiple cores but may be even slower than running on one core. If I attach the same release build to the Visual Studio debugger, without doing anything else, it fully utilizes the cores I specified. Does anybody have any ideas?

We are using Visual Studio 2010 on Windows 7 Professional. libiomp5md.dll shows a file version of 5.0.2012.803.

10 Replies
Evgueni_P_Intel
Employee

Hi Isaac Liu,

Does the app contain #pragma omp? If yes, does the app call MKL from OMP sections?

Thanks,

Evgueni.

Isaac_Liu
Beginner

Hi Evgueni,

This is a very big application. The part with the issue uses OpenMP but not MKL; other parts of the application use MKL. My code uses a lot of OpenMP. Most of it works great, and the code in trouble is actually very similar to other parts.

Thanks,

Isaac

TimP
Honored Contributor III

As I read the original post, it already recognizes that vcomp.lib has to be excluded so that only a single OpenMP runtime, the Intel one, is active; that runtime also supports the vcomp calls.

This raises the possibility of working with KMP_AFFINITY and number of threads so as to improve the distribution of work across cores.

If Intel® Hyper-Threading is active, MKL will use a single thread per core, but you will need to set OMP_NUM_THREADS and KMP_AFFINITY to get a similar effect from the C++ parallel regions, e.g.

KMP_AFFINITY=compact,1,1

to spread threads out 1 per core.

I don't know what effects might be produced by transitioning from 1 thread per core in MKL to something different in the C++ code.

If you have a 2-socket platform, affinity will be particularly important.
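To confirm which of these settings the process actually sees at run time, a minimal C++ sketch along these lines may help. The helper names env_or_unset and openmp_env_summary are hypothetical and not part of MKL or the OpenMP runtime; the sketch only reads the environment with std::getenv and does not change any OpenMP behavior.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns the value of an environment variable, or "(unset)" if it is absent.
// Hypothetical helper, used here only for diagnostics.
std::string env_or_unset(const char* name) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string("(unset)");
}

// Builds a one-line summary of the two settings discussed above.
std::string openmp_env_summary() {
    return "OMP_NUM_THREADS=" + env_or_unset("OMP_NUM_THREADS") +
           " KMP_AFFINITY=" + env_or_unset("KMP_AFFINITY");
}
```

Logging openmp_env_summary() once at startup, both when launching directly and when launching from Visual Studio, shows whether the two runs start with the same settings.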

Andrey_C_Intel1
Employee

It is hard to guess what may be happening without knowing details of the application. Does the application create threads of its own (I mean non-OpenMP threads)? If it does, resource oversubscription is possible. Some applications gain from setting the environment variable KMP_BLOCKTIME=0, especially in case of oversubscription, when idle-spinning OpenMP worker threads slow down active OpenMP threads.

If the problem is something else, you can try to create a small reproducer and submit a support request.
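As a rough illustration of the oversubscription check described above, the following C++ sketch compares the application's own threads plus the OpenMP team size against the number of hardware threads. The function names are hypothetical, and the thread counts have to be supplied by the caller; this is only a sanity check under those assumptions, not a diagnosis tool.

```cpp
#include <cassert>
#include <thread>

// Returns true when the application's own (non-OpenMP) threads plus the
// OpenMP team would exceed the available hardware threads.
bool is_oversubscribed(unsigned app_threads, unsigned omp_threads,
                       unsigned hw_threads) {
    assert(hw_threads > 0);
    return app_threads + omp_threads > hw_threads;
}

// std::thread::hardware_concurrency() may return 0 when the count is
// unknown, so fall back to 1 in that case.
unsigned detected_hw_threads() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}
```

For example, is_oversubscribed(4, 8, detected_hw_threads()) reports whether 4 application threads plus an 8-thread OpenMP team exceed what the machine offers.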

- Andrey

Isaac_Liu
Beginner

After some trial and error, the issue is resolved. Part of my code is called repeatedly, millions of times, and it uses a few local std::vector objects of some data type that is a few hundred bytes in size. The memory management should be very simple compared to the complexity of the computations involved, but somehow the memory management brings down the whole process.
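The post does not say what change was actually made, so the following is only a sketch of one common way to avoid this pattern: give each OpenMP thread a single scratch std::vector that is allocated once and reused, instead of constructing local vectors on every call. Sample, process_item, and process_all are hypothetical names standing in for the real data type and functions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type of a few hundred bytes.
struct Sample {
    double values[32];  // 32 * 8 = 256 bytes
};

// Hypothetical hot function: reuses a caller-provided scratch buffer
// instead of constructing a local std::vector on every call.
double process_item(int item, std::vector<Sample>& scratch) {
    scratch.clear();  // size becomes 0, capacity is kept
    for (int k = 0; k < 4; ++k) {
        Sample s = Sample();  // value-initialized to zeros
        s.values[0] = static_cast<double>(item + k);
        scratch.push_back(s);
    }
    double sum = 0.0;
    for (std::size_t j = 0; j < scratch.size(); ++j) {
        sum += scratch[j].values[0];
    }
    return sum;
}

// Runs process_item for items 0..count-1 with one scratch buffer per thread.
double process_all(int count) {
    double total = 0.0;
    #pragma omp parallel reduction(+ : total)
    {
        std::vector<Sample> scratch;  // one buffer per thread, allocated once
        scratch.reserve(4);
        #pragma omp for
        for (int i = 0; i < count; ++i) {
            total += process_item(i, scratch);
        }
    }
    return total;
}
```

Because clear() keeps the vector's capacity, the steady state performs no heap allocation inside the loop, which keeps the threads from repeatedly going through the shared allocator. Without OpenMP enabled the pragmas are ignored and the code runs serially with the same result.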

Gennady_F_Intel
Moderator

Well, thanks for letting us know the cause.

SergeyKostrov
Valued Contributor II

>>...Part of my code is called repeatedly, in the millions, and it uses a few local std::vector of some data type of size about 100s bytes. The memory management should be very simple compared to the complexity of the computations involved. But somehow the memory management brings down the whole process...

It is hard to tell exactly what is wrong, but I would assume there is a problem with heap fragmentation.

amr_o_1
Beginner

What is the easiest way to handle heap fragmentation in C++?

SergeyKostrov
Valued Contributor II

>>...What is the easiest way to handle heap defragmentation in c++...

Don't use the STL if you can avoid it.