Kandel__Mikhail · ‎12-05-2013

I have a highly separable, embarrassingly parallel image processing code.

When I have a standalone program I see a nice speedup when I enable automatic parallelization.

Unfortunately, when I build my critical region as a .lib file and link it to my existing code, it consistently uses only 1 core. I verified this by looking at a process monitor, which in my case shows only 25% utilization.

How does automatic parallelization work when linking static libraries?

TimP · ‎12-05-2013

The .lib would be built and the application would need to be built and linked with /Qparallel or /Qopenmp. It won't work like MKL or IPP where simply linking the .lib would cause it to go parallel, in case that is your question. The current restriction against using a static libiomp5 should protect you against problems caused by linking libiomp5 into the .lib build.

If you call your auto-parallelized .lib code from inside a parallel region, the .lib code won't add additional threads unless you set OMP_NESTED.
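To make the nesting point concrete, here is a minimal sketch, assuming an OpenMP-enabled build. The names `library_inner_team_size` and `max_inner_team_from_parallel_caller` are hypothetical stand-ins for a routine inside the .lib and for application code that calls it from a parallel region; they are not part of any Intel library.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for a routine inside the auto-parallelized .lib.
// Returns the number of threads in the team that ran its parallel region.
int library_inner_team_size() {
    int team = 1;  // value reported when OpenMP is not enabled at compile time
#ifdef _OPENMP
    #pragma omp parallel
    {
        // One thread records the team size; the implicit barrier at the
        // end of "single" makes the write visible before the region ends.
        #pragma omp single
        team = omp_get_num_threads();
    }
#endif
    return team;
}

// Hypothetical caller: invokes the library routine from inside an outer
// parallel region and returns the largest inner team size observed.
int max_inner_team_from_parallel_caller() {
    int max_team = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        int t = library_inner_team_size();
        #pragma omp critical
        {
            if (t > max_team) max_team = t;
        }
    }
#else
    max_team = library_inner_team_size();
#endif
    return max_team;
}
```

Comparing the value returned by `library_inner_team_size()` called directly with the one returned by `max_inner_team_from_parallel_caller()`, with and without OMP_NESTED set, should show whether the inner region gets extra threads on a given runtime.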

Maybe I'm mis-reading between the lines of your question. A build log might help.

JenniferJ · ‎12-05-2013

Did you build the .lib with /Qipo? Also try building the program with /Qipo.

You could also try Intel Cilk Plus's cilk_for. Here we have several simple image processing examples.

Jennifer

Kandel__Mikhail · ‎12-05-2013

TimP (Intel) wrote:

The .lib would be built and the application would need to be built and linked with /Qparallel or /Qopenmp.

Does the target application need to be linked with /Qparallel? In my case I am linking my static .lib to a solution built with VS 2012, so this would be impossible?

TimP (Intel) wrote:

A build log might help.

The static library

/GS /W3 /QxHost /Gy /Zc:wchar_t /Zi /Ox /Fd"x64\Release\vc110.pdb" /fp:fast /Quse-intel-optimized-headers /D "WIN32" /D "NDEBUG" /D "_LIB" /D "_UNICODE" /D "UNICODE" /Qstd=c++11 /Qipo /Zc:forScope /arch:AVX /Oi /MD /QaxCORE-AVX2 /Fa"x64\Release\" /EHsc /nologo /Qparallel /Fo"x64\Release\" /Qstd=c99 /Fp"x64\Release\optomizedcpu.pch"

The main application

/Yu"stdafx.h" /GS /GL /W3 /Gy /Zc:wchar_t- /I".\GeneratedFiles" /I"." /I"C:\Qt\Qt5.1.0\5.1.0\msvc2012_64_opengl\include" /I".\GeneratedFiles\Release" /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include" /Zi /Gm- /Ox /Fd"x64\Release\vc110.pdb" /fp:fast /D "WIN32" /D "QT_DLL" /D "QT_NO_DEBUG" /D "NDEBUG" /D "_WIN32_WINNT=0x0601" /D "_AFXDLL" /errorReport:prompt /GT /WX- /Zc:forScope /arch:AVX /Gd /Oi /MD /Fa"x64\Release\" /EHsc /nologo /Fo"x64\Release\" /Ot /Fp"x64\Release\Slim2.pch"

Sorry for the grammar errors, I can't figure out how to edit posts.

Kandel__Mikhail · ‎12-13-2013

Anybody have any ideas?

Olga_M_Intel · ‎12-17-2013

kandel3 wrote:

I verified by looking at process monitor, which in my case shows only 25% utilization.

With auto-parallelization you can use /Qpar-report{0|1|2|3} to see what part of your code was parallelized.

Also, since the Intel compiler does auto-parallelization using OpenMP, you can set KMP_VERSION=1 to see if the OpenMP library was actually used.
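A small test case makes the report easier to read than a full library. The sketch below is only an illustration: `scale_image` is a hypothetical per-pixel kernel, not code from this thread. It has independent iterations, which is the kind of loop the auto-parallelizer looks for, but whether it is actually parallelized depends on the compiler's aliasing and trip-count analysis, and that is exactly what the report should tell you.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-pixel kernel: out[i] = gain * in[i] + offset.
// Each iteration is independent of the others, so the loop is a
// candidate for auto-parallelization when built with /Qparallel.
void scale_image(const float* in, float* out, std::size_t n,
                 float gain, float offset) {
    // The caller must pass valid, non-overlapping buffers.
    assert(in != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = gain * in[i] + offset;
    }
}
```

Building this file with /Qparallel plus one of the /Qpar-report levels, then running a small driver with KMP_VERSION=1 set in the environment, should show both whether the loop was parallelized and whether the OpenMP runtime was loaded.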

TimP · ‎12-18-2013

I'll let someone else figure out what you mean by a static library built with /MD.

It gets confusing when you specify more than one /Qx or /Qax option. Do you mean that you want the library built for AVX and AVX2 and no others? In that case /QxHost appears to conflict, but ought to be overridden.

/Quse-intel-optimized-headers means that you intend to substitute the IPP-calling versions of the <string> headers. I suppose that could be taken care of automatically when linking on Windows, as long as the ICL link paths are set.

If you don't use ICL /Qparallel at the link step, you do need to make sure somehow that the link doesn't pick up the Microsoft vcomp library and uses libiomp5.dll instead.
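One way to check which runtime ended up in the process is to ask Windows which OpenMP DLL is loaded. This is a rough sketch under stated assumptions: the DLL names `libiomp5md.dll` (the Intel dynamic OpenMP runtime) and `vcomp110.dll` (the VS 2012 OpenMP runtime) are my assumptions about this setup, and `loaded_openmp_runtime` is a hypothetical helper, so adjust the names to whatever your installation ships.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical diagnostic: reports which OpenMP runtime DLLs are loaded
// in the current process. The DLL names are assumptions (see above).
std::string loaded_openmp_runtime() {
    bool intel = false;
    bool microsoft = false;
#ifdef _WIN32
    // GetModuleHandleA returns NULL when the module is not loaded.
    intel = GetModuleHandleA("libiomp5md.dll") != NULL;
    microsoft = GetModuleHandleA("vcomp110.dll") != NULL;
#endif
    if (intel && microsoft) return "intel+microsoft";
    if (intel) return "intel";
    if (microsoft) return "microsoft";
    return "none";
}
```

Calling this from the main application after the library code has run once should tell you whether only the Intel runtime is present. Seeing both, or only the Microsoft one, would point at the link step.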

I suppose it may be better to start with something simple and work up to the fancier options.

I have no idea about compatibility of the NVIDIA toolkit with some of the fancier options.

Automatic parallelization lost in static library?