Intel MKL 11.0, static link to mkl_core.lib, more than 300 MB?

Zhanghong_T_ · ‎10-25-2012

Dear all,

I had a chance to test the latest MKL 11.0 on Windows 7 64-bit + VS2010 + Intel Composer XE 2013. I need to call PARDISO. To my surprise, the link libraries set in my project could not be found, so I had to link mkl_core.lib. After waiting several minutes I got the statically linked library, and it is more than 300 MB. So terrible! Why doesn't the program link only the needed objects? Or is my project setting incorrect?

In addition, it seems that with the latest MKL 11.0 + Intel Composer XE 2013 the generated executable file is much slower than with MKL 10.x + Intel Composer XE 2012. What's wrong?

Thanks,

Zhanghong Tang

mecej4 · ‎10-26-2012

Your post contains reports of many problems, but little information that can be used to troubleshoot the problems. How did you compile? What compiler options were in effect? Do you have a build log that you can post here? For instance, you wrote, "or I don't have a correct project setting?" How is it possible to answer that question without knowing what project settings you used, either explicitly or by inheritance from configuration files? Secondly, when you wrote "the generated executable file is much slower than MKL 10.x + Intel Composer XE 2012, ", what was the source code that you used to reach this conclusion?

Zhanghong_T_ · ‎10-26-2012

Hi,

Thank you very much for your kind reply. The build log is as follows:

1>------ Build started: Project: solver, Configuration: Release x64 ------
2>------ Build started: Project: randgen, Configuration: Release x64 ------
1>Compiling with Intel(R) Visual Fortran Compiler XE 13.0.0.089 [Intel(R) 64]...
1>solver.f90
2>Build started 2012/10/26 23:51:42.
2>InitializeBuildStatus:
2> Creating "x64\Release\randgen.unsuccessfulbuild" because "AlwaysCreate" was specified.
2>ClCompile:
2> randomgen.c
2>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\Microsoft.CppBuild.targets(1151,5): warning MSB8012: TargetPath(D:\solver\x64\Release\randgen.lib) does not match the Library's OutputFile property value (D:\solver\lib64\randgen.lib). This may cause your project to build incorrectly. To correct this, please make sure that $(OutDir), $(TargetName) and $(TargetExt) property values match the value specified in %(Lib.OutputFile).
2>Lib:
2> randgen.vcxproj -> D:\solver\x64\Release\randgen.lib
2>FinalizeBuildStatus:
2> Deleting file "x64\Release\randgen.unsuccessfulbuild".
2> Touching "x64\Release\randgen.lastbuildstate".
2>
2>Build succeeded.
2>
2>Time Elapsed 00:00:01.69
1>Creating library...
1>randgen.lib(WELL1024a.obj) : MSIL .netmodule or module compiled with /GL found; restarting link with /LTCG; add /LTCG to the link command line to improve linker performance
1>
1>Build log written to "file://D:\solver\x64\Release\BuildLog.htm"
1>solver - 0 error(s), 0 warning(s)
========== Build: 2 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

The project settings are:

/nologo /O2 /fpp /I"C:\Program Files (x86)\Intel\Composer XE 2013\mkl\include" /DWIN64 /Qsave /module:"x64\Release/" /object:"x64\Release/" /Fd"x64\Release\vc100.pdb" /libs:static /threads /c
/OUT:"solver.lib" /LIBPATH:".\lib64" /LIBPATH:"C:\Program Files (x86)\Intel\Composer XE 2013\mkl\lib\intel64" /NOLOGO randgen.lib mkl_intel_lp64.lib mkl_sequential.lib mkl_core.lib

The source code just calls PARDISO to solve a large sparse matrix that comes from FEM. It is only my impression, not real timing data, that 11.0 is slower :)

Thanks,

Zhanghong Tang

mecej4 · ‎10-26-2012

If you can provide the source codes (preferably a small example that displays the problem), it would make it easier to track down the problems.

Zhanghong_T_ · ‎10-26-2012

Hi mecej4,

Thank you very much for your kind reply. When I build the attached project on Windows 7 64-bit + VS2010 + MKL 11.0 + Intel Composer XE 2013, I get a static library with a size of 300 MB.

Thanks,

Zhanghong Tang

mecej4 · ‎10-27-2012

Thanks for posting the Zip file with the project files. Your project seeks to create a static library from the MKL example pardiso_sym_f.f, with the PROGRAM statement replaced by SUBROUTINE and with no arguments. I do not understand why you want to do this: (i) what do you propose to do with the library? (ii) why do you want a static library?

Note that a self-contained static library containing the runtime libraries for Fortran and MKL will be huge -- essentially the sum of the sizes of all the Intel-provided libraries provided for a particular architecture.

For reference: I created a DLL from your source file using the command

ifort /Qmkl /fast /LD pardiso_sym_f.f

and the resulting DLL sizes were:

Ifort 12.1.5.344, Build 20120612, X64: 14,848 bytes
Ifort 13.0.1.119, Build 20121008, X64: 15,872 bytes

Zhanghong_T_ · ‎10-27-2012

Hi mecej4,

Thank you very much for your kind reply. I only provided this example to show that the problem can be reproduced. I noticed that with 10.x the size of the lib file is only about 8 MB. I am used to static libraries rather than dynamic libraries, since a program linked with the former is not affected by other changes.

Thanks,

Zhanghong Tang

Zhanghong_T_ · ‎11-14-2012

Dear all,

I noticed that Update 1 of 11.0 is ready. Can this updated version solve the large size problem introduced by static linking?

Thanks,

Zhanghong Tang

Gennady_F_Intel · ‎11-14-2012

No, the size of the static library will be the same as you have seen with version 11.0. When you create a static library from MKL’s static libraries, this is the expected result.
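To illustrate why this size is expected: the librarian command line in the build log lists randgen.lib and three MKL libraries as inputs for solver.lib, and a librarian copies all members of its input libraries into the output rather than only the referenced ones. The following is a minimal Python sketch under that assumption; the function name `estimated_merged_size` is illustrative and not part of any Intel or Microsoft tool, and the directory is the one quoted in the project settings.

```python
import os

def estimated_merged_size(paths):
    """Sum the on-disk sizes of the given files, in bytes.

    Assumption: the librarian copies every member of each input library
    into the output, so the output is roughly as large as all inputs combined.
    """
    return sum(os.path.getsize(p) for p in paths)

if __name__ == "__main__":
    # Directory taken from the project settings; adjust to the actual install.
    mkl_dir = r"C:\Program Files (x86)\Intel\Composer XE 2013\mkl\lib\intel64"
    inputs = [
        os.path.join(mkl_dir, "mkl_intel_lp64.lib"),
        os.path.join(mkl_dir, "mkl_sequential.lib"),
        os.path.join(mkl_dir, "mkl_core.lib"),
    ]
    # Skip libraries that are not present so the sketch runs anywhere.
    existing = [p for p in inputs if os.path.exists(p)]
    total_mib = estimated_merged_size(existing) / 2**20
    print(f"{len(existing)} of {len(inputs)} libraries found, "
          f"about {total_mib:.1f} MiB in total")
```

By contrast, when a linker builds an executable from static libraries, it extracts only the members needed to resolve references, so the executable itself does not have to grow by the same amount.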

Zhanghong_T_ · ‎11-14-2012

Hi Gennady,

Thank you very much for your kind reply. So you mean that this will remain the case from 11.0 onward? The size of the static library is only about 8 MB if I link the code with the 10.x version.

Thanks,

Zhanghong Tang

junziyang · ‎01-15-2013

I also found that MKL 11.0 with ivf2013 is much slower than MKL 10.x with ivf2011.

I used it to compile FORTRAN MEX files on Win8 with MATLAB 2012b x64, with the following option settings:

set COMPILER=ifort
set COMPFLAGS=/fpp /Qprec /I"%MATLAB%/extern/include" /c /nologo /free /fp:source /MT /assume:bscc
set OPTIMFLAGS=/O3 /DNDEBUG /QxHost /Qvec-report /Qftz
set DEBUGFLAGS=/Z7
set NAME_OBJECT=/Fo

In the MEX file, only the MATMUL intrinsic is used.

For MKL 10.x with ivf2011, the execution time is ~348 s, while for MKL 11.0 with ivf2013 the execution time is ~518 s.

junziyang · ‎01-15-2013

Besides, my laptop has an i3-M330 CPU with 8Gbit memory.

Both versions were tested on the same laptop and compiled with exactly the same options.

I've also tested the 32-bit version. It is slower than the 64-bit version, and ivf2013 is still much slower than ivf2011.

Gennady_F_Intel · ‎01-15-2013

junziyang wrote:

I also found that MKL 11.0 with ivf2013 is much slower than MKL 10.x with ivf2011.

I used it to compile FORTRAN MEX files on Win8 with MATLAB 2012b x64, with the following option settings:

How can we check this on our side? Can you give us a C/C++ or F77/F90 example to check?

What is the problem size? Which routines? What CPU type?

junziyang · ‎01-15-2013

Sorry. It's a FORTRAN 90 MEX file in a MATLAB project, so it's impossible to run and test it separately.

The matrix size is about 200x100x1000000. A 200x100 matrix is used in the calculation, the results are stored into sequential pages of a 200x100x1000000 array, and that array is then written to the hard disk.

The SAME program is compiled with the SAME options with different versions of the compiler and tested with the SAME parameters.

Most of the computation time is spent in MATMUL.
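A quick back-of-the-envelope check of these sizes, written as a small Python sketch. It assumes 8-byte double-precision elements, which is an assumption at this point; the variable names are illustrative.

```python
# Dimensions quoted above: 200 x 100 pages, 1,000,000 of them.
rows, cols, pages = 200, 100, 1_000_000
bytes_per_element = 8  # assumption: double precision

page_bytes = rows * cols * bytes_per_element   # one 200x100 page
total_bytes = page_bytes * pages               # the full 3-D array

print(f"one 200x100 page: {page_bytes} bytes")
print(f"full array: {total_bytes / 1e9:.0f} GB")
```

Under that assumption each page is 160,000 bytes and the full array is about 160 GB, so the time spent writing results to disk may be worth measuring separately from the MATMUL time.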

Gennady_F_Intel · ‎01-15-2013

OK, is that double precision, complex double, or something else?

And I see you used an Intel Core i3 processor. Did you link with the threaded or the sequential libs: mkl_sequential.lib or mkl_intel_thread.lib?

junziyang · ‎01-17-2013

Yes. I put all the *.lib on the LINKFLAGS path.

It's double precision.

Sosunova_M_ · ‎01-21-2013

I’ve tried sgemm, dgemm, and cgemm performance on a Core i7 machine with RHEL Server 6.3 64-bit.

Sizes used in calculations: 200x100x1000000

I didn’t see any performance degradation from 10.3 to 11.0. I got the following results:

sgemm:
2 threads, 10.3: 0.371785 sec
2 threads, 11.0: 0.384604 sec
4 threads, 10.3: 0.198118 sec
4 threads, 11.0: 0.205439 sec

dgemm:
2 threads, 10.3: 0.747748 sec
2 threads, 11.0: 0.751148 sec
4 threads, 10.3: 0.395847 sec
4 threads, 11.0: 0.400287 sec

cgemm:
2 threads, 10.3: 1.56011 sec
2 threads, 11.0: 1.55974 sec
4 threads, 10.3: 0.885265 sec
4 threads, 11.0: 0.88249 sec
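To put these numbers in perspective, the relative change from 10.3 to 11.0 can be computed directly from the listed timings. This is a minimal Python sketch; the dictionary layout and the function name `percent_change` are illustrative, and the values are copied from the results above.

```python
# (routine, threads) -> (seconds with 10.3, seconds with 11.0)
timings = {
    ("sgemm", 2): (0.371785, 0.384604),
    ("sgemm", 4): (0.198118, 0.205439),
    ("dgemm", 2): (0.747748, 0.751148),
    ("dgemm", 4): (0.395847, 0.400287),
    ("cgemm", 2): (1.56011, 1.55974),
    ("cgemm", 4): (0.885265, 0.88249),
}

def percent_change(old, new):
    """Relative change from old to new, in percent (positive means slower)."""
    return (new - old) / old * 100.0

for (routine, threads), (t103, t110) in timings.items():
    change = percent_change(t103, t110)
    print(f"{routine}, {threads} threads: {change:+.2f}%")
```

All six changes are within 4 percent in either direction, compared with the roughly 49 percent slowdown (348 s to 518 s) reported for the MEX file.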