Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Pardiso on WIN64 using only one thread

mullervki
Beginner
Hello,

I have the exact same code running on Linux64 and Win64. Everything works well in Linux64. But in Win64, even though I set OMP_NUM_THREADS and MKL_NUM_THREADS to 2, Pardiso reports

< Parallel Direct Factorization with #processors: > 1

This happens in both in-core and out-of-core modes. I'm using version 10.3, build 20110314.

Do I need to do anything else other than set the above 2 environment variables?

Thanks.
Gennady_F_Intel
Moderator
Please see the Link Line Advisor here.
mullervki
Beginner
Gennady,

I now have the BLAS half-running properly.

I have 2 ways of building my code: (1) inside Visual Studio 2008 and (2) from the command line.

When I build everything from the command line everything works fine. However, when I run within Visual Studio in DEBUG mode I simply get wrong answers. The BLAS routines in my code (not using Pardiso now... just testing BLAS for the moment) are generating incorrect results when run from within VS.

These are my compiler flags inside VS:

/Od
/I "C:\Program Files (x86)\Intel\ComposerXE-2011\mkl\include"
/D "WIN32"
/D "_DEBUG"
/D "_LIB"
/FD
/EHsc
/RTC1
/MD
/openmp
/Fo"x64\Debug\" /Fd"x64\Debug\vc90.pdb"
/W3 /nologo /c /Wp64 /ZI /errorReport:prompt

And these are the relevant link flags, and, at the very end, the Intel libraries I'm using.

/INCREMENTAL:NO
/NOLOGO
/LIBPATH:"C:\Program Files (x86)\Intel\ComposerXE-2011\mkl\lib\intel64"
/LIBPATH:"C:\Program Files (x86)\Intel\ComposerXE-2011\compiler\lib\intel64"
/MANIFEST
/MANIFESTFILE:"x64\Debug\exam.exe.intermediate.manifest"
/MANIFESTUAC:"level='asInvoker' uiAccess='false'"
/DEBUG
/SUBSYSTEM:CONSOLE
/LARGEADDRESSAWARE
/DYNAMICBASE:NO
/FIXED:No
/MACHINE:X64
/ERRORREPORT:PROMPT
mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib

Any thoughts on what may be causing the BLAS in my code to generate incorrect results?

Also, I found out that if I set MKL_NUM_THREADS=1 along with OMP_NUM_THREADS=1 then it all works fine. I have the same code running on Linux, and there it works just fine with more than 1 thread.

Thanks.

-Arthur
Konstantin_A_Intel
Hi Arthur,
I'm glad that the linking problem has been resolved. OK, let's go further.
Am I right that you have code calling BLAS that works well on Linux and works well on Windows when compiled from the command line? And that it fails when compiled from VS, but works well everywhere if you set MKL_NUM_THREADS=1 and OMP_NUM_THREADS=1?
A few questions:
- Is the code in C or Fortran?
- Which compiler do you use on the command line and with VS? Intel or MS?
- Please send the command you used to compile your code from the command line.
- Can you send the code (or a piece of the code that can be used as a reproducer)?
Regards,
Konstantin
mullervki
Beginner
Konstantin,

First of all, thanks for all the help so far. You've been incredible. I truly appreciate it.

Now for your questions:

1) Yes: the code works on Linux, and also works on Windows when compiled from the command line. It also works when MKL_NUM_THREADS = OMP_NUM_THREADS = 1. The fact that it does work with more than one thread if built from the command line is reassuring; it tells me that I don't have data conflicts of any kind in my code, and the problem may be elsewhere.

2) The program is in C.

3) I use the Visual Studio 2008 compiler.

4) Unfortunately, I'm unable to send the code - it's many thousands of lines of code with BLAS calls throughout its many parts. But this prompted me to consider the alternative: a stand-alone program that calls the BLAS, that I can build inside and outside VS and that I can share with you. I'll try to find some time to do that.

5) One important point I should share: from the command line I'm building optimized code. In VS, a debug version. I had trouble locating the Microsoft OpenMP DLL and after a few online searches concluded that one had to "#undef _DEBUG" right before "#include ". I wonder whether this is an issue. If I don't do this, I get the error "VCOMP90.DLL not found" at runtime. If there's a way to get around this problem without the "#undef" hack I described above, I'd love to know it. It could be what's causing my problems.

6) From the command line, these are the flags I use to compile:

/O2
/w
/W4
/EHsc
/nologo
/c
/openmp
/LD
/EHsc
/D_CONSOLE
/D__LIB__
/DFAR=
/D_WINDOWS
/MD

Maybe one issue is the include files - more specifically, omp.h. I am compiling with /openmp and I'm using VS compiler. However, in order to link with the BLAS I'm linking with libiomp5md.lib. Could this be an issue?

I also did the following: the only BLAS Level 3 function I'm using, and the only one that will use both processors, is DGEMM. There are very few places in my code where this function is called, and the matrices are not that large (at least, not in the small problem I'm running). So I wrote the arguments A, B, C, M, N, K, alpha, beta, LDA, LDB, and LDC to a file, with C being written both before and after DGEMM is called.

Then I compared the results for OMP_NUM_THREADS/MKL_NUM_THREADS set to 1 and 2. Up to a point the numbers are identical, but after a few calls, the values before the calculation are all identical, but the resulting value of C is different (!!!) The scalar parameters are as follows:

M= 69, N= 3, K= 12, LDA= 81, LDB= 3, LDC= 69
alpha= -1.00000000000000e+000, beta= 1.00000000000000e+000

A, B, and C are identical before DGEMM is called. But the result C is different in the two cases.

Now what?

Thanks (again and again...)

-Arthur
Gennady_F_Intel
Moderator
Arthur,
- It would be great if you could give us an example that we can check on our side.
- Second, the leading dimension LDA == 81. Is that correct?
--Gennady
mullervki
Beginner
Gennady,

I believe the data I sent you is correct, including LDA. Do you see a problem with this number?

I'm attaching 2 files. Both contain the same A and B (C is initially zero). One file has the resulting C when running on 1 processor and the other one on 2 processors.

Also, the first 2 arguments to DGEMM are "N" and "T".

-Arthur
Konstantin_A_Intel
Hi Arthur,
It looks like using both the MS and Intel OpenMP libraries in a single application is the issue.
Does your code need OpenMP, or is it just needed for MKL? If it's needed for MKL only, I would try switching off any OpenMP flags in the MS compiler and just linking with the Intel MKL libraries and with libiomp5.
And another thing: I would recommend using the Intel C/C++ compiler if you use OpenMP, MKL, etc.
Regards,
Konstantin
mullervki
Beginner
Konstantin,

I tested your hypothesis, and you are right. If I turn off the OpenMP from VS but still use MKL_NUM_THREADS=2, then everything works fine.

I guess this means that I either use the Intel Compiler with OpenMP if I want both OpenMP in my code and the parallel BLAS - which I haven't tested yet; or I have to sacrifice either the use of OpenMP in my code or the parallel BLAS from MKL.

Honestly, neither is a very good solution. Ideally, I should be able to use MKL without having to sacrifice any of the tools I'm currently using. I guess there's no workaround for this, is there?

-Arthur