mullervki
Beginner
194 Views

Pardiso on WIN64 using only one thread

Hello,

I have the exact same code running on Linux64 and Win64. Everything works well in Linux64. But in Win64, even though I set OMP_NUM_THREADS and MKL_NUM_THREADS to 2, Pardiso reports

< Parallel Direct Factorization with #processors: > 1

This happens in both in-core and out-of-core modes. I'm using version 10.3, build 20110314.

Do I need to do anything else other than set the above 2 environment variables?

Thanks.
28 Replies
Gennady_F_Intel
Moderator
171 Views

That's strange. You don't need to do anything else. What is the task size and the type of matrix you are solving? We need to check it.
--Gennady
mullervki
Beginner
171 Views

Gennady,

The matrix has about 200,000 equations with about 8 million nonzeros. It's a symmetric indefinite matrix. Right before the first call to Pardiso I print

OMP_NUM_THREADS= 2
MKL_NUM_THREADS= 2

Here's all the printout:

=== PARDISO is running in In-Core mode, because iparam(60)=0 ===


================ PARDISO: solving a symmetric indef. system ================
The local (internal) PARDISO version is : 103000115
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Single-level factorization algorithm is turned ON
Scaling is turned ON


Summary PARDISO: ( reorder to reorder )
================

Times:
======
Time spent in calculations of symmetric matrix portrait(fulladj): 0.233544 s
Time spent in reordering of the initial matrix(reorder) : 3.419899 s
Time spent in symbolic factorization(symbfct) : 0.900987 s
Time spent in allocation of internal data structures(malloc) : 0.139583 s
Time spent in additional calculations : 1.692345 s
Total time spent : 6.386359 s

Statistics:
===========
< Parallel Direct Factorization with #processors: > 1
< Hybrid Solver PARDISO with CGS/CG Iteration >

< Linear system Ax = b>
#equations: 219057
#non-zeros in A: 7798701
non-zeros in A (%): 0.016252

#right-hand sides: 0

< Factors L and U >
#columns for each panel: 128
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 29451
size of largest supernode: 3438
number of nonzeros in L 77824417
number of nonzeros in U 1
number of nonzeros in L+U 77824418
mullervki
Beginner
171 Views

Gennady,

I added the call

mkl_set_num_threads(2);

right before calling Pardiso the first time. Still, I get #processors: 1. This happens in both Win32 and Win64.

Any thoughts? Could I have inadvertently set some parameter incorrectly that could be triggering this behavior?

-Arthur
171 Views

Hi Arthur,
Does your Win32/Win64 system have 2 or more physical cores? MKL checks this and sets the number of threads to 1 if the system has only 1 physical core.
Regards,
Konstantin
mullervki
Beginner
171 Views

Konstantin,

The machine has 2 physical processors. When running Pardiso the Windows task manager displays usage at around 50%.

Is there a way to turn on some internal debugging so we can get more information on this as MKL is running?

-Arthur
TimP
Black Belt
171 Views

Just in case you mean that you have hyperthreading enabled: remember that MKL tries to maximize performance by using just 1 thread per pair of hyperthread logical processors, unless you override this by setting MKL_DYNAMIC. The term "physical processor" is more likely to refer to a complete core, which would support a pair of logical processors when hyperthreading is enabled.
mullervki
Beginner
171 Views

Tim,

Thanks for the reply.

I'll be very honest: your answer blew me away. I'm new to OMP, so I had never heard of either OMP_DYNAMIC or MKL_DYNAMIC before.

To the best of my knowledge, my machine has 2 processors, and I assume each has a single core. Each processor is an Intel Xeon, and Dell describes them as "C8508 Processor, 80546K, 3.0G, 2M, XNI 800, N0", where C8508 is the Dell part number (probably not too useful for you).

Given that, I tried mkl_set_dynamic(0) and mkl_set_dynamic(1). It made no difference. In both cases, during the matrix factorization, I only see one processor at work (task manager showing 50% utilization).

1) Should I see any difference between the two mkl_set_dynamic calls?

2) Is the fact that I see only 50% utilization of the CPU with the task manager a true indication that only one CPU is being used? I always believed this is the case, but maybe I don't have all the facts.

3) Is there a way to *force* MKL to use 2 processors, even if it believes it's better off with only 1? All I want to see is that everything is being done correctly. Once I know that's the case then I'll let MKL make its own smarter decisions.

Thanks again.

-Arthur

TimP
Black Belt
171 Views

Apparently, it's an "Irwindale" single core HyperThread CPU. These were probably available in both dual and single CPU platforms. Typically, floating point performance of the dual CPU platform was reduced by 15% when HyperThread was left enabled, even on Linux (worse on Windows, not so bad on single CPU). You can check your BIOS setup screen to see whether HyperThreading is enabled. If enabled, and you see just 2 processors in task manager, there's only 1 CPU, and running 1 thread would show 50% on task manager, even though you get more performance than you would with 2 threads.

OpenMP dynamic is a different facility from MKL dynamic.
I think I've confused you about MKL_DYNAMIC. See this earlier post specifically about how to get MKL to use all the HyperThreads by setting MKL_DYNAMIC=FALSE and specifying MKL_NUM_THREADS.
mullervki
Beginner
171 Views

Tim,

Here's the information:

Number of processors = 2
Multi-core capable = NO
Hyperthreading capable = YES

Hyperthreading is OFF (it's the factory default and has never been changed)

Given the above information, I believe what the task manager is showing me is the 2 CPUs on the machine.

So now, the question: what do I need to do - if at all possible - to get Pardiso to use the 2 processors in parallel?

Thanks.

-Arthur
171 Views

Hi Arthur,
Based on the information you provided, it seems your computer is not multi-core, so MKL's strategy is to use only 1 thread for optimal performance.
If you still want to use 2 threads, please call the following functions before calling any MKL function (though you most likely will not get a performance improvement):
mkl_set_num_threads( 2 );
mkl_set_dynamic( false );
or set the environment variables:
set MKL_NUM_THREADS=2
set MKL_DYNAMIC=false
Regards,
Konstantin
mullervki
Beginner
171 Views

Konstantin,

Does this mean that MKL will not parallelize across multiple processors? If I had a 4-processor machine, each processor single-core, I wouldn't be able to benefit from the parallelization in MKL?

-Arthur
171 Views

Hi Arthur,
MKL is able to run in parallel across multiple processors. MKL sets the number of threads equal to the total number of physical cores available in your system. In your case, the number of physical cores is equal to 1 on Windows:
Number of processors = 2 // Number of logical processors is 2
Multi-core capable = NO // Not multi-core, which means 1 physical core
Hyperthreading capable = YES // Hyperthreading, which means 2 logical processors per 1 physical core
To make sure, you can run the 'systeminfo' command under 'cmd' and report the info about the processor. I just tried running PARDISO on my dual-core laptop (physically dual-core) and it reported 2 threads.
As for your example (a 4-processor machine, each processor single-core): did you mean a 4-socket system? I'm sure that MKL will use 4 threads in this case, as 4 cores will be available.
Best regards,
Konstantin
mullervki
Beginner
171 Views

Konstantin,

Here's the relevant result from systeminfo:

System Manufacturer: Dell Inc.
System Model: Precision WorkStation 670
System Type: x64-based PC
Processor(s): 2 Processor(s) Installed.
[01]: EM64T Family 15 Model 4 Stepping 3 GenuineIntel ~2993 Mhz
[02]: EM64T Family 15 Model 4 Stepping 3 GenuineIntel ~2993 Mhz


I guess I'm still confused. Is there any combination of parameters/environment variables that I can set on this machine that will show #processors = 2? Or is this misleading and it is really using both processors?

-Arthur
171 Views

OK, it seems this is not the best way to get precise information about the system. In fact, I checked that systeminfo reports logical processors (so it will give the same information for 2 single-core processors, 1 dual-core processor and, for instance, a single-core processor with hyperthreading ON). However, I checked MKL on a system with 2 single-core processors (rather old Nocona processors) and MKL PARDISO reported 2 threads.
Let's try another way to obtain precise info about your system: can you install the free CPU-Z tool, available here?
On the CPU tab it reports (at the very bottom) the number of Cores and Threads for each processor. If the two numbers differ, hyperthreading is ON on your system.
Regards,
Konstantin
mullervki
Beginner
171 Views

Konstantin,

Here's the information from CPU-Z:

Processor #1:
Core Speed: 2793.1 MHz
Multiplier x14.0
Bus Speed 199.5MHz
Rated FSB 798.1 MHz
L1 Data 16 KBytes 8-way
Trace 12 Kuops 8-way
Level 2 2048 KBytes 8-way
Cores 1
Threads 1

The data for Processor #2 is identical.

-Arthur

171 Views

OK, now this seems really strange.
Could you please run the following program on your windows machine (I compiled it under MS VS 2008):
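#include "stdafx.h"
#include <stdio.h>
#include "mkl.h"

int _tmain(int argc, _TCHAR* argv[])
{
    /* Default limit chosen by MKL from the environment and the hardware. */
    printf("\nthreads = %d\n", mkl_get_max_threads());

    /* Request 1, 2, and 4 threads in turn and read back the limit each time. */
    mkl_set_num_threads(1);
    printf("\nthreads = %d\n", mkl_get_max_threads());
    mkl_set_num_threads(2);
    printf("\nthreads = %d\n", mkl_get_max_threads());
    mkl_set_num_threads(4);
    printf("\nthreads = %d\n", mkl_get_max_threads());

    /* Turn off dynamic adjustment so MKL does not lower the requested count. */
    mkl_set_dynamic(false);
    printf("\nthreads = %d\n", mkl_get_max_threads());
    return 0;
}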
mullervki
Beginner
171 Views

Konstantin,

Here's the output of your program. I also echoed the important environment variables before running the program.

C:\>set OMP_NUM_THREADS
OMP_NUM_THREADS=2

C:\>set MKL_NUM_THREADS
MKL_NUM_THREADS=2

C:\>exam1.exe

threads = 1

threads = 1

threads = 1

threads = 1

threads = 1
171 Views

Arthur, thank you for the information!
It looks either like a bug or like you've linked MKL with the sequential layer (mkl_sequential.lib instead of mkl_intel_thread.lib).
Could you please report your link line? If you use Visual Studio, it would be great if you could send the contents of the Project -> "project" Properties -> Linker -> Command Line item from the main menu.
Regards,
Konstantin
mullervki
Beginner
171 Views

Konstantin,

BINGO!!! I think you got to the bottom of the problem!!! Here's what I'm linking with:

mkl_solver_lp64_sequential.lib mkl_intel_lp64.lib mkl_sequential.lib mkl_core.lib

I'll look at the documentation to check the list of libraries I need to include.

Thanks!

-Arthur
mullervki
Beginner
63 Views

Well,

I'm clearly still doing something wrong. I'm now getting the following error:

MKL FATAL ERROR on load the function mkl_blas_xdswap

I guess I need some guidance on which libraries EXACTLY to use if I'm compiling with VS 2008, with both OpenMP and with Windows threads, on both 32 and 64 bit platforms.

In fact, if you could tell me all environment variables I have to set for my command prompt mode that would also help.

Thanks.

-Arthur