I have the exact same code running on Linux64 and Win64. Everything works well in Linux64. But in Win64, even though I set OMP_NUM_THREADS and MKL_NUM_THREADS to 2, Pardiso reports

< Parallel Direct Factorization with #processors: > 1

This happens in both in-core and out-of-core modes. I'm using version 10.3, build 20110314.

Do I need to do anything else other than set the above 2 environment variables?

Thanks.

--Gennady

The matrix has about 200,000 equations with about 8 million nonzeros. It's a symmetric indefinite matrix. Right before the first call to Pardiso I print

OMP_NUM_THREADS= 2

MKL_NUM_THREADS= 2

Here's all the printout:

**=== PARDISO is running in In-Core mode, because iparam(60)=0 ===**

================ PARDISO: solving a symmetric indef. system ================

The local (internal) PARDISO version is : 103000115

1-based array indexing is turned ON

PARDISO double precision computation is turned ON

METIS algorithm at reorder step is turned ON

Single-level factorization algorithm is turned ON

Scaling is turned ON

Summary PARDISO: ( reorder to reorder )

================

Times:

======

Time spent in calculations of symmetric matrix portrait(fulladj): 0.233544 s

Time spent in reordering of the initial matrix(reorder) : 3.419899 s

Time spent in symbolic factorization(symbfct) : 0.900987 s

Time spent in allocation of internal data structures(malloc) : 0.139583 s

Time spent in additional calculations : 1.692345 s

Total time spent : 6.386359 s

Statistics:

===========

< Parallel Direct Factorization with #processors: > 1

< Hybrid Solver PARDISO with CGS/CG Iteration >

< Linear system Ax = b>

#equations: 219057

#non-zeros in A: 7798701

non-zeros in A (%): 0.016252

#right-hand sides: 0

< Factors L and U >

#columns for each panel: 128

#independent subgraphs: 0

< Preprocessing with state of the art partitioning metis>

#supernodes: 29451

size of largest supernode: 3438

number of nonzeros in L 77824417

number of nonzeros in U 1

number of nonzeros in L+U 77824418

I added the call

mkl_set_num_threads(2);

right before calling Pardiso the first time. Still, I get #processors: 1. This happens in both Win32 and Win64.

Any thoughts? Could I have inadvertently set some parameter incorrectly that could be triggering this behavior?

-Arthur

Does your Win32/Win64 system have 2 or more physical cores? MKL checks this and sets the number of threads to 1 if the system has only 1 physical core.

Regards,

Konstantin

The machine has 2 physical processors. When running Pardiso the Windows task manager displays usage at around 50%.

Is there a way to turn on some internal debugging so we can get more information on this as MKL is running?

-Arthur

Thanks for the reply.

I'll be very honest: your answer blew me away. I'm new to OMP, so I had never heard of either OMP_DYNAMIC or MKL_DYNAMIC before.

To the best of my knowledge, my machine has 2 processors, and I assume each has a single core. Each processor is an Intel Xeon, and Dell describes them as "C8508 Processor, 80546K, 3.0G, 2M, XNI 800, N0", where C8508 is the Dell part number (probably not too useful for you).

Given that, I tried mkl_set_dynamic(0) and mkl_set_dynamic(1). It made no difference. In both cases, during the matrix factorization, I only see one processor at work (task manager showing 50% utilization).

1) Should I see any difference between the two mkl_set_dynamic calls?

2) Is the fact that I see only 50% utilization of the CPU with the task manager a true indication that only one CPU is being used? I always believed this is the case, but maybe I don't have all the facts.

3) Is there a way to *force* MKL to use 2 processors, even if it believes it's better off with only 1? All I want to see is that everything is being done correctly. Once I know that's the case then I'll let MKL make its own smarter decisions.

Thanks again.

-Arthur

OpenMP dynamic is a different facility from MKL dynamic.

I think I've confused you about MKL_DYNAMIC. See this earlier post specifically about how to get MKL to use all the HyperThreads by setting MKL_DYNAMIC=FALSE and specifying MKL_NUM_THREADS.

Here's the information:

Number of processors = 2

Multi-core capable = NO

Hyperthreading capable = YES

Hyperthreading is OFF (it's the factory default and has never been changed)

Given the above information, I believe what the task manager is showing me is the 2 CPUs on the machine.

So now, the question: what do I need to do - if at all possible - to get Pardiso to use the 2 processors in parallel?

Thanks.

-Arthur

Based on the information you provided, it seems your computer is not multi-core, so MKL's strategy is to use only 1 thread to achieve optimal performance.

If you still want to use 2 threads, please call the following functions before calling any MKL function (but you will most likely not get a performance improvement):

mkl_set_num_threads( 2 );

mkl_set_dynamic( false );

or set env. variables:

set MKL_NUM_THREADS=2

set MKL_DYNAMIC=false

Regards,

Konstantin

Does this mean that MKL will not parallelize across multiple processors? If I had a 4-processor machine, each processor single-core, I wouldn't be able to benefit from the parallelization in MKL?

-Arthur

MKL is able to run in parallel across multiple processors. MKL sets the number of threads equal to the total number of physical cores available in your system. In your case, the number of physical cores is equal to 1 on Windows:

Number of processors = 2 // Number of logical processors is 2

Multi-core capable = NO // No multi-core, which means 1 physical core

Hyperthreading capable = YES // Hyperthreading, which means 2 logical processors per physical core

To make sure, you can run the 'systeminfo' command under 'cmd' and report the info about the processor. I just tried running PARDISO on my dual-core laptop (physically dual-core) and it reported 2 threads.

As for your example (a 4-processor system, each processor single-core): did you mean a 4-socket system? I'm sure that MKL will use 4 threads in this case, as 4 cores will be available.

Best regards,

Konstantin

Here's the relevant result from systeminfo:

System Manufacturer: Dell Inc.

System Model: Precision WorkStation 670

System Type: x64-based PC

Processor(s): 2 Processor(s) Installed.

[01]: EM64T Family 15 Model 4 Stepping 3 GenuineIntel ~2993 Mhz

[02]: EM64T Family 15 Model 4 Stepping 3 GenuineIntel ~2993 Mhz

I guess I'm still confused. Is there any combination of parameters/environment variables that I can set on this machine that will show #processors = 2? Or is this misleading and it is really using both processors?

-Arthur

Let's try another way to get precise information about your system: can you install the free CPU-Z tool, available here?

At the very bottom of the CPU tab it reports the number of Cores and Threads for each processor. If the two numbers differ, hyperthreading is ON on your system.

Regards,

Konstantin

Here's the information from CPU-Z:

Processor #1:

Core Speed: 2793.1 MHz

Multiplier x14.0

Bus Speed 199.5MHz

Rated FSB 798.1 MHz

L1 Data 16 KBytes 8-way

Trace 12 Kuops 8-way

Level 2 2048 KBytes 8-way

Cores 1

Threads 1

The data for Processor #2 is identical.

-Arthur

Could you please run the following program on your Windows machine (I compiled it under MS VS 2008):

#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* Thread count MKL reports before any request is made. */
    printf("\nthreads = %d\n", mkl_get_max_threads());

    /* Request 1, 2 and 4 threads in turn and report what MKL accepts. */
    mkl_set_num_threads(1);
    printf("\nthreads = %d\n", mkl_get_max_threads());

    mkl_set_num_threads(2);
    printf("\nthreads = %d\n", mkl_get_max_threads());

    mkl_set_num_threads(4);
    printf("\nthreads = %d\n", mkl_get_max_threads());

    /* Turn off MKL's dynamic adjustment of the thread count. */
    mkl_set_dynamic(0);
    printf("\nthreads = %d\n", mkl_get_max_threads());

    return 0;
}

Here's the output of your program. I also echoed the important environment variables before running the program.

C:\>set OMP_NUM_THREADS

OMP_NUM_THREADS=2

C:\>set MKL_NUM_THREADS

MKL_NUM_THREADS=2

C:\>exam1.exe

threads = 1

threads = 1

threads = 1

threads = 1

threads = 1

It looks either like a bug or like you've linked MKL with the sequential layer (mkl_sequential.lib instead of mkl_intel_thread.lib).

Could you please report your link line? If you use Visual Studio, it would be great if you could send the contents of the Project -> Properties -> Linker -> Command Line item from the main menu.

Regards,

Konstantin

BINGO!!! I think you got to the bottom of the problem!!! Here's what I'm linking with:

mkl_solver_lp64_sequential.lib mkl_intel_lp64.lib mkl_sequential.lib mkl_core.lib

I'll look at the documentation to check the list of libraries I need to include.

Thanks!

-Arthur

I'm clearly still doing something wrong. I'm now getting the following error:

MKL FATAL ERROR on load the function mkl_blas_xdswap

I guess I need some guidance on which libraries EXACTLY to use if I'm compiling with VS 2008, with both OpenMP and with Windows threads, on both 32 and 64 bit platforms.

In fact, if you could tell me all environment variables I have to set for my command prompt mode that would also help.

Thanks.

-Arthur
