I'm trying to use the PARDISO solver with the TBB threading layer.
PARDISO seems to spend a lot of time idle with OpenMP on my kind of problems.
This page says that PARDISO supports TBB:
https://software.intel.com/en-us/articles/using-intel-mkl-and-intel-tbb-in-the-same-application
so I gave it a try
I'm linking with
mkl_intel_lp64_dll.lib mkl_core_dll.lib mkl_tbb_thread_dll.lib tbb.lib
and get single-threaded execution (same result with the static libs).
I'm using MSVC 2015.
What am I missing?
tnx
D
>> It seems that Pardiso got a lot of idle time with OMP in my kind of problems.
<< What is the problem size? Could you also run the OpenMP-threaded version and compare the performance results?
The reference about using MKL with TBB appears to say that certain MKL functions are available in a TBB version, and it gives a specific link command for that purpose, different from what you show here. If you use both OpenMP and TBB threading, you should expect idle OpenMP threads to persist for KMP_BLOCKTIME before a TBB thread can run on the same hardware thread.
If you are following the suggestion about tbb::affinity_partitioner and still using OpenMP as well, you might try limiting TBB threads to 1 per core (if you have enabled HyperThreading), taking advantage of the Intel OpenMP default limit of 1 per core, or specifically pinning OpenMP and TBB threads to different cores.
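To experiment with the KMP_BLOCKTIME effect mentioned above, one option is to launch the test program with the variable lowered. The Intel OpenMP runtime reads `KMP_BLOCKTIME` (the time in milliseconds a thread waits after a parallel region before sleeping) from the environment. The following is only a minimal sketch; `build_env` is a hypothetical helper, not part of MKL or TBB, and it assumes the variable is set before the process starts.

```python
import os


def build_env(blocktime_ms=0, base=None):
    """Return a copy of an environment mapping with KMP_BLOCKTIME set.

    build_env is a hypothetical helper for illustration only.
    KMP_BLOCKTIME itself is read by the Intel OpenMP runtime at startup.
    """
    env = dict(os.environ if base is None else base)
    env["KMP_BLOCKTIME"] = str(blocktime_ms)
    return env


# Example: environment in which idle OpenMP threads go to sleep immediately.
env = build_env(0)
print(env["KMP_BLOCKTIME"])
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` when starting the solver executable.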
I think I have a lead.
For some reason mkl_sequential is loaded even when mkl_tbb_thread_dll.lib is linked,
which explains the single-threaded timings.
Any idea why?
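One way to confirm which threading layer actually ended up in the process is to look at its module list (for example, the debugger's Modules window) and check for the threading-layer DLLs. The sketch below only classifies names that are pasted into it. `detect_threading_layers` is a hypothetical helper, and the DLL base names are an assumption: `mkl_sequential` and `mkl_tbb_thread` are the names that appear in this thread, and `mkl_intel_thread` is the Intel OpenMP layer.

```python
# Base names of MKL threading-layer DLLs (assumed naming; some releases
# append a version suffix, which the prefix match below tolerates).
LAYERS = {
    "mkl_sequential": "sequential",
    "mkl_intel_thread": "OpenMP",
    "mkl_tbb_thread": "TBB",
}


def detect_threading_layers(module_names):
    """Return the threading layers found in a list of module names or paths."""
    found = []
    for name in module_names:
        # Strip any directory part and ignore case, as on Windows.
        base = name.replace("\\", "/").rsplit("/", 1)[-1].lower()
        for prefix, label in LAYERS.items():
            if base.startswith(prefix) and label not in found:
                found.append(label)
    return found


# Module names are illustrative, not taken from a real process dump.
modules = ["MklTester.exe", "mkl_core.dll", "mkl_sequential.dll", "tbb.dll"]
print(detect_threading_layers(modules))
```

If this reports the sequential layer while the TBB import library was linked, the process is loading a different threading DLL than intended.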
As for the OMP performance, here is some info:
threads time
1 1000
2 630
4 400
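The scaling in this table can be summarized as speedup, parallel efficiency, and the Karp-Flatt estimate of the serial fraction. This is a minimal sketch using only the three timings above (their unit is not stated in the post); `scaling_report` is a hypothetical helper name.

```python
# threads -> time, copied from the table above (unit not given in the post)
timings = {1: 1000, 2: 630, 4: 400}


def scaling_report(timings):
    """Return (threads, speedup, efficiency, serial_fraction) per row."""
    t1 = timings[1]
    rows = []
    for p in sorted(timings):
        speedup = t1 / timings[p]
        efficiency = speedup / p
        # Karp-Flatt metric; it is undefined for a single thread.
        serial = None if p == 1 else (1 / speedup - 1 / p) / (1 - 1 / p)
        rows.append((p, speedup, efficiency, serial))
    return rows


for p, speedup, efficiency, serial in scaling_report(timings):
    serial_text = "n/a" if serial is None else f"{serial:.2f}"
    print(f"{p} threads: speedup {speedup:.2f}, "
          f"efficiency {efficiency:.2f}, serial fraction {serial_text}")
```

With these numbers the estimated serial fraction is about 0.26 at 2 threads and 0.20 at 4 threads, which fits the impression that a noticeable part of the run is not parallelized.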
problem information:
0-based array is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.022640 s
Time spent in reordering of the initial matrix (reorder) : 0.398392 s
Time spent in symbolic factorization (symbfct) : 0.244236 s
Time spent in data preparations for factorization (parlist) : 0.005588 s
Time spent in allocation of internal data structures (malloc) : 0.014131 s
Time spent in additional calculations : 0.170513 s
Total time spent : 0.855501 s
Statistics:
===========
Parallel Direct Factorization is running on 1 OpenMP
< Linear system Ax = b >
number of equations: 258687
number of non-zeros in A: 2821302
number of non-zeros in A (%): 0.004216
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
number of supernodes: 50167
size of largest supernode: 1041
number of non-zeros in L: 32337906
number of non-zeros in U: 1
number of non-zeros in L+U: 32337907
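A few derived figures make the statistics above easier to read: the density of A, the fill-in produced by the factorization, and the average supernode size. The sketch below uses only numbers printed in the log; the variable names are my own.

```python
# Values copied from the PARDISO statistics above.
n = 258687               # number of equations
nnz_a = 2821302          # non-zeros in A as reported by PARDISO
nnz_l = 32337906         # non-zeros in the factor L
supernodes = 50167       # number of supernodes

density_pct = 100.0 * nnz_a / (n * n)   # share of the n x n entries that are stored
fill_ratio = nnz_l / nnz_a              # growth of non-zeros caused by factorization
avg_supernode = n / supernodes          # average columns per supernode

print(f"density of A:        {density_pct:.6f} %")
print(f"fill-in nnz(L)/nnz(A): {fill_ratio:.2f}")
print(f"avg supernode size:  {avg_supernode:.2f}")
```

The computed density reproduces the 0.004216 % printed in the log, and the factor holds roughly 11 times as many non-zeros as the reported count for A.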
UPDATE:
After stripping down the project and converting it to the Intel compiler, mkl_tbb_thread_dll is loaded but crashes :(
Here is the call stack:
> mkl_tbb_thread.dll!00007ffa1095a067() Unknown
tbb.dll!tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task & parent, tbb::task * child) Line 467 C++
tbb.dll!tbb::internal::arena::process(tbb::internal::generic_scheduler & s) Line 147 C++
tbb.dll!tbb::internal::market::process(rml::job & j) Line 677 C++
tbb.dll!tbb::internal::rml::private_worker::run() Line 276 C++
tbb.dll!tbb::internal::rml::private_worker::thread_routine(void * arg) Line 229 C++
ucrtbase.dll!00007ffa3dc982dd() Unknown
Probably some runtime version incompatibility.
The TBB runtime is
compilers_and_libraries_2016.2.180\windows\redist\intel64_win\tbb\vc14\tbb.dll
I also tested with vc_mt\tbb.dll.
A simple TBB for loop works fine in the same project.
It seems that mkl_tbb_thread_dll has ABI compatibility issues with the TBB runtime.
Any idea?
tnx
D
D.S., how can we reproduce the problem? I checked with some of the PARDISO examples, linked against the vc14 TBB DLL, and no issues were detected.
Hello Gennady,
I just got to test it again. The crash is data dependent: TBB works fine with a few test matrices I tried, but crashes in phase 11 with some of my data sets.
For a diagonal matrix of size 100000 with a few random off-diagonal elements, TBB was actually a little slower, and phase 11 does not seem to be threaded at all.
I can send you the data with simple code that loads it if you need it.
Daniel
Daniel, please try setting iparm[1]=0 instead of iparm[1]=2 (which is the default) and check how it works on your side.
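For context, iparm[1] (0-based C indexing) selects the fill-in reducing ordering used in the reordering phase. The sketch below lists the values discussed in this thread as I understand them from the MKL PARDISO documentation; `make_iparm` is a hypothetical helper, and a real call would still have to fill in the remaining entries appropriately. Note that iparm[0] must be non-zero, otherwise PARDISO applies its defaults and ignores iparm[1].

```python
# Meaning of iparm[1] values (0-based indexing), per the MKL PARDISO docs.
ORDERINGS = {
    0: "minimum degree",
    2: "nested dissection (METIS)",
    3: "parallel (OpenMP) nested dissection",
}


def make_iparm(ordering):
    """Build a 64-entry iparm list with the requested reordering algorithm.

    Hypothetical helper for illustration; all other entries are left at 0
    and must be set by the caller as the application requires.
    """
    if ordering not in ORDERINGS:
        raise ValueError(f"unsupported iparm[1] value: {ordering}")
    iparm = [0] * 64
    iparm[0] = 1         # non-zero: use the supplied values, not the defaults
    iparm[1] = ordering  # fill-in reducing ordering
    return iparm


iparm = make_iparm(0)
print(iparm[1], ORDERINGS[iparm[1]])
```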
Still the same crash with iparm[1] = 0, 2, and 3.
I'm using 3, which makes phase 11 about 2x faster with OpenMP.
It seems that the data sets that don't crash are not using TBB threads at all.
Here is the crash:
Exception thrown at 0x00007FFBF459A067 (mkl_tbb_thread.dll) in MklTester.exe: 0xC0000005: Access violation writing location 0x00000025485EC000.
Sometimes it happens on the main thread, sometimes on a worker thread.
Stack:
> mkl_tbb_thread.dll!00007ffbf459a067() Unknown
tbb.dll!tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task & parent, tbb::task * child) Line 467 C++
tbb.dll!tbb::internal::arena::process(tbb::internal::generic_scheduler & s) Line 147 C++
tbb.dll!tbb::internal::market::process(rml::job & j) Line 677 C++
tbb.dll!tbb::internal::rml::private_worker::run() Line 276 C++
tbb.dll!tbb::internal::rml::private_worker::thread_routine(void * arg) Line 229 C++
ucrtbase.dll!00007ffc1e9482dd() Unknown
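The two faulting addresses reported in this thread can be compared even though the DLL was loaded at different base addresses. Windows maps DLLs at 64 KB-aligned bases, so the low 16 bits of an address inside a module are unchanged by relocation. The sketch below checks that, and also checks whether the failed write targets a page boundary. The interpretation in the comments is an assumption, not a diagnosis.

```python
# Faulting instruction addresses inside mkl_tbb_thread.dll from the two reports.
crash_addresses = [0x00007FFA1095A067, 0x00007FFBF459A067]
# Address of the failed write from the access-violation message.
write_target = 0x00000025485EC000


def low_bits(addr, bits):
    """Return the lowest `bits` bits of an address."""
    return addr & ((1 << bits) - 1)


# Same low 16 bits in both runs is consistent with the same instruction
# faulting each time (it does not prove it).
offsets = {hex(low_bits(a, 16)) for a in crash_addresses}
print(offsets)

# A 4 KB page-aligned target is consistent with a write running past the
# end of a mapped region, but that is only a guess.
print(low_bits(write_target, 12) == 0)
```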
>> I can send you the data with a simple code that loads it if you need it.
Daniel, we still don't see the problem on our side with the latest version. Could you please send us the data and the code so we can reproduce the problem on our side?
Thanks, Gennady
Daniel, regarding the exception with TBB threading: I checked your example with MKL 11.3 Update 2, linked against the universal vc_mt tbb.dll and against the vc12 and vc14 builds.
I used the test you provided (slightly modified by adding a call to mkl_get_version(&Version);) and compiled and launched it from the command line, because MSVC 2015 is not available on my system.
All cases work fine. Below is the output when vc14\tbb.dll is used:
..\_Forums\u611238_pardiso_tbb>_5tbb.exe
file mkl-860663123-00z.bin
matrix dim 258687
matrix nnz/2 2821302
64 bits
Major version: 11
Minor version: 3
Update version: 2
Product status: Product
Build: 20160120
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================
num threads 2
=== PARDISO: solving a symmetric positive definite system ===
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
Parallel METIS algorithm at reorder step is turned ON
Scaling is turned ON
Summary: ( reordering phase )
================
Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 0.026380 s
Time spent in reordering of the initial matrix (reorder) : 0.522636 s
Time spent in symbolic factorization (symbfct) : 0.264818 s
Time spent in data preparations for factorization (parlist) : 0.006496 s
Time spent in allocation of internal data structures (malloc) : 0.031902 s
Time spent in additional calculations : 0.198542 s
Total time spent : 1.050775 s
Statistics:
===========
Parallel Direct Factorization is running on 1 OpenMP
< Linear system Ax = b >
number of equations: 258687
number of non-zeros in A: 2821302
number of non-zeros in A (%): 0.004216
number of right-hand sides: 1
< Factors L and U >
number of columns for each panel: 128
number of independent subgraphs: 0
number of supernodes: 50167
size of largest supernode: 1041
number of non-zeros in L: 32337906
number of non-zeros in U: 1
number of non-zeros in L+U: 32337907
symolic factorization time is 1379 ms
Hi Gennady,
I tested it on another computer, and it crashes every time when using TBB.
Windows 10, Visual Studio 2012/2015 Update 1, i7 2600 and i7 4770, MKL 11.3.2.1.
I'll stay with OMP for now...
tnx