Popov__Maxim
Beginner
363 Views

MKL PARDISO performance problem when run on a heavily used memory heap

MKL 2021.3 (+ tbb 2021.3)
Windows 10, Visual Studio 2017

2 x Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz (20 cores total)
192 GB RAM


I have a relatively complex performance issue (or issues) with MKL PARDISO.
Please find attached a Visual Studio project that reproduces the problem(s).
It is a purely synthetic example, but we have similar problems (and more) in our commercial product.

The test runs the same task 8 times. After the 4th run we create some "garbage" in memory using malloc and free (allocating 7.6 GB and freeing part of it to fragment the heap).
As you can see from the protocols below, mkl_2018.0 works fine (no issues), but mkl_2021.3 (I also tested 2020.1 with the same result) has a couple of problems:
First: the solution step is about 3 times slower in mkl_2021.3 than in mkl_2018.0 (about 8.6 s vs. 2.8 s).
Second: the factorizations are ~10+ times slower after we create the "garbage" in memory (symbolic from about 0.015 s to 0.17 s, numerical from about 0.003 s to 0.13 s). Our commercial code also has the same kind of problem with solution time (it slows down ~5 times when run on a heavily used heap), but I can't reproduce that in the test.

 

Protocols

mkl 2018.0

*** Symbolic factorization = 0.521841
*** Numerical factorization = 0.003823
*** Solution = 2.82555


*** Symbolic factorization = 0.0160846
*** Numerical factorization = 0.003187
*** Solution = 2.84516


*** Symbolic factorization = 0.0159267
*** Numerical factorization = 0.0032141
*** Solution = 2.86703


*** Symbolic factorization = 0.015718
*** Numerical factorization = 0.0037508
*** Solution = 2.85035


Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.0148403
*** Numerical factorization = 0.002934
*** Solution = 2.81944


*** Symbolic factorization = 0.0145776
*** Numerical factorization = 0.0030027
*** Solution = 2.82286


*** Symbolic factorization = 0.0142837
*** Numerical factorization = 0.0030718
*** Solution = 2.84451


*** Symbolic factorization = 0.0138617
*** Numerical factorization = 0.0030959
*** Solution = 2.82229


mkl 2021.3

*** Symbolic factorization = 0.158939
*** Numerical factorization = 0.0044622
*** Solution = 8.59468


*** Symbolic factorization = 0.0150729
*** Numerical factorization = 0.0027243
*** Solution = 8.78183


*** Symbolic factorization = 0.0148563
*** Numerical factorization = 0.0026545
*** Solution = 8.57554


*** Symbolic factorization = 0.0149359
*** Numerical factorization = 0.0027301
*** Solution = 8.85421


Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.166303
*** Numerical factorization = 0.131799
*** Solution = 8.84035


*** Symbolic factorization = 0.168182
*** Numerical factorization = 0.134787
*** Solution = 8.64635


*** Symbolic factorization = 0.189809
*** Numerical factorization = 0.131606
*** Solution = 8.61737


*** Symbolic factorization = 0.165271
*** Numerical factorization = 0.134852
*** Solution = 8.61592

8 Replies
Gennady_F_Intel
Moderator
337 Views

Do you see this regression with the OpenMP runtime version of MKL Pardiso as well?

Popov__Maxim
Beginner
333 Views

I haven't checked the OpenMP runtime version of MKL (and I really don't know how to do that).

We no longer use OpenMP in our product, so we are not interested in the OpenMP version of MKL.

Gennady_F_Intel
Moderator
312 Views

The reported behavior has not been reproduced on Linux OS (RH7) with the AVX2 and AVX-512 code paths.

Here are the logs I see with MKL versions 2018.1 and 2021.3, respectively. The only change I made was to add a call to mkl_get_version() to report MKL's version info:

 

MKL v.2018.0.1
Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors

*** Symbolic factorization = 0.0398407
*** Numerical factorization = 0.104073
*** Solution = 3.41783

*** Symbolic factorization = 0.019124
*** Numerical factorization = 0.00386271
*** Solution = 3.38974

*** Symbolic factorization = 0.0185753
*** Numerical factorization = 0.0031121
*** Solution = 3.37459

*** Symbolic factorization = 0.0183605
*** Numerical factorization = 0.0051151
*** Solution = 3.39523

Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.0195702
*** Numerical factorization = 0.00301139
*** Solution = 3.44974

*** Symbolic factorization = 0.018768
*** Numerical factorization = 0.00286272
*** Solution = 3.44151

*** Symbolic factorization = 0.0177862
*** Numerical factorization = 0.00294998
*** Solution = 3.44731

*** Symbolic factorization = 0.0173954
*** Numerical factorization = 0.00292216
*** Solution = 3.4427

/****************************************************/

MKL v.2021.0.3
Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors

*** Symbolic factorization = 0.0315676
*** Numerical factorization = 0.0498327
*** Solution = 3.36807

*** Symbolic factorization = 0.0182586
*** Numerical factorization = 0.00408213
*** Solution = 3.37433

*** Symbolic factorization = 0.0171186
*** Numerical factorization = 0.00311218
*** Solution = 3.40193

*** Symbolic factorization = 0.0181005
*** Numerical factorization = 0.00289025
*** Solution = 3.38738

Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.0198418
*** Numerical factorization = 0.00326361
*** Solution = 3.44586

*** Symbolic factorization = 0.0174867
*** Numerical factorization = 0.00287047
*** Solution = 3.30897

*** Symbolic factorization = 0.016585
*** Numerical factorization = 0.00287961
*** Solution = 3.39236

*** Symbolic factorization = 0.0178846
*** Numerical factorization = 0.00268261
*** Solution = 3.43325

The AVX-512 results are very similar.

 

 

Popov__Maxim
Beginner
301 Views

Thank you for checking it on Linux!

 

Most probably it's a Windows-specific (or even Windows 10-specific) problem.

It looks like a slowdown in memory allocation on Windows when relatively large blocks are allocated.

MKL has its own memory pool (according to the documentation), but it didn't help in this case. I guess that PARDISO does not use MKL's memory pool for all allocations, which leads to the slowdown on Windows.

Kirill_V_Intel
Employee
277 Views

Hi Maxim!

As a quick check while we are trying to reproduce the issue: can you try setting the environment variable MKL_DISABLE_FAST_MM=1 before running the test and see if the behavior changes?

Thanks,
Kirill

Popov__Maxim
Beginner
262 Views

Hi Kirill!

 

Yes, MKL_DISABLE_FAST_MM=1 significantly degrades performance:

 

*** Symbolic factorization = 0.122878
*** Numerical factorization = 0.411002
*** Solution = 8.86472


*** Symbolic factorization = 0.0150951
*** Numerical factorization = 0.403717
*** Solution = 8.87216


*** Symbolic factorization = 0.0146924
*** Numerical factorization = 0.3977
*** Solution = 8.87921


*** Symbolic factorization = 0.0148401
*** Numerical factorization = 0.397932
*** Solution = 8.83029


Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.190399
*** Numerical factorization = 0.479833
*** Solution = 8.83713


*** Symbolic factorization = 0.169229
*** Numerical factorization = 0.5047
*** Solution = 8.83924


*** Symbolic factorization = 0.176403
*** Numerical factorization = 0.482541
*** Solution = 8.83819


*** Symbolic factorization = 0.179341
*** Numerical factorization = 0.532943
*** Solution = 8.86166

 

Regards,

Maxim

Kirill_V_Intel
Employee
252 Views

Thanks for the experiment!

If we had seen stable (but of course slower) times before and after the garbage allocation with the fast memory manager disabled, it would have been a great hint for us. Alas, as I see, the times also go up after the garbage allocations, so we can't be sure that the fast memory manager affects the original issue.

Thanks for trying.

Best,
Kirill

Popov__Maxim
Beginner
102 Views

Hello!

 

Are there any updates on the issue?

In the meantime we redefined MKL's pointers i_malloc, i_calloc, i_realloc and i_free to point to our own memory pool allocation functions. After that the problems seem to disappear. But we consider it a temporary solution.

 

Regards,

Maksim
