
MKL 2021.3 (+ TBB 2021.3)

Windows 10, Visual Studio 2017

2 x Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz (20 cores total)

192 GB RAM

I have a relatively complex performance issue (or several) with MKL PARDISO.

Please find attached a Visual Studio project that reproduces the problem(s).

It is a purely synthetic example, but we have a similar problem (and more) in our commercial product.

The test runs the same task 8 times. After the 4th run we create some "garbage" in memory using malloc and free (allocating 7.6 GB and freeing some of it to create heap fragmentation).
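For readers without the attachment, the "garbage" step can be sketched roughly as follows. This is only an illustration under assumptions: `make_garbage` and `release_garbage` are hypothetical names, and the block count, block size and "free every other block" pattern are guesses, since the post only states the 7.6 GB total and that part of it is freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from the attached project.
 * Allocates `count` blocks of `block_size` bytes, touches them so the
 * pages are actually committed, then frees every odd-indexed block.
 * The holes left between the surviving blocks fragment the heap.
 * Returns the block table (freed slots are NULL) and stores the number
 * of blocks still allocated in *live_out. */
void **make_garbage(size_t count, size_t block_size, size_t *live_out)
{
    assert(live_out != NULL);
    *live_out = 0;
    if (count == 0 || block_size == 0)
        return NULL;

    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return NULL;

    for (size_t i = 0; i < count; ++i) {
        blocks[i] = malloc(block_size);          /* NULL if memory runs out */
        if (blocks[i] != NULL)
            memset(blocks[i], 0xAB, block_size); /* touch the pages */
    }

    for (size_t i = 0; i < count; ++i) {
        if (i % 2 == 1) {
            free(blocks[i]);                     /* free(NULL) is a no-op */
            blocks[i] = NULL;
        } else if (blocks[i] != NULL) {
            ++*live_out;
        }
    }
    return blocks;
}

/* Releases whatever make_garbage() left allocated. */
void release_garbage(void **blocks, size_t count)
{
    if (blocks == NULL)
        return;
    for (size_t i = 0; i < count; ++i)
        free(blocks[i]);
    free(blocks);
}
```

In a test like the one described, such a helper would be called once between the 4th and 5th run, with the surviving blocks kept alive until all 8 runs have finished.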

As you can see from the protocols below, MKL 2018.0 works fine (no issues), but MKL 2021.3 (I also tested 2020.1, with the same result) has a couple of problems:

**First**: the solution phase is 3 times slower (comparing MKL 2018.0 and MKL 2021.3).

**Second**: factorizations are ~10+ times slower after we create the "garbage" in memory.

Our commercial code also has the same kind of problem with the solution time (it slows down ~5 times when run on a heavily used heap), but I can't reproduce that in the test.

**Protocols**

**mkl 2018.0**

*** Symbolic factorization = 0.521841

*** Numerical factorization = 0.003823

*** Solution = 2.82555

*** Symbolic factorization = 0.0160846

*** Numerical factorization = 0.003187

*** Solution = 2.84516

*** Symbolic factorization = 0.0159267

*** Numerical factorization = 0.0032141

*** Solution = 2.86703

*** Symbolic factorization = 0.015718

*** Numerical factorization = 0.0037508

*** Solution = 2.85035

Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.0148403

*** Numerical factorization = 0.002934

*** Solution = 2.81944

*** Symbolic factorization = 0.0145776

*** Numerical factorization = 0.0030027

*** Solution = 2.82286

*** Symbolic factorization = 0.0142837

*** Numerical factorization = 0.0030718

*** Solution = 2.84451

*** Symbolic factorization = 0.0138617

*** Numerical factorization = 0.0030959

*** Solution = 2.82229

**mkl 2021.3**

*** Symbolic factorization = 0.158939

*** Numerical factorization = 0.0044622

*** Solution = 8.59468

*** Symbolic factorization = 0.0150729

*** Numerical factorization = 0.0027243

*** Solution = 8.78183

*** Symbolic factorization = 0.0148563

*** Numerical factorization = 0.0026545

*** Solution = 8.57554

*** Symbolic factorization = 0.0149359

*** Numerical factorization = 0.0027301

*** Solution = 8.85421

Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.166303

*** Numerical factorization = 0.131799

*** Solution = 8.84035

*** Symbolic factorization = 0.168182

*** Numerical factorization = 0.134787

*** Solution = 8.64635

*** Symbolic factorization = 0.189809

*** Numerical factorization = 0.131606

*** Solution = 8.61737

*** Symbolic factorization = 0.165271

*** Numerical factorization = 0.134852

*** Solution = 8.61592


Do you see this regression with the OpenMP runtime version of MKL Pardiso as well?


I haven't checked the OpenMP runtime version of MKL (and I really don't know how to do that).

We don't use OpenMP in our product anymore, so we are not interested in the OpenMP version of MKL.


The reported behavior has not been reproduced on Linux (RH7) with the AVX2 and AVX-512 code paths.

Here are the logs I see with MKL versions 2018.1 and 2021.3, respectively. I only added a call to the mkl_get_version() function to report MKL's version info:

**MKL v.2018.0.1**

Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors

*** Symbolic factorization = 0.0398407

*** Numerical factorization = 0.104073

*** Solution = 3.41783

*** Symbolic factorization = 0.019124

*** Numerical factorization = 0.00386271

*** Solution = 3.38974

*** Symbolic factorization = 0.0185753

*** Numerical factorization = 0.0031121

*** Solution = 3.37459

*** Symbolic factorization = 0.0183605

*** Numerical factorization = 0.0051151

*** Solution = 3.39523

Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.0195702

*** Numerical factorization = 0.00301139

*** Solution = 3.44974

*** Symbolic factorization = 0.018768

*** Numerical factorization = 0.00286272

*** Solution = 3.44151

*** Symbolic factorization = 0.0177862

*** Numerical factorization = 0.00294998

*** Solution = 3.44731

*** Symbolic factorization = 0.0173954

*** Numerical factorization = 0.00292216

*** Solution = 3.4427

/****************************************************/

**MKL v.2021.0.3**

Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors

*** Symbolic factorization = 0.0315676

*** Numerical factorization = 0.0498327

*** Solution = 3.36807

*** Symbolic factorization = 0.0182586

*** Numerical factorization = 0.00408213

*** Solution = 3.37433

*** Symbolic factorization = 0.0171186

*** Numerical factorization = 0.00311218

*** Solution = 3.40193

*** Symbolic factorization = 0.0181005

*** Numerical factorization = 0.00289025

*** Solution = 3.38738

Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.0198418

*** Numerical factorization = 0.00326361

*** Solution = 3.44586

*** Symbolic factorization = 0.0174867

*** Numerical factorization = 0.00287047

*** Solution = 3.30897

*** Symbolic factorization = 0.016585

*** Numerical factorization = 0.00287961

*** Solution = 3.39236

*** Symbolic factorization = 0.0178846

*** Numerical factorization = 0.00268261

*** Solution = 3.43325

The AVX-512 results are very similar.


Thank you for checking it on Linux!

Most probably it's a Windows-specific (or even Windows 10-specific) problem.

It looks like a slowdown in memory allocation on Windows when relatively large blocks are allocated.

MKL has its own memory pool (according to the documentation), but it didn't help in this case. I guess that PARDISO does not use MKL's memory pool for all allocations, which leads to the slowdown on Windows.


Hi Maxim!

As a quick check while we are trying to reproduce the issue: can you try setting the environment variable MKL_DISABLE_FAST_MM=1 prior to running the test and see if the behavior changes?

Thanks,

Kirill
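If changing the launch environment is inconvenient, the variable can also be set from inside the test process. The sketch below is a minimal illustration, not something from the attached project: `disable_mkl_fast_mm_env` is a hypothetical helper name, and it assumes that MKL reads the variable no earlier than its first call, so the helper has to run before any MKL function is used.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200112L   /* for setenv() under strict C modes */
#endif

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sets MKL_DISABLE_FAST_MM=1 for the current process.
 * Returns 0 on success, non-zero on failure.
 * Assumption: this runs before the first MKL call. */
int disable_mkl_fast_mm_env(void)
{
#ifdef _WIN32
    return _putenv_s("MKL_DISABLE_FAST_MM", "1");
#else
    return setenv("MKL_DISABLE_FAST_MM", "1", 1);
#endif
}
```

From a Windows command prompt the equivalent is `set MKL_DISABLE_FAST_MM=1` before starting the executable. MKL also documents a `mkl_disable_fast_mm()` support function for the same purpose; whether the two behave identically for PARDISO is not something this thread confirms.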


Hi Kirill!

Yes, MKL_DISABLE_FAST_MM=1 significantly degrades performance:

*** Symbolic factorization = 0.122878

*** Numerical factorization = 0.411002

*** Solution = 8.86472

*** Symbolic factorization = 0.0150951

*** Numerical factorization = 0.403717

*** Solution = 8.87216

*** Symbolic factorization = 0.0146924

*** Numerical factorization = 0.3977

*** Solution = 8.87921

*** Symbolic factorization = 0.0148401

*** Numerical factorization = 0.397932

*** Solution = 8.83029

Making 7.6 Gb garbage in memory

*** Symbolic factorization = 0.190399

*** Numerical factorization = 0.479833

*** Solution = 8.83713

*** Symbolic factorization = 0.169229

*** Numerical factorization = 0.5047

*** Solution = 8.83924

*** Symbolic factorization = 0.176403

*** Numerical factorization = 0.482541

*** Solution = 8.83819

*** Symbolic factorization = 0.179341

*** Numerical factorization = 0.532943

*** Solution = 8.86166

Regards,

Maxim


Thanks for the experiment!

If we had seen stable (but of course slower) times before and after the garbage allocation with the fast memory manager disabled, it would have been a great hint for us. Alas, as I see it, the times go up after the garbage allocations here as well, so we can't be sure that the fast memory manager affects the original issue.

Thanks for trying.

Best,

Kirill


Hello!

Are there any updates on the issue?

In the meantime we redefined MKL's pointers i_malloc, i_calloc, i_realloc and i_free to point to our own memory pool allocation functions. After that the problems seem to disappear, but we consider this a temporary solution.

Regards,

Maksim
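The workaround described above might look roughly like this. It is a sketch under assumptions, not the poster's code: `HAVE_MKL` is a hypothetical build macro, the `pool_*` functions are placeholders that simply forward to the C runtime and count calls (a real memory pool would go in their place), and without MKL the four pointers are replaced by local stand-ins so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifdef HAVE_MKL                 /* hypothetical macro: define when building with MKL */
#include <i_malloc.h>           /* MKL's declarations of i_malloc, i_calloc, ... */
#else
/* Stand-ins with the same shapes, so the sketch builds without MKL. */
static void *(*i_malloc)(size_t)          = malloc;
static void *(*i_calloc)(size_t, size_t)  = calloc;
static void *(*i_realloc)(void *, size_t) = realloc;
static void  (*i_free)(void *)            = free;
#endif

/* Number of calls routed through the placeholder "pool". */
static size_t pool_calls = 0;

/* Placeholder pool functions: they only forward to the C runtime.
 * A real implementation would serve requests from its own pool. */
static void *pool_malloc(size_t n)            { ++pool_calls; return malloc(n); }
static void *pool_calloc(size_t k, size_t n)  { ++pool_calls; return calloc(k, n); }
static void *pool_realloc(void *p, size_t n)  { ++pool_calls; return realloc(p, n); }
static void  pool_free(void *p)               { ++pool_calls; free(p); }

/* Redirects MKL's allocation hooks to the pool functions.
 * Should be called before the first MKL function is used. */
void install_pool_allocator(void)
{
    i_malloc  = pool_malloc;
    i_calloc  = pool_calloc;
    i_realloc = pool_realloc;
    i_free    = pool_free;
}
```

As far as I recall, the MKL documentation on redefining memory functions also mentions `i_malloc_dll`, `i_calloc_dll`, `i_realloc_dll` and `i_free_dll` for the Windows DLL builds; treat that as something to verify against the documentation for your MKL version.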
