New Contributor I

Issue with MKL Data Fitting splines memory usage

Hello everyone,

We recently transitioned our code to utilise the MKL data fitting spline and have now noticed a rather large issue with memory usage.

In our largest examples we can allocate in excess of 1e6 splines using the MKL spline library and we are seeing an additional overhead of ~100GB of RAM from just the MKL splines.

Previously we used our own code for the natural cubic spline and have compared directly between implementations to understand and isolate the memory increase. It seems that the additional memory is allocated within dfdNewTask1D.

The splines themselves are nothing too complicated or large; they typically consist of ~27 knots and use the natural cubic spline method.

I have tried disabling the fast memory management, and freeing unused MKL memory, as discussed at avoiding-memory-leaks-in-intel-mkl.html. These options had no impact on the overall memory usage.

Is there anything that can be done to reduce the memory overhead of the MKL data fitting splines?

Is this memory usage expected?

Thanks,

Ewan

11 Replies
Moderator

Ewan, do you use anything from MKL beyond splines?


New Contributor I

Hi Gennady,

Yes, MKL is used elsewhere in the tool. For example, the LAPACK library.

Within the cubic spline code, I restrict the MKL include to just "mkl_df.h"

Ewan

Employee

Hi Ewan,

Glad to see new topics from you!

Could you please provide a few more details regarding the following points:

>> In our largest examples we can allocate in excess of 1e6 splines using the MKL spline library and we are seeing an additional overhead of ~100GB of RAM from just the MKL splines.

Do you create a new Data Fitting task for each spline? And do you use the dfDeleteTask() routine to free the memory when a task is no longer needed?

It would also be great to know the Data Fitting task parameters such as “nx”, “ny”, and “yhint”.

The answers will help me understand and reproduce the observed problem.

Best regards,
Pavel

New Contributor I

Hi Pavel,

I'm glad you are happy to see me, I wasn't sure how keen you would be to hear from me again  

Ok, let me answer your questions and give you some detail so that you can reproduce:

  • Yes, a new datafitting task is created for every spline.
  • Yes, we do correctly deallocate splines when they are no longer used. However, during our computation cycle all >1e6 splines will be required. Only at the end of computation do we deallocate and clean up the splines.
  • It may be worth noting that, based on some analysis with Heaptrack (a heap-only analyser), the memory consumption doesn't appear to be a memory leak.
  • The large bulk of splines are natural cubic splines with BCs of f''(x) == 0.0.
    • Spline type DF_PP_NATURAL with DF_PP_CUBIC order.
    • Spline BC:
      • DF_BC_2ND_LEFT_DER == 0.0
      • DF_BC_2ND_RIGHT_DER == 0.0 
    • No internal conditions, with DF_NO_HINT.
    • Using DF_NON_UNIFORM_PARTITION for x-values hint.
    • Using DF_NO_HINT for y-values hint.
    • Using DF_NO_HINT for spline coefficients.
    • Typically these splines have something like 27 knots (nx=ny=~27 points).

If you have any more questions then let me know.

Ewan

Employee

Hi Ewan,

thank you for the provided details.

Gennady and I are checking this on our side.

Preliminarily, I can say that about 16 GB is used for the Data Fitting tasks in the case of 1e6 splines.

Some temporary memory is also allocated during the interpolation routine call, but it is freed at the end of the call.

If all 1e6 splines use the same x-coordinates with different function values, I would suggest using a single Data Fitting task with a vector-valued function.

In any case, we are continuing to investigate.

Best regards,
Pavel 

New Contributor I

Ok, so you don't seem to be seeing nearly the same issue with memory usage.

I probably should've mentioned above, but I am using MKL 2019_U5.

What version of MKL are you guys testing with?

Ewan

Moderator

We tested with the latest version, 2020 Update 4.


Moderator

Ewan,

we tried to emulate the case you reported and measured the amount of memory consumed by MKL. Please take a look at the attached code example.

Here is what we see (MKL 2020 u4, OpenMP threaded version, LP64 mode, RHEL 7):

$ ./a.out 1000000 <--- the input number of splines

Number of Splines == 1000000

Peak memory allocated by Intel(R) MKL allocator :     17208004048 bytes.

This means that MKL itself allocates only about 17 GB of memory.

Could you give us a reproducer that shows the 100 GB of memory consumed by MKL, in case your usage model is different?

-Gennady



Moderator

The test we built following your instructions is attached.

New Contributor I

Thanks, I will have a look at the test case and try to repeat it on my side.

The only immediate difference I can see is that I use only the serial MKL library. Our software makes heavy use of TBB parallelism, so we try to stick with the serial MKL functions.

Ewan

New Contributor I

Hi,

Just an update, as I'm on holiday next week:

I repeated your test case in my build environment and can confirm that I see the same memory usage of ~16 GB per 1e6 splines.

I have also quickly thrown together a test case based on my MKL spline class (a wrapper for your library calls), and I see the same memory usage there too.

So I will run another memory analysis using our full software to get more information on the memory issue. If possible, I will produce either a reproducer or at least a better understanding, and work backwards from there.

TL;DR: I am on holiday next week; I will pick this up and figure out what is unique about my test case when I return.
