Intel® oneAPI Threading Building Blocks

std::sort(std::execution::par_unseq, ...) has a memory leak on Linux...

Victor_D_
New Contributor I

From what I found researching on the Web, the standard C++ parallel algorithms on Linux use an Intel TBB implementation. This implementation seems to have a significant memory leak, as the following code demonstrates (run the "top" command in another window to watch %MEM grow continuously as the iterations progress):

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <execution>
#include <random>
#include <ratio>
#include <vector>

using std::chrono::duration;
using std::chrono::duration_cast;
using std::chrono::high_resolution_clock;
using std::milli;
using std::random_device;
using std::sort;
using std::stable_sort;
using std::vector;

static const int iterationCount = 100;

static void print_results(const char* const tag, const vector<unsigned>& sorted, high_resolution_clock::time_point startTime, high_resolution_clock::time_point endTime)
{
    printf("%s: Lowest: %u Highest: %u Time: %fms\n", tag, sorted.front(), sorted.back(),
           duration_cast<duration<double, milli>>(endTime - startTime).count());
}

static int ParallelStdCppExample(vector<unsigned>& uints, bool stable = false)
{
    vector<unsigned> sorted(uints);
    for (int i = 0; i < iterationCount; ++i)
    {
        const auto startTime = high_resolution_clock::now();
        // sort (or stable_sort) with the par_unseq execution policy:
        if (!stable)
            sort(std::execution::par_unseq, sorted.begin(), sorted.end());
        else
            stable_sort(std::execution::par_unseq, sorted.begin(), sorted.end());
        const auto endTime = high_resolution_clock::now();
        // in our output, note that these are the parallel results:
        print_results("Parallel", sorted, startTime, endTime);
    }

    return 0;
}

int main()
{
    // Test configuration options
    bool UseStableStdSort = false;

    // Size of the random input array of unsigned integers
    const size_t testSize = 2'000'000'000;
    //random_device rd;
    std::mt19937_64 dist(1234);

    // generate some random unsigned integers:
    printf("\nTesting with %zu random unsigned integers...\n\n", testSize);
    vector<unsigned> uints(testSize);
    for (auto& d : uints) {
        //d = static_cast<unsigned>(rd());
        d = static_cast<unsigned>(dist());   // way faster on Linux
    }
    // Example of C++17 Standard C++ Parallel Sorting
    ParallelStdCppExample(uints, UseStableStdSort);

    return 0;
}

 

I tested this on Ubuntu 22.04 and on WSL (also Ubuntu); both show the memory leak. On a Windows laptop with 64 GBytes of memory, the application running in WSL (which gets 32 GBytes) is killed after a certain number of iterations. To build it:
g++ StdParallelSortMemoryLeakDemo.cpp -ltbb -std=c++20 -O3 -o ParallelAlgorithms
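
The growth that "top" shows can also be sampled from inside the test program. Below is a minimal sketch, assuming Linux with the /proc filesystem; current_rss_kb is a hypothetical helper and is not part of the original code.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the original test program).
// Returns this process's resident set size in kB, read from the
// "VmRSS:" line of /proc/self/status, or -1 if it cannot be read
// (for example on a system without /proc).
long current_rss_kb()
{
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {            // line starts with "VmRSS:"
            std::istringstream fields(line.substr(6)); // text after the label
            long kb = 0;
            if (fields >> kb)                          // the kernel reports the value in kB
                return kb;
            return -1;
        }
    }
    return -1;
}
```

Printing the value once per iteration, for example with printf("RSS: %ld kB\n", current_rss_kb()); right after the print_results call, gives one memory sample per sort and makes the trend visible in the program's own output.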

Could you possibly check this implementation out (if it's Intel's) and fix this memory leak?

Thank you,

-Victor

17 Replies
Victor_D_
New Contributor I

Sadly, the TBB implementation of std::stable_sort also leaks memory, and Linux eventually kills the process, leaving an OOM (out of memory) message in dmesg:

[ 988.296006] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=ParallelAlgorit,pid=4494,uid=1000
[ 988.296063] Out of memory: Killed process 4494 (ParallelAlgorit) total-vm:41930760kB, anon-rss:31566324kB, file-rss:92kB, shmem-rss:0kB, UID:1000 pgtables:79488kB oom_score_adj:0

On Windows, Microsoft's implementations of std::sort and std::stable_sort do not have this memory leak.

-Victor

Mark_L_Intel
Moderator

@Victor_D_,  Thank you for posting. I would need to reproduce this issue first. A couple of side notes: I assume you are using g++ exclusively; and have you tried oneDPL? 

Victor_D_
New Contributor I

Good suggestion!
It was simple to switch the implementation to 

stable_sort(oneapi::dpl::execution::par_unseq, sorted.begin(), sorted.end());

 

on Windows using the Visual Studio 2022 compiler, which also showed the memory leak.

[Image: Victor_D__0-1710955503017.png]

Switching to the Intel compiler (oneAPI DPC++/C++) does not fix the memory leak:

[Image: Victor_D__1-1710955740169.png]

sort(oneapi::dpl::execution::par_unseq, sorted.begin(), sorted.end());

also leaks memory. Each stair step in the memory graph is one execution of the sort() or stable_sort() function.

-Victor

 

Victor_D_
New Contributor I

Another parallel algorithm is also leaking memory:

merge(oneapi::dpl::execution::par, data_int_src_0.begin(), data_int_src_0.end(), data_int_src_1.begin(), data_int_src_1.end(), data_int_dst.begin());

and

merge(oneapi::dpl::execution::par_unseq, data_int_src_0.begin(), data_int_src_0.end(), data_int_src_1.begin(), data_int_src_1.end(), data_int_dst.begin());

 

This repo has been set up to test the performance of many Parallel STL algorithms:

https://github.com/DragonSpit/ParallelSTL

 

number_of_tests on line 1868 can be increased to 100 or larger to show the memory leak in Task Manager on Windows: memory usage increases with test iterations, whereas for algorithms that don't leak memory, memory usage stays flat.
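
The distinction drawn above, growing versus flat memory usage, can be turned into a simple automated check. This is only a sketch: memory_keeps_growing is a hypothetical helper, and the warmup count and tolerance_kb threshold are assumed values that would need tuning for a real benchmark.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: decides whether a series of per-iteration memory
// samples (in kB) keeps growing. The first `warmup` samples are ignored,
// because memory use commonly rises during the first few iterations and
// then levels off. `tolerance_kb` is an assumed noise threshold.
bool memory_keeps_growing(const std::vector<long>& samples_kb,
                          std::size_t warmup = 3,
                          long tolerance_kb = 1024)
{
    if (samples_kb.size() < warmup + 2)
        return false;                       // too few samples to judge

    // Require every post-warmup step to be non-decreasing (the "stair steps")
    // and the total post-warmup growth to exceed the noise threshold.
    for (std::size_t i = warmup + 1; i < samples_kb.size(); ++i)
        if (samples_kb[i] < samples_kb[i - 1])
            return false;

    return samples_kb.back() - samples_kb[warmup] > tolerance_kb;
}
```

The samples can come from any per-iteration measurement of process memory. Requiring strictly non-decreasing samples is a deliberately simple rule; noisy measurements may need a more forgiving test such as a trend fit.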

 

Mark_L_Intel
Moderator

Hello @Victor_D_ , I filed an internal ticket and will keep you posted on the investigation by our team. Thank you for posting at oneTBB Community Forum!  

Victor_D_
New Contributor I

Glad to help! Looking forward to the fix. Hopefully, the team will test all of the parallel algorithms for memory leaks, as there seems to be more than one.

Victor_D_
New Contributor I

Any updates by any chance? Any luck fixing this issue?

Mark_L_Intel
Moderator

@Victor_D_ ,

 

A response from oneDPL developer:

"

I've had a look at the oneDPL code, specifically the implementations of stable_sort and merge. I don't see any suspicious places where allocated memory is not deallocated. The implementation with the TBB backend uses tbb::tbb_allocator and calls the "allocate" and "deallocate" methods in RAII style.
Also, I tried to reproduce the memory leak with "https://github.com/DragonSpit/ParallelSTL/blob/master/src/main.cpp", and "stable_sort_benchmark(array_size, number_of_tests);" in particular, and I could not reproduce the issue.
Please have a look at the following output:

 

icpx   -fopenmp-simd -DTBB_USE_GLIBCXX_VERSION=110400  -D_PSTL_TEST_SUCCESSFUL_KEYWORD=1 -DONEDPL_USE_TBB_BACKEND=1 -I/<home_dir>/oneDPL/make/../include -I<home_dir>/oneDPL/make/../test -I<home_dir>/oneDPL/make/../stdlib    -O2      -L. -L<home_dir>/oneDPL/make/../make -ltbb    <home_dir>/oneDPL/make/../test/parallel_api/ranges/main.pass.cpp -omain.pass.exe
>> ./main.pass.exe
Serial std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 10675.140632ms
Serial std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 10336.344587ms
Serial std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 10373.401209ms
Serial std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 10268.889961ms
......
Parallel SIMD std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 169.772027ms
Parallel SIMD std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 170.682111ms
Parallel SIMD std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 169.344242ms
[06:00]sdp@a4bf0192d193:<home_dir>/make
 >>

 

Could you please provide more details here? Do you use any other compiler options? Could you please provide the exact command line (compiler version, all compiler flags, and other options)?

"
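
The "RAII style" allocate/deallocate pairing that the oneDPL developer describes can be sketched as follows. This is not the oneDPL code: raw_buffer is a hypothetical class written for illustration, and std::allocator stands in for tbb::tbb_allocator so that the sketch builds without TBB.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical RAII wrapper, NOT the actual oneDPL implementation.
// The constructor allocates uninitialized storage for `n` objects and the
// destructor returns it, so the buffer is released on every exit path,
// including exceptions. Alloc defaults to std::allocator (an assumption
// made so the sketch builds without TBB); tbb::tbb_allocator<T> provides
// the same allocate/deallocate member functions.
template <typename T, typename Alloc = std::allocator<T>>
class raw_buffer
{
public:
    explicit raw_buffer(std::size_t n, Alloc alloc = Alloc())
        : alloc_(alloc), size_(n), data_(alloc_.allocate(n)) {}

    ~raw_buffer() { alloc_.deallocate(data_, size_); }

    // Non-copyable: exactly one owner is responsible for deallocation.
    raw_buffer(const raw_buffer&) = delete;
    raw_buffer& operator=(const raw_buffer&) = delete;

    T* get() const { return data_; }
    std::size_t size() const { return size_; }

private:
    Alloc alloc_;        // declared first so it is initialized before data_
    std::size_t size_;
    T* data_;
};
```

A temporary merge buffer would then be declared as a local variable, for example raw_buffer<unsigned> tmp(n);, and released automatically when it goes out of scope.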

 

Victor_D_
New Contributor I

Could you try to reproduce the issue with the steps provided above: use the code in https://github.com/DragonSpit/ParallelSTL

and increase number_of_tests on line 1868 to 100 or larger, which shows the memory leak in Task Manager on Windows. This repo also works on Linux, with very simple code that shows Parallel STL support by multiple compilers, including Intel's.

-Victor

Victor_D_
New Contributor I

Here is another post on GitHub today about this issue:

Not sure if I should open another issue, but it seems that at least with GCC12, there are memory leaks when using PSTL with TBB backend (elalish/manifold#787). It works fine with clang using libc++. Not sure if gcc 13 works, not yet checked.

from: 

pca006132 <notifications@github.com> 

Mark_L_Intel
Moderator

Hello @Victor_D_ ,

 

  1. I sent you a private communication -- our oneDPL developer would like to meet with you regarding how to reproduce the memory leaks with the oneDPL examples you provided. Please also see item 3 below.   
  2. Regarding GCC memory leaks with TBB backend. From oneTBB lead developer: "We confirmed that GCC uses old TBB for PSTL implementation. old TBB is not supported and thus won't be fixed."
  3. from oneDPL developer:   "I downloaded the example https://github.com/DragonSpit/ParallelSTL
    Set size_t number_of_tests = 100;
    I used PSTL source
        1)from here C:\Program Files (x86)\Intel\oneAPI\dpl\latest\windows\include
        2) from here C:\Program Files (x86)\Intel\oneAPI\dpl\2022.2.0\windows\include
        3) From the main of GitHub repo (https://github.com/oneapi-src/oneDPL)
    Tried to run with different backends (TBB, OpenMP and serial) - could not reproduce the issue."

[Image: image-2024-05-07-14-00-27-659.png]

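
Regarding item 2 above, it can help to report which standard library and which TBB headers a test was compiled against. The following is a sketch under stated assumptions: toolchain_report is a hypothetical helper, and it only inspects version macros visible at compile time (it says nothing about the TBB library loaded at run time). As a rule of thumb, a TBB_VERSION_MAJOR of 2021 or later indicates oneTBB headers, and 2020 or earlier indicates the older TBB.

```cpp
#include <string>

// Pull in TBB's version macros if a TBB header is on the include path.
// oneTBB ships <tbb/version.h>; older TBB releases define the same
// macros in <tbb/tbb_stddef.h>.
#if defined(__has_include)
#  if __has_include(<tbb/version.h>)
#    include <tbb/version.h>
#  elif __has_include(<tbb/tbb_stddef.h>)
#    include <tbb/tbb_stddef.h>
#  endif
#endif

// Hypothetical helper: describes the standard library and TBB headers
// this translation unit was compiled against.
std::string toolchain_report()
{
    std::string out;
#if defined(_GLIBCXX_RELEASE)
    out += "libstdc++ release " + std::to_string(_GLIBCXX_RELEASE) + "\n";
#else
    out += "not libstdc++ (or release macro unavailable)\n";
#endif
#if defined(TBB_VERSION_MAJOR) && defined(TBB_VERSION_MINOR)
    out += "TBB headers " + std::to_string(TBB_VERSION_MAJOR) + "." +
           std::to_string(TBB_VERSION_MINOR) + "\n";
#else
    out += "TBB headers not found\n";
#endif
    return out;
}
```

Including the output of such a report in a bug report makes it easier to tell whether a leak is observed with GCC's PSTL on top of an older TBB or with oneDPL on top of oneTBB.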
Victor_D_
New Contributor I

One of the paths in item #3 is definitely different from mine, where mine is:

C:\Program Files (x86)\Intel\oneAPI\2024.0\include

Maybe the confusion is the assumption that I'm using PSTL sources, which I'm not. The parallel standard algorithm implementations, such as sort, are the ones from oneAPI itself. There are no additional repositories involved. The source code calls:

sort(oneapi::dpl::execution::par, data_copy.begin(), data_copy.end());

sort(oneapi::dpl::execution::par_unseq, data_copy.begin(), data_copy.end());

which are the ones leaking memory.

Victor_D_
New Contributor I

The repository at https://github.com/DragonSpit/ParallelSTL also includes a Visual Studio 2022 project/solution, which shows all of the settings and paths used under Windows.

Mark_L_Intel
Moderator

Hello @Victor_D_ ,

  The issue was reproduced.  The fix for oneDPL is in this pull request: [oneDPL][tbb] + memory leaks fix (sort/stable sort, tbb backend) #1589. Thank you for your post!

 

Regards,

Mark. 

Victor_D_
New Contributor I

Hello Mark,

Yay! So glad!!! Awesome to have a fix for it too!

-Victor

Victor_D_
New Contributor I

Hello Mark,

I decided to try the 2025 version of oneAPI to see whether the memory leak in sort() and stable_sort() has been fixed on Linux and on Windows under Visual Studio 2022.

Sadly, the memory leak for both of these algorithms is still there on Windows.

Sadly, on Linux both algorithms now crash.

-Victor

Victor_D_
New Contributor I

My mistake on the Windows Visual Studio version - the 2025 version of oneAPI fixed the memory leak.

However, the latest Linux version crashes.

-Victor
