Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.
2467 Discussions

std::sort(std::execution::par_unseq, ...) has a memory leak on Linux...

Victor_D_
Novice
1,139 Views

Linux standard C++ parallel algorithms use Intel TBB implementation, from what I found researching on the Web. This implementation seems to have a significant memory leak, as demonstrated by the following code ("top" command in another window can be used to watch %MEM grow continuously as iterations progress):

#include <iostream>
#include <algorithm>
#include <chrono>
#include <random>
#include <ratio>
#include <vector>
#include <execution>
 
using std::random_device;
using std::vector;
using std::chrono::duration;
using std::chrono::duration_cast;
using std::chrono::high_resolution_clock;
using std::milli;
using std::random_device;
using std::sort;
using std::vector;
 
static const int iterationCount = 100;
 
static void print_results(const char* const tag, const vector<unsigned>& sorted, high_resolution_clock::time_point startTime, high_resolution_clock::time_point endTime)
{
printf("%s: Lowest: %u Highest: %u Time: %fms\n", tag, sorted.front(), sorted.back(),
duration_cast<duration<double, milli>>(endTime - startTime).count());
}
 
static int ParallelStdCppExample(vector<unsigned>& uints, bool stable = false)
{
vector<unsigned> sorted(uints);
for (int i = 0; i < iterationCount; ++i)
{
const auto startTime = high_resolution_clock::now();
// same sort call as above, but with par_unseq:
if (!stable)
sort(std::execution::par_unseq, sorted.begin(), sorted.end());
else
stable_sort(std::execution::par_unseq, sorted.begin(), sorted.end());
const auto endTime = high_resolution_clock::now();
// in our output, note that these are the parallel results:
print_results("Parallel", sorted, startTime, endTime);
}
 
return 0;
}
 
int main()
{
// Test configuration options
bool UseStableStdSort = false;
 
// Provide the same input random array of doubles to all sorting algorithms
const size_t testSize = 2'000'000'000;
//random_device rd;
std::mt19937_64 dist(1234);
 
// generate some random unsigned integers:
printf("\nTesting with %zu random unsigned integers...\n\n", testSize);
vector<unsigned> uints(testSize);
for (auto& d : uints) {
//d = static_cast<unsigned>(rd());
d = static_cast<unsigned>(dist());   // way faster on Linux
}
// Example of C++17 Standard C++ Parallel Sorting
ParallelStdCppExample(uints, UseStableStdSort);
 
return 0;
}

 

This implementation has been tested on Ubuntu 22.04 and on WSL (Ubuntu also), both showing memory leaks. After a certain number of iterations on a Windows laptop with 64 GBytes of memory, the application gets killed running in WSL (which gets 32 GBytes). To build it:
g++ StdParallelSortMemoryLeakDemo.cpp -ltbb -std=c++20 -O3 -o ParallelAlgorithms

Could you possibly check this implementation out (if it's Intel's) and fix this memory leak?

Thank you,

-Victor

0 Kudos
7 Replies
Victor_D_
Novice
1,077 Views

Sadly, std::stable_sort TBB implementation also leaks memory and crashes with Linux killing the process with oom (out of memory) message in dmesg:

[ 988.296006] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=ParallelAlgorit,pid=4494,uid=1000
[ 988.296063] Out of memory: Killed process 4494 (ParallelAlgorit) total-vm:41930760kB, anon-rss:31566324kB, file-rss:92kB, shmem-rss:0kB, UID:1000 pgtables:79488kB oom_score_adj:0

On Windows Microsoft implementation of std::sort and std::stable_sort do not have a memory leak issue.

-Victor

0 Kudos
Mark_L_Intel
Moderator
1,056 Views

@Victor_D_,  Thank you for posting. I would need to reproduce this issue first. A couple of side notes: I assume you are using g++ exclusively; and have you tried oneDPL? 

Victor_D_
Novice
1,011 Views

Good suggestion!
It was simple to switch the implementation to 

stable_sort(oneapi::dpl::execution::par_unseq, sorted.begin(), sorted.end());

 

on Windows using VisualStudio 2022 compiler, which showed the memory leak problem also.

Victor_D__0-1710955503017.png

Switching to Intel Compiler (OneAPI DPC++/C++) does not fix the memory leak:

Victor_D__1-1710955740169.png

sort(oneapi::dpl::execution::par_unseq, sorted.begin(), sorted.end());

also leaks memory. Each stair step is one execution of sort() or stable_sort() function.

-Victor

 

0 Kudos
Victor_D_
Novice
941 Views

Another parallel algorithms is also leaking memory:

merge(oneapi::dpl::execution::par, data_int_src_0.begin(), data_int_src_0.end(), data_int_src_1.begin(), data_int_src_1.end(), data_int_dst.begin());

and

merge(oneapi::dpl::execution::par_unseq, data_int_src_0.begin(), data_int_src_0.end(), data_int_src_1.begin(), data_int_src_1.end(), data_int_dst.begin());

 

This repo has been setup to test performance of many Parallel STL algorithms:

https://github.com/DragonSpit/ParallelSTL

 

number_of_tests on line 1868 can be increased to 100 or larger to show memory leak in Task Manager in Windows - Memory usage increases with test iterations, while for algorithms that don't leak memory, memory usage stays flatly horizontal.

 

0 Kudos
Mark_L_Intel
Moderator
827 Views

Hello @Victor_D_ , I filed an internal ticket and will keep you posted on the investigation by our team. Thank you for posting at oneTBB Community Forum!  

0 Kudos
Victor_D_
Novice
763 Views

Glad to help! Looking forward to the fix. Hopefully, the team will test all of the Parallel algorithms for memory leaks, as there seem to be more then one.

0 Kudos
Victor_D_
Novice
89 Views

Any updates by any chance? Any luck fixing this issue?

0 Kudos
Reply