Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.
2467 Discussions

std::sort(std::execution::par_unseq, ...) has a memory leak on Linux...

Victor_D_
Novice
1,433 Views

Linux standard C++ parallel algorithms use Intel TBB implementation, from what I found researching on the Web. This implementation seems to have a significant memory leak, as demonstrated by the following code ("top" command in another window can be used to watch %MEM grow continuously as iterations progress):

#include <iostream>
#include <algorithm>
#include <chrono>
#include <random>
#include <ratio>
#include <vector>
#include <execution>
 
using std::random_device;
using std::vector;
using std::chrono::duration;
using std::chrono::duration_cast;
using std::chrono::high_resolution_clock;
using std::milli;
using std::random_device;
using std::sort;
using std::vector;
 
static const int iterationCount = 100;
 
static void print_results(const char* const tag, const vector<unsigned>& sorted, high_resolution_clock::time_point startTime, high_resolution_clock::time_point endTime)
{
printf("%s: Lowest: %u Highest: %u Time: %fms\n", tag, sorted.front(), sorted.back(),
duration_cast<duration<double, milli>>(endTime - startTime).count());
}
 
static int ParallelStdCppExample(vector<unsigned>& uints, bool stable = false)
{
vector<unsigned> sorted(uints);
for (int i = 0; i < iterationCount; ++i)
{
const auto startTime = high_resolution_clock::now();
// same sort call as above, but with par_unseq:
if (!stable)
sort(std::execution::par_unseq, sorted.begin(), sorted.end());
else
stable_sort(std::execution::par_unseq, sorted.begin(), sorted.end());
const auto endTime = high_resolution_clock::now();
// in our output, note that these are the parallel results:
print_results("Parallel", sorted, startTime, endTime);
}
 
return 0;
}
 
int main()
{
// Test configuration options
bool UseStableStdSort = false;
 
// Provide the same input random array of doubles to all sorting algorithms
const size_t testSize = 2'000'000'000;
//random_device rd;
std::mt19937_64 dist(1234);
 
// generate some random unsigned integers:
printf("\nTesting with %zu random unsigned integers...\n\n", testSize);
vector<unsigned> uints(testSize);
for (auto& d : uints) {
//d = static_cast<unsigned>(rd());
d = static_cast<unsigned>(dist());   // way faster on Linux
}
// Example of C++17 Standard C++ Parallel Sorting
ParallelStdCppExample(uints, UseStableStdSort);
 
return 0;
}

 

This implementation has been tested on Ubuntu 22.04 and on WSL (Ubuntu also), both showing memory leaks. After a certain number of iterations on a Windows laptop with 64 GBytes of memory, the application gets killed running in WSL (which gets 32 GBytes). To build it:
g++ StdParallelSortMemoryLeakDemo.cpp -ltbb -std=c++20 -O3 -o ParallelAlgorithms

Could you possibly check this implementation out (if it's Intel's) and fix this memory leak?

Thank you,

-Victor

0 Kudos
10 Replies
Victor_D_
Novice
1,371 Views

Sadly, std::stable_sort TBB implementation also leaks memory and crashes with Linux killing the process with oom (out of memory) message in dmesg:

[ 988.296006] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=ParallelAlgorit,pid=4494,uid=1000
[ 988.296063] Out of memory: Killed process 4494 (ParallelAlgorit) total-vm:41930760kB, anon-rss:31566324kB, file-rss:92kB, shmem-rss:0kB, UID:1000 pgtables:79488kB oom_score_adj:0

On Windows Microsoft implementation of std::sort and std::stable_sort do not have a memory leak issue.

-Victor

0 Kudos
Mark_L_Intel
Moderator
1,350 Views

@Victor_D_,  Thank you for posting. I would need to reproduce this issue first. A couple of side notes: I assume you are using g++ exclusively; and have you tried oneDPL? 

Victor_D_
Novice
1,305 Views

Good suggestion!
It was simple to switch the implementation to 

stable_sort(oneapi::dpl::execution::par_unseq, sorted.begin(), sorted.end());

 

on Windows using VisualStudio 2022 compiler, which showed the memory leak problem also.

Victor_D__0-1710955503017.png

Switching to Intel Compiler (OneAPI DPC++/C++) does not fix the memory leak:

Victor_D__1-1710955740169.png

sort(oneapi::dpl::execution::par_unseq, sorted.begin(), sorted.end());

also leaks memory. Each stair step is one execution of sort() or stable_sort() function.

-Victor

 

0 Kudos
Victor_D_
Novice
1,235 Views

Another parallel algorithms is also leaking memory:

merge(oneapi::dpl::execution::par, data_int_src_0.begin(), data_int_src_0.end(), data_int_src_1.begin(), data_int_src_1.end(), data_int_dst.begin());

and

merge(oneapi::dpl::execution::par_unseq, data_int_src_0.begin(), data_int_src_0.end(), data_int_src_1.begin(), data_int_src_1.end(), data_int_dst.begin());

 

This repo has been setup to test performance of many Parallel STL algorithms:

https://github.com/DragonSpit/ParallelSTL

 

number_of_tests on line 1868 can be increased to 100 or larger to show memory leak in Task Manager in Windows - Memory usage increases with test iterations, while for algorithms that don't leak memory, memory usage stays flatly horizontal.

 

0 Kudos
Mark_L_Intel
Moderator
1,121 Views

Hello @Victor_D_ , I filed an internal ticket and will keep you posted on the investigation by our team. Thank you for posting at oneTBB Community Forum!  

0 Kudos
Victor_D_
Novice
1,057 Views

Glad to help! Looking forward to the fix. Hopefully, the team will test all of the Parallel algorithms for memory leaks, as there seem to be more then one.

0 Kudos
Victor_D_
Novice
383 Views

Any updates by any chance? Any luck fixing this issue?

0 Kudos
Mark_L_Intel
Moderator
240 Views

@Victor_D_ ,

 

A response from oneDPL developer:

"

I've had a look at the oneDPL code, the implementation stable_sort and merge... I don't see any suspicious places where the allocated memory is not deallocated. The implementation with TBB backend uses tbb::tbb_allocator, and calls "allocate" and "deallocate" methods in RAII style.
Also, I tried to reproduce the memory leaks issue with "https://github.com/DragonSpit/ParallelSTL/blob/master/src/main.cpp"  and "stable_sort_benchmark(        array_size, number_of_tests);" in particular.. And I could not reproduce the mentioned issue.
Please have a look at the following output:

 

icpx   -fopenmp-simd -DTBB_USE_GLIBCXX_VERSION=110400  -D_PSTL_TEST_SUCCESSFUL_KEYWORD=1 -DONEDPL_USE_TBB_BACKEND=1 -I/<home_dir>/oneDPL/make/../include -I<home_dir>/oneDPL/make/../test -I<home_dir>/oneDPL/make/../stdlib    -O2      -L. -L<home_dir>/oneDPL/make/../make -ltbb    <home_dir>/oneDPL/make/../test/parallel_api/ranges/main.pass.cpp -omain.pass.exe
>> ./main.pass.exe
Serial std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 10675.140632ms
Serial std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 10336.344587ms
Serial std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 10373.401209ms
Serial std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 10268.889961ms
......
Parallel SIMD std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 169.772027ms
Parallel SIMD std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 170.682111ms
Parallel SIMD std::stable_sort: size = 100000000  Lowest: 290899232 Highest: 435078078 Time: 169.344242ms
[06:00]sdp@a4bf0192d193:<home_dir>/make
 >>

 

Could you please provide more details here?  do you use some another compiler options? Could you please provide exact command line? (compile version, the all compiler flags and other keys)

"

 

0 Kudos
Victor_D_
Novice
222 Views

Could you try to reproduce the issue with the steps provided above, such as using the code in https://github.com/DragonSpit/ParallelSTL

and changing the number_of_tests on line 1868 can be increased to 100 or larger to show memory leak in Task Manager in Windows. This repo also works on Linux, with very simple code that shows Parallel STL support by multiple compilers, including Intel's.

-Victor

0 Kudos
Victor_D_
Novice
51 Views

Here is another posting in GitHub today on this issue:

Not sure if I should open another issue, but it seems that at least with GCC12, there are memory leaks when using PSTL with TBB backend (elalish/manifold#787). It works fine with clang using libc++. Not sure if gcc 13 works, not yet checked.

from: 

pca006132 <notifications@github.com> 

0 Kudos
Reply