From what I found researching on the web, the standard C++ parallel algorithms on Linux use Intel's TBB implementation. This implementation seems to have a significant memory leak, as demonstrated by the following code (the "top" command in another window can be used to watch %MEM grow continuously as iterations progress):
This implementation has been tested on Ubuntu 22.04 and on WSL (also Ubuntu), and both show the memory leak. On a Windows laptop with 64 GB of memory, the application gets killed after a certain number of iterations when running in WSL (which gets 32 GB). To build it:
g++ StdParallelSortMemoryLeakDemo.cpp -ltbb -std=c++20 -O3 -o ParallelAlgorithms
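A minimal sketch of a reproducer of the kind described above, not the original listing: it sorts a fresh copy of random data repeatedly and records the process's resident set size after each pass, so growth can be observed without watching "top". The names resident_set_kb and sample_rss_while_sorting and the USE_PARALLEL_POLICY switch are hypothetical, and reading /proc/self/status is a Linux-only assumption. Without the switch the sketch uses the serial std::sort, so it builds without TBB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <random>
#include <sstream>
#include <string>
#include <vector>
#ifdef USE_PARALLEL_POLICY  // hypothetical switch, not from the original post
#include <execution>
#endif

// Resident set size of this process in kB, read from /proc/self/status.
// Linux-only assumption; returns 0 if the field cannot be read.
long resident_set_kb() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {
            std::istringstream fields(line.substr(6));
            long kb = 0;
            fields >> kb;
            return kb;
        }
    }
    return 0;
}

// Sorts a fresh copy of the same random data `iterations` times and
// records the resident set size after each pass.
std::vector<long> sample_rss_while_sorting(std::size_t element_count, int iterations) {
    std::mt19937_64 rng(2);
    std::vector<unsigned long long> source(element_count);
    for (auto& value : source) value = rng();

    std::vector<long> samples;
    for (int i = 0; i < iterations; ++i) {
        std::vector<unsigned long long> sorted(source);
#ifdef USE_PARALLEL_POLICY
        std::sort(std::execution::par_unseq, sorted.begin(), sorted.end());
#else
        std::sort(sorted.begin(), sorted.end());
#endif
        assert(std::is_sorted(sorted.begin(), sorted.end()));
        samples.push_back(resident_set_kb());
    }
    return samples;
}
```

Built with -DUSE_PARALLEL_POLICY added to the command above, a call such as sample_rss_while_sorting(100000000, 100) followed by printing the samples would be one way to see whether memory keeps climbing from pass to pass.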
Could you possibly check this implementation out (if it's Intel's) and fix this memory leak?
Thank you,
-Victor
Sadly, the TBB implementation of std::stable_sort also leaks memory and crashes, with Linux killing the process and logging an OOM (out of memory) message in dmesg:
[ 988.296006] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=ParallelAlgorit,pid=4494,uid=1000
[ 988.296063] Out of memory: Killed process 4494 (ParallelAlgorit) total-vm:41930760kB, anon-rss:31566324kB, file-rss:92kB, shmem-rss:0kB, UID:1000 pgtables:79488kB oom_score_adj:0
On Windows, the Microsoft implementations of std::sort and std::stable_sort do not have a memory leak issue.
-Victor
@Victor_D_, Thank you for posting. I would need to reproduce this issue first. A couple of side notes: I assume you are using g++ exclusively; and have you tried oneDPL?
Good suggestion!
It was simple to switch the implementation to
stable_sort(oneapi::dpl::execution::par_unseq, sorted.begin(), sorted.end());
on Windows using the Visual Studio 2022 compiler, which also showed the memory leak.
Switching to the Intel compiler (oneAPI DPC++/C++) does not fix the memory leak:
sort(oneapi::dpl::execution::par_unseq, sorted.begin(), sorted.end());
also leaks memory. Each stair step in memory usage is one execution of the sort() or stable_sort() function.
-Victor
Another parallel algorithm is also leaking memory:
merge(oneapi::dpl::execution::par, data_int_src_0.begin(), data_int_src_0.end(), data_int_src_1.begin(), data_int_src_1.end(), data_int_dst.begin());
and
merge(oneapi::dpl::execution::par_unseq, data_int_src_0.begin(), data_int_src_0.end(), data_int_src_1.begin(), data_int_src_1.end(), data_int_dst.begin());
This repo has been set up to test the performance of many Parallel STL algorithms:
https://github.com/DragonSpit/ParallelSTL
Increasing number_of_tests on line 1868 to 100 or more shows the memory leak in Task Manager on Windows: memory usage increases with each test iteration, whereas for algorithms that don't leak, memory usage stays flat.
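The "rising versus flat" observation above can also be checked programmatically instead of by eye. The following is only a rough heuristic sketch: the function name looks_like_leak, the warm-up count, and the tolerance are all assumptions chosen for illustration, and the input is assumed to be one memory reading (in kB) per test iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Heuristic: after skipping `warmup` initial samples, report a suspected leak
// when total growth exceeds `tolerance_kb` and more than half of the
// remaining steps are increases. Flat or noisy-but-stable series return false.
bool looks_like_leak(const std::vector<long>& rss_kb, std::size_t warmup, long tolerance_kb) {
    if (rss_kb.size() <= warmup + 1) return false;  // not enough data to judge

    std::size_t rising = 0;
    for (std::size_t i = warmup + 1; i < rss_kb.size(); ++i) {
        if (rss_kb[i] > rss_kb[i - 1]) ++rising;
    }
    const std::size_t steps = rss_kb.size() - warmup - 1;
    const long growth = rss_kb.back() - rss_kb[warmup];
    return growth > tolerance_kb && rising * 2 > steps;
}
```

The warm-up samples are skipped because the first iterations usually allocate buffers that are legitimately kept for the rest of the run, which would otherwise look like growth.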
Hello @Victor_D_, I filed an internal ticket and will keep you posted on the investigation by our team. Thank you for posting on the oneTBB Community Forum!
Glad to help! Looking forward to the fix. Hopefully, the team will test all of the parallel algorithms for memory leaks, as there seems to be more than one.
Any updates by any chance? Any luck fixing this issue?
A response from a oneDPL developer:
"
I've had a look at the oneDPL code, the implementation of stable_sort and merge... I don't see any suspicious places where the allocated memory is not deallocated. The implementation with the TBB backend uses tbb::tbb_allocator, and calls the "allocate" and "deallocate" methods in RAII style.
Also, I tried to reproduce the memory leak issue with "https://github.com/DragonSpit/ParallelSTL/blob/master/src/main.cpp" and "stable_sort_benchmark( array_size, number_of_tests);" in particular, and I could not reproduce the mentioned issue.
Please have a look at the following output:
icpx -fopenmp-simd -DTBB_USE_GLIBCXX_VERSION=110400 -D_PSTL_TEST_SUCCESSFUL_KEYWORD=1 -DONEDPL_USE_TBB_BACKEND=1 -I/<home_dir>/oneDPL/make/../include -I<home_dir>/oneDPL/make/../test -I<home_dir>/oneDPL/make/../stdlib -O2 -L. -L<home_dir>/oneDPL/make/../make -ltbb <home_dir>/oneDPL/make/../test/parallel_api/ranges/main.pass.cpp -omain.pass.exe
>> ./main.pass.exe
Serial std::stable_sort: size = 100000000 Lowest: 290899232 Highest: 435078078 Time: 10675.140632ms
Serial std::stable_sort: size = 100000000 Lowest: 290899232 Highest: 435078078 Time: 10336.344587ms
Serial std::stable_sort: size = 100000000 Lowest: 290899232 Highest: 435078078 Time: 10373.401209ms
Serial std::stable_sort: size = 100000000 Lowest: 290899232 Highest: 435078078 Time: 10268.889961ms
......
Parallel SIMD std::stable_sort: size = 100000000 Lowest: 290899232 Highest: 435078078 Time: 169.772027ms
Parallel SIMD std::stable_sort: size = 100000000 Lowest: 290899232 Highest: 435078078 Time: 170.682111ms
Parallel SIMD std::stable_sort: size = 100000000 Lowest: 290899232 Highest: 435078078 Time: 169.344242ms
[06:00]sdp@a4bf0192d193:<home_dir>/make
>>
Could you please provide more details here? Do you use any other compiler options? Could you please provide the exact command line (compiler version, all compiler flags, and other keys)?
"
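The "allocate and deallocate in RAII style" pattern mentioned in the response can be illustrated with a small sketch. This is not the oneDPL code: std::allocator stands in for tbb::tbb_allocator so the example builds without TBB, and scoped_buffer, counting_allocator, and live_block_count are hypothetical names introduced only to show that every allocation is paired with a deallocation when the owner goes out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical global counter of blocks currently allocated.
long& live_block_count() {
    static long count = 0;
    return count;
}

// Hypothetical allocator that forwards to std::allocator and tracks live blocks.
template <typename T>
struct counting_allocator {
    using value_type = T;
    counting_allocator() = default;
    template <typename U>
    counting_allocator(const counting_allocator<U>&) {}

    T* allocate(std::size_t n) {
        ++live_block_count();
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) {
        --live_block_count();
        std::allocator<T>().deallocate(p, n);
    }
};

// RAII owner of `count` uninitialized elements: allocates in the constructor,
// deallocates in the destructor. Assumes the allocator's pointer type is T*.
template <typename T, typename Allocator = std::allocator<T>>
class scoped_buffer {
public:
    explicit scoped_buffer(std::size_t count, Allocator alloc = Allocator())
        : alloc_(alloc),
          count_(count),
          data_(std::allocator_traits<Allocator>::allocate(alloc_, count)) {}

    ~scoped_buffer() {
        std::allocator_traits<Allocator>::deallocate(alloc_, data_, count_);
    }

    scoped_buffer(const scoped_buffer&) = delete;
    scoped_buffer& operator=(const scoped_buffer&) = delete;

    T* get() const { return data_; }
    std::size_t size() const { return count_; }

private:
    Allocator alloc_;
    std::size_t count_;
    T* data_;
};

// Returns the number of blocks still allocated after a buffer leaves scope.
long live_blocks_after_scope() {
    {
        scoped_buffer<int, counting_allocator<int>> buffer(1024);
        buffer.get()[0] = 42;
        assert(live_block_count() == 1);
    }
    return live_block_count();
}
```

A balanced allocate/deallocate pair like this rules out one class of leak, but it does not by itself show that memory is returned to the operating system, which is what tools such as "top" or Task Manager report.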
Could you try to reproduce the issue with the steps provided above, using the code in https://github.com/DragonSpit/ParallelSTL
and increasing number_of_tests on line 1868 to 100 or larger, which shows the memory leak in Task Manager on Windows? This repo also works on Linux, with very simple code that shows Parallel STL support by multiple compilers, including Intel's.
-Victor
Here is another post on GitHub today about this issue:
Not sure if I should open another issue, but it seems that at least with GCC12, there are memory leaks when using PSTL with TBB backend (elalish/manifold#787). It works fine with clang using libc++. Not sure if gcc 13 works, not yet checked.
from:
pca006132 <notifications@github.com>
Hello @Victor_D_ ,
- I sent you a private communication -- our oneDPL developer would like to meet with you regarding how to reproduce the memory leaks with the oneDPL examples you provided. Please also see item 3 below.
- Regarding GCC memory leaks with TBB backend. From oneTBB lead developer: "We confirmed that GCC uses old TBB for PSTL implementation. old TBB is not supported and thus won't be fixed."
- from oneDPL developer: "I downloaded the example https://github.com/DragonSpit/ParallelSTL
Set size_t number_of_tests = 100;
I used PSTL source
1) from here C:\Program Files (x86)\Intel\oneAPI\dpl\latest\windows\include
2) from here C:\Program Files (x86)\Intel\oneAPI\dpl\2022.2.0\windows\include
3) From the main of GitHub repo (https://github.com/oneapi-src/oneDPL)
Tried to run with different backends (TBB, openMP and serial) - could not reproduce the issue. "
One of the paths in item #3 is definitely different from mine, where mine is:
C:\Program Files (x86)\Intel\oneAPI\2024.0\include
Maybe the confusion is the assumption that I'm using PSTL sources, which I'm not. The parallel implementations of standard algorithms such as std::sort are the ones from oneAPI itself. There are no additional repositories involved. The source code calls:
sort(oneapi::dpl::execution::par, data_copy.begin(), data_copy.end());
sort(oneapi::dpl::execution::par_unseq, data_copy.begin(), data_copy.end());
which are the ones leaking memory.
The repository at https://github.com/DragonSpit/ParallelSTL also includes a Visual Studio 2022 project/solution, which shows all of the settings and paths used under Windows.
Hello @Victor_D_ ,
The issue was reproduced. The fix for oneDPL is in this pull request: [oneDPL][tbb] + memory leaks fix (sort/stable sort, tbb backend) #1589. Thank you for your post!
Regards,
Mark.
Hello Mark,
Yay! So glad!!! Awesome to have a fix for it too!
-Victor
Hello Mark,
I decided to try the 2025 version of oneAPI to see whether the memory leak in sort() and stable_sort() has been fixed on Linux and on Windows under Visual Studio 2022.
Sadly, the memory leak for both of these algorithms is still there on Windows.
Sadly, on Linux both algorithms now crash.
-Victor
My mistake on the Windows Visual Studio version - the 2025 version of oneAPI fixed the memory leak.
However, the latest Linux version crashes.
-Victor