Hi,
I am using a pipeline, and one of the filters does a bunch of STL map finds, erases and inserts. It doesn't explicitly allocate any new heap memory.
Here is the weird thing: for some reason, this filter leads to huge memory usage, enough so that my code cannot run to completion. When I comment out the function that does all the map stuff, everything is fine.
So the question is, is there any kind of funkiness (different memory model or allocator) that happens under the hood with TBB that I should be thinking about?
Thanks,
John
15 Replies
If you do not explicitly specify that the map should use one of the allocators provided by TBB, and if you do not globally substitute the standard memory allocation routines (malloc/free or new/delete) with their TBB analogues, then std::map objects use the regular memory allocation mechanisms.
Quoting - jind
I am using a pipeline, and one of the filters does a bunch of STL map finds, erases and inserts. It doesn't explicitly allocate any new heap memory.
Here is the weird thing: for some reason, this filter leads to huge memory usage, enough so that my code cannot run to completion. When I comment out the function that does all the map stuff, everything is fine.
So the question is, is there any kind of funkiness (different memory model or allocator) that happens under the hood with TBB that I should be thinking about?
What do you mean by huge memory consumption?
If memory consumption of a single-threaded run is X and you have N threads, then memory consumption up to X*N for a multi-threaded run is OK and expected. If memory consumption is far above X*N, then something is odd.
Your run-time/OS memory allocator may do some per-thread caching of memory, which could explain the increased consumption.
I've narrowed it down a bit:
My filter class maintains a couple of std::maps, and each call to operator() may result in an insert, erase, or change of some elements in these maps.
These maps are relatively small (say, 5 MB altogether), yet increasing the number of threads in tbb::task_scheduler_init from 1 to 2 increases memory usage by 50 MB.
Can you please explain in a little more detail (or point me somewhere that can) what per-thread memory allocation I should expect? Surely each thread doesn't maintain a full local copy of the filter object?
What is more worrisome is that the size of the maps stays roughly the same (i.e., the inserts are roughly balanced out by the erases) as tokens are processed, yet memory usage grows with the number of tokens processed.
Quoting - Dmitriy Vyukov
What do you mean by huge memory consumption?
If memory consumption of a single-threaded run is X and you have N threads, then memory consumption up to X*N for a multi-threaded run is OK and expected. If memory consumption is far above X*N, then something is odd.
Your run-time/OS memory allocator may do some per-thread caching of memory, which could explain the increased consumption.
Okay, now it is *really* narrowed down. I can reproduce what I'm seeing using the following minimum processing for each item:
// assuming m is a std::map<ItemId, Item>
std::map<ItemId, Item>::iterator p = m.find(item.id());
if (p != m.end()) {
    m.erase(p);
}
m.insert(std::make_pair(item.id(), item));
If I comment out everything but the last line (i.e., do not erase before inserting), memory usage goes way, way down.
Any ideas? I'm completely stumped.
Quoting - jind
My filter class maintains a couple of std::maps, and each call to operator() may result in an insert, erase, or change of some elements in these maps.
These maps are relatively small (say, 5 MB altogether), yet increasing the number of threads in tbb::task_scheduler_init from 1 to 2 increases memory usage by 50 MB.
Tell us more about the filter class. Could there be multiple copies of the filter (one per pool thread) playing havoc with the non-thread-safe access of the std::maps?
I've also confirmed that the same behavior results when the value in the map is not an Item, but (say) a double or other simple data type. The only difference is that the amount of memory in the "erase-first" case goes down, but it is still significantly higher than the "no-erase-first" case.
Quoting - jind
Okay, now it is *really* narrowed down. I can reproduce what I'm seeing using the following minimum processing for each item:
// assuming m is a std::map<ItemId, Item>
std::map<ItemId, Item>::iterator p = m.find(item.id());
if (p != m.end()) {
    m.erase(p);
}
m.insert(std::make_pair(item.id(), item));
If I comment out everything but the last line (i.e., do not erase before inserting), memory usage goes way, way down.
Any ideas? I'm completely stumped.
The filter class is declared as a serial_in_order filter, and has a private std::map. My (admittedly limited) understanding of filters is that a serial_in_order filter need not worry about thread safety in this case.
However, as a test, I added a mutex to the class and locked it before doing the map erase and update, with the same results...
Quoting - Robert Reed (Intel)
Tell us more about the filter class. Could there be multiple copies of the filter (per pool thread) playing havoc with the non-thread-safe access of the std:maps?
Another tidbit: if I replace the find/erase construct above with the following, memory usage goes up a *lot* more. It seems that the call to erase(), whether or not the item exists, is the source of the problem...
Quoting - jind
m.erase(m.find(item.id()));
m.insert(std::make_pair(item.id(), item));
I also just tried using concurrent_hash_map in place of std::map, and got similar results. To give an idea, without the erase, the program grew to about 100MB, whereas with it, it was over 250MB. I am literally commenting out just one line to get that difference!
How about a small but self-contained program that reproduces the problem...
As I was working on cutting things down to the simplest reproducing case, I tried replacing the second (map-add) filter with a dummy serial_in_order filter that simply passes the token along. Here is a table showing memory usage for the input filter alone, and the effect of adding the dummy filter for 1, 2, and 8 threads:
Memory usage (MB) by number of threads:
FILTERS       |   1 |   2 |   8
input         |  98 | 105 | 105
input + dummy | 110 | 445 | 489
Not sure if this is relevant, but even though both filters are serial_in_order and the pipeline is running with a max of one token, CPU usage reflects the number of threads when I add the dummy, but is pegged at one when I don't.
Does this suggest anything I could be missing?
Quoting - jind
Not sure if this is relevant, but even though both filters are serial_in_order and the pipeline is running with a max of one token, CPU usage reflects the number of threads when I add the dummy, but is pegged at one when I don't.
Does this suggest anything I could be missing?
A single serial filter means no parallelism, so the pipeline just drains the input in a serial loop.
Your second setup also means no parallelism, but that case is currently not recognized as such. So the worker threads stay alive and actively seek work, and it seems the pipeline spawns new tasks regularly, preventing them from falling asleep.
Why would the second scenario (two serial_in_order filters, only one token in the pipeline at a time) not be recognized as serial? Is it something I am doing?
Quoting - Alexey Kukanov (Intel)
A single serial filter means no parallelism, so the pipeline just drains the input in a serial loop.
Your second setup also means no parallelism, but that case is currently not recognized as such. So the worker threads stay alive and actively seek work, and it seems the pipeline spawns new tasks regularly, preventing them from falling asleep.
Apologies for not having narrowed down the issue as tightly as possible before starting this thread. It appears that the memory usage is coming from a concurrent_queue that is getting backed up, and never releasing the memory back to the OS. This was masked by a number of factors, some of which stemmed from my own efforts to narrow down the problem (I was actually making it worse).
That said, I'm not sure I understand how the concurrent_queue deals with memory, but I started a new topic on that since it is different enough from this thread's topic...
#13 "Why would the second scenario (two serial_in_order filters, only one token in the pipeline at a time) not be recognized as serial? Is it something I am doing?"
No particular reason. Currently one trivial situation is trivially optimised. It seems easy enough to go a little bit further just to avoid this very question... although you should realise that such a pipeline would obviously feel very unappreciated.
#14 "Apologies for not having narrowed down the issue as tightly as possible before starting this thread. It appears that the memory usage is coming from a concurrent_queue that is getting backed up, and never releasing the memory back to the OS."
Then I guess it's the high-water mark behaviour of the scalable memory allocator rather than any problem with the queue. Don't worry, be happy: the memory will most likely be refurbished. Please consult previous discussions to find further information (maybe a FAQ entry could be dedicated to this?).
