[cpp]typedef std::vector<int> storage_type;   // template arguments were eaten by the forum software; int is a placeholder

// solution one
void foo1() {
    storage_type storage;
    // n is computed at runtime
    storage.resize(n);
}

// solution two
void foo2() {
    static tbb::enumerable_thread_specific<storage_type> tls_storage;
    storage_type& storage = tls_storage.local();
    // n is computed at runtime
    storage.resize(n);
}[/cpp]
foo1 has to allocate on the free store each time it's called.
foo2 doesn't have to reallocate as long as the high-water mark is not exceeded. The maximum storage size is relatively small, so memory waste is not an issue.
Will foo2 work properly even after task_scheduler_init is reinitialized, provided that the storage is not used to transfer data between successive calls?
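For reference, here is a minimal compilable sketch of the foo2 approach; the int element type, the n parameter and the parallel_for driver are illustrative assumptions, not part of the original code.
[cpp]#include <cstddef>
#include <vector>
#include <tbb/blocked_range.h>
#include <tbb/enumerable_thread_specific.h>
#include <tbb/parallel_for.h>

typedef std::vector<int> storage_type;   // illustrative element type

// Per-thread scratch buffer: reallocates only when the high-water mark grows.
void foo2(std::size_t n) {
    static tbb::enumerable_thread_specific<storage_type> tls_storage;
    storage_type& storage = tls_storage.local();
    storage.resize(n);            // no heap traffic once capacity >= n
    // ... use storage[0..n-1] as scratch space ...
}

int main() {
    tbb::parallel_for(tbb::blocked_range<int>(0, 1000000),
                      [](const tbb::blocked_range<int>& r) {
                          for (int i = r.begin(); i != r.end(); ++i)
                              foo2(static_cast<std::size_t>(i % 64 + 1));  // n computed at runtime
                      });
    return 0;
}[/cpp]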
As long as foo2() is not recursive: directly, indirectly in a straightforward manner, or... indirectly by way of the scheduler (in case the thread is sent out stealing), or in some way I haven't even thought of yet. :-)
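To illustrate why recursion matters (a hypothetical sketch, not code from this thread): an inner call on the same thread resizes and overwrites the very same per-thread vector the outer call is still using.
[cpp]#include <cstddef>
#include <cstdio>
#include <vector>
#include <tbb/enumerable_thread_specific.h>

typedef std::vector<int> storage_type;   // illustrative element type

// Hypothetical: what goes wrong if foo2() is re-entered on the same thread.
void foo2(std::size_t n, bool recurse) {
    static tbb::enumerable_thread_specific<storage_type> tls_storage;
    storage_type& storage = tls_storage.local();
    storage.resize(n);
    storage[0] = static_cast<int>(n);     // "result" of this invocation

    if (recurse)
        foo2(4 * n, false);               // the inner call resizes and overwrites the
                                          // SAME per-thread vector, clobbering the outer
                                          // call's data and invalidating pointers into it

    std::printf("expected %d, found %d\n",
                static_cast<int>(n), storage[0]);
}

int main() {
    foo2(8, true);    // the outer call reports "expected 8, found 32"
    return 0;
}[/cpp]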
(Added) Be sure to tell us how much you saved, if anything. The scalable allocator is supposed to be quite efficient, but without checking the code I'm not certain this wouldn't trigger any problematic behaviour (what happens if an allocator block is inserted into a bin for one allocation and that allocation is freed immediately afterwards, emptying that block?).
The function is not recursive and doesn't block or pause in any other way. Can the scheduler stop it halfway through and let the worker thread run another task, which in turn calls foo2?
I am not sure what you mean by blocks and bins. Is that related to the internals of TBB's allocator?
OK, that's significant, I guess, especially if you mean 15% less time instead of 15% faster. A bit surprising.
"The function is not recursive and doesn't block or pause in any other way. Can the scheduler stop it half way through and let the worker thread to run another task which in turn calls foo2?"
No.
"I am not sure what you mean by blocks and bins. Is that related to the internals of TBB's allocator?"
Yes. But now I think I remember that an empty block would not be actively recycled before a new allocation needs one, so I'm not sure where the allocator is still spending that time. Anyway, a good result, but only for selective use.
(Added) Well... 1/(1-0.15)-1=0.176... :-)
Calls (millions) per second, single-threaded test (can't do a multithreaded one yet).
1) Storage provided by the caller: ICC = 116, MSVC = 112.
2) Static local TLS (foo2): ICC = 116, MSVC = 107.
3) Local vector (foo1): ICC = 91, MSVC = 96.
Case 1 requires an interface change.
In another test compiled with ICC, case 1's throughput was 187 vs. 130 for case 3.
Still, if the initial dimensioning is final, you could also estimate that a fraction f of uses fits in an automatic array of size N and allocate the remaining cases dynamically, thus largely hiding the allocation cost (see Amdahl's formula). Alternatively, go for a not strictly portable solution like using a variable to dimension an automatic array in all cases (supported by GNU, I think; I don't know about other compilers) or alloca(). Just be reasonable with how much stack space you use (I don't have numbers there), and roll your own cache-line alignment (if really needed).
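A sketch of the first suggestion, with N = 64 and an int element type as purely illustrative assumptions: use an automatic array when n fits, and fall back to the heap only for the rare larger cases.
[cpp]#include <cstddef>
#include <vector>

const std::size_t N = 64;              // illustrative threshold covering "a fraction f of uses"

void foo3(std::size_t n) {
    int small_buf[N];                  // automatic storage, no allocation
    std::vector<int> big_buf;          // touched only in the rare large cases
    int* storage = small_buf;
    if (n > N) {
        big_buf.resize(n);             // allocation cost paid only on the (1 - f) fraction
        storage = big_buf.data();
    }
    // ... use storage[0..n-1] as scratch space ...
    (void)storage;
}[/cpp]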
How about a generic solution, maybe a new TBB feature?
[cpp]struct raw_storage_stack;
struct scoped_reference;

// Reference to raw storage
struct scoped_reference {
    // pointer to the raw storage
    void* pointer();
    // pushes the storage back onto the stack
    ~scoped_reference();
};

struct raw_storage_stack {
    /* If the stack is not empty and the top block is big enough,
       return it without a new allocation. Otherwise allocate
       new storage and return it. */
    scoped_reference next_reference(int size);
};

tbb::enumerable_thread_specific<raw_storage_stack> global_tls;

void foo() {
    // how much storage is needed
    int n;
    scoped_reference ref = global_tls.local().next_reference(n);
    // storage big enough for n bytes
    void* ptr = ref.pointer();
    // ref expires here and the storage is pushed back onto the stack
}[/cpp]
tbb::enumerable_thread_specific<raw_storage_stack> maintains a per-thread stack of pre-allocated blocks. Allocation is amortized, similar to std::vector's growth behavior. scoped_reference knows which stack the storage came from and pushes it back onto the same stack upon destruction. The stack hides behind TLS, so it doesn't need to be synchronized, and this works with recursive functions too. Because allocation is amortized, it works well for functions called from tight loops: after a few iterations the stack holds enough storage and no new allocation is needed. It may also improve cache hits, because storage is recycled by the same thread, and stack-like reuse reduces cache cooling.
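For what it's worth, a minimal sketch of how these types could be implemented on top of tbb::enumerable_thread_specific; everything beyond the interface above (member names, the use of malloc/free, the move-only scoped_reference) is an assumption, not part of the proposal.
[cpp]#include <cstdlib>
#include <utility>
#include <vector>
#include <tbb/enumerable_thread_specific.h>

struct raw_storage_stack;

// Hands out a block and returns it to its owning stack on destruction.
struct scoped_reference {
    scoped_reference(raw_storage_stack* owner, void* p, std::size_t size)
        : owner_(owner), p_(p), size_(size) {}
    scoped_reference(scoped_reference&& other)
        : owner_(other.owner_), p_(other.p_), size_(other.size_) { other.owner_ = 0; }
    ~scoped_reference();                     // defined below, after raw_storage_stack
    void* pointer() const { return p_; }
private:
    raw_storage_stack* owner_;
    void* p_;
    std::size_t size_;
};

struct raw_storage_stack {
    // Reuse the top block if it is big enough, otherwise allocate a new one.
    scoped_reference next_reference(std::size_t size) {
        if (!blocks_.empty() && blocks_.back().second >= size) {
            std::pair<void*, std::size_t> b = blocks_.back();
            blocks_.pop_back();
            return scoped_reference(this, b.first, b.second);
        }
        return scoped_reference(this, std::malloc(size), size);
    }
    void push(void* p, std::size_t size) { blocks_.push_back(std::make_pair(p, size)); }
    ~raw_storage_stack() {
        for (std::size_t i = 0; i < blocks_.size(); ++i) std::free(blocks_[i].first);
    }
private:
    std::vector<std::pair<void*, std::size_t> > blocks_;
};

scoped_reference::~scoped_reference() { if (owner_) owner_->push(p_, size_); }

tbb::enumerable_thread_specific<raw_storage_stack> global_tls;

void foo(std::size_t n) {
    scoped_reference ref = global_tls.local().next_reference(n);
    void* ptr = ref.pointer();               // at least n bytes of scratch space
    (void)ptr;
}   // ref expires here; the block goes back onto this thread's stack[/cpp]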
A new feature would be justified if the problem is fairly commonplace where TBB is used.
"A new feature would be justified if the problem is fairly commonplace where TBB is used."
That says it all, because it does not seem unlikely that only a few functions/programs exhibit this pattern (lowercase p) of temporary dynamic-size allocation with little associated work to drown out the cost of allocation, which even here is only 15%. Until then, it seems fine to just use your solution, or go with something simple (like I proposed in #5), but in any case not to worry about it too much. :-)
