I am developing a high-performance rendering application in which each thread needs its own copy of my geometry data.
So far I have used the parallel_for template, copying the data for each range in the task-stealing system. Profiling shows that copying that memory chunk adds overhead that grows with the number of ranges. I found that the task_scheduler_observer class can provide access to events such as a new thread being added to the pool. What I want now is to prepare as many copies as there are threads in the scheduler and hand each thread its copy as soon as it is added to the system. How can I do this?
You could make tbb::task_scheduler_init::default_num_threads() copies in advance, perhaps a few more to account for the user threads in your program, so that less work remains to be done when tbb::task_scheduler_observer::on_scheduler_entry() is called, if that is what you mean. Note that task_scheduler_init does not provide a way to query the actual number of worker threads, either those serving the current user thread or those across all user threads, but you can probably work around that. A few copies may be wasted, and any threads beyond the expected number would still incur the copy latency, but the typical case would still be improved. The main thing is to make copies only per thread, not per subrange, as you have already discovered.
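To make the "prepare copies in advance" idea concrete, here is a rough sketch in plain standard C++ rather than TBB, so it compiles without the library. Geometry, PreparedCopies, take() and run_demo() are all hypothetical names invented for illustration. In a real TBB program, take() would presumably be called from your observer's on_scheduler_entry(), and the pool size would come from default_num_threads() plus some slack.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the poster's geometry data.
struct Geometry {
    std::vector<float> vertices;
};

// Pool of copies that are made up front, before any worker asks for one.
class PreparedCopies {
public:
    PreparedCopies(const Geometry& source, std::size_t expected_threads)
        : source_(source), pool_(expected_threads, source), next_(0), late_(0) {}

    // Hands one copy to the caller. Each slot index is handed out exactly
    // once by fetch_add, so no two threads ever touch the same slot.
    Geometry take() {
        const std::size_t slot = next_.fetch_add(1);
        if (slot < pool_.size()) {
            return std::move(pool_[slot]);  // prepared earlier: just a move
        }
        ++late_;                            // more threads than expected:
        return source_;                     // this caller pays for a copy now
    }

    // Number of callers that arrived after the prepared copies ran out.
    std::size_t late_copies() const { return late_.load(); }

private:
    Geometry source_;
    std::vector<Geometry> pool_;
    std::atomic<std::size_t> next_;
    std::atomic<std::size_t> late_;
};

// Spawns `threads` workers that each take one copy and returns how many of
// them had to make their copy late.
std::size_t run_demo(std::size_t prepared, std::size_t threads) {
    Geometry source{std::vector<float>(1024, 1.0f)};
    PreparedCopies copies(source, prepared);
    std::atomic<std::size_t> complete{0};
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < threads; ++i) {
        workers.emplace_back([&copies, &complete] {
            Geometry mine = copies.take();
            if (mine.vertices.size() == 1024) ++complete;
        });
    }
    for (auto& w : workers) w.join();
    assert(complete.load() == threads);  // every worker got a full copy
    return copies.late_copies();
}
```

Calling run_demo(4, 6), for example, prepares four copies and lets six threads ask for one, so two of them fall back to copying on demand. That is the "threads exceeding the expected number" case; the wasted-copy case is the reverse, where fewer threads show up than were prepared for.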
(Added 2011-10-27, after #2) Note that I just assumed there was a valid reason to make a copy (please explain!), but note also that Alexey's suggestion to use enumerable_thread_specific would not address the stated purpose of preparing copies in advance. (If the reason is to avoid locks at that point rather than just to reduce latency later, you could still run into trouble with my suggestion, unless you accept the cost of more prepared copies that may go unused.) Generally, if a question does not completely make sense to me, I gamble that it is more useful to fire off something that may help than to start a high-latency clarification phase first, but that doesn't mean the discussion has to stop there.
Do the threads modify their copies? If not, why not just share a single read-only copy among all threads? If so, you may want to look at tbb::enumerable_thread_specific (the TBB way to set up and use thread-local data). It automatically creates per-thread copies of the data when necessary; you just specify how each copy should be made.
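For anyone unfamiliar with the pattern, the following is a simplified stand-in for what "a per-thread copy created on first use" means, written in plain standard C++ so it builds without TBB. It is not how tbb::enumerable_thread_specific is implemented: this version takes a lock on every access, which the real class is designed to avoid. PerThreadGeometry, Geometry and copies_after() are hypothetical names; only local() and size() mirror member names that the TBB class really has.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the poster's geometry data.
struct Geometry {
    std::vector<float> vertices;
};

// Lock-based illustration of lazily created per-thread copies.
class PerThreadGeometry {
public:
    explicit PerThreadGeometry(Geometry exemplar)
        : exemplar_(std::move(exemplar)) {}

    // Returns the calling thread's private copy, copying the exemplar the
    // first time this thread asks. References into std::map stay valid
    // when other threads insert their own entries later.
    Geometry& local() {
        const std::thread::id me = std::this_thread::get_id();
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = copies_.find(me);
        if (it == copies_.end()) {
            it = copies_.emplace(me, exemplar_).first;
        }
        return it->second;
    }

    // Number of per-thread copies created so far.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return copies_.size();
    }

private:
    Geometry exemplar_;
    mutable std::mutex mutex_;
    std::map<std::thread::id, Geometry> copies_;
};

// Runs `threads` workers that each touch their copy `passes` times and
// returns how many copies exist afterwards.
std::size_t copies_after(std::size_t threads, int passes) {
    PerThreadGeometry per_thread(Geometry{std::vector<float>(256, 0.0f)});
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&per_thread, passes] {
            for (int p = 0; p < passes; ++p) {
                Geometry& mine = per_thread.local();  // same object each pass
                mine.vertices[0] += 1.0f;             // private, so no race
            }
        });
    }
    for (auto& w : workers) w.join();
    assert(per_thread.size() == threads);  // one copy per thread, not per pass
    return per_thread.size();
}
```

The point to take away is that the number of copies follows the number of threads that actually ran, not the number of subranges or passes. With the real TBB class you would construct it from an exemplar object and call local() inside the parallel_for body in the same way.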