Speed-up in `parallel_for` after ~30 s of execution time.
I am using a simple `parallel_for` to do some geometric computations (game context). At first, it runs at about 30 frames per second. After roughly 30 seconds, however, TBB seems to change its load balancing and performance rises to ~50 fps.
My question is: how can I figure out what changes, so that I can provide an adequate grainsize and/or partitioner and/or hints to get the performance boost right from the get-go? What APIs are available to see what is happening under the hood?
I've tried monitoring the grainsize in the `parallel_for` callback, but it doesn't change. At least, the value returned by `blocked_range::grainsize()` doesn't change. Thank you.