Has anyone done any comparisons between default worker thread affinity and forcing the threads to specific processors? I'm assuming that TBB does not set affinity. Has anyone noticed faster or more consistent run-times when setting affinity?
Do you mean besides using the affinity partitioner? That has proven very effective on particular test codes, but your results may vary depending on the algorithm. As a general principle, though, affinitizing restricts the versatility of a thread: it keeps the thread from picking up work on other HW threads and can exacerbate load imbalance. But there may be environments where these missed opportunities are overshadowed by other performance concerns, such as a slow process scheduler or a particular NUMA (Non-Uniform Memory Access) configuration.
I see the affinity partitioner as operating at the task level: it influences which tasks are assigned to which worker threads. I'm asking about worker thread affinity, which the second half of your notes goes into. Usually the OS does a pretty good job assigning threads to cores, but if it moves them, unnecessary cache misses may result. I was going to run some tests both with and without restricted affinity, but was hoping someone had already done the legwork. I guess it really is application- and machine-load-specific, though.
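For anyone wanting to run the with/without-affinity comparison described above, here is a rough Linux-only sketch of pinning a thread to one logical CPU. It uses plain `std::thread` plus the glibc call `pthread_setaffinity_np` rather than TBB workers, and the helper names (`first_allowed_cpu`, `pin_current_thread`, `run_pinned`) are made up for illustration. To apply the same idea to TBB worker threads, one plausible hook is a `tbb::task_scheduler_observer` whose `on_scheduler_entry` calls the pinning function, but that part is not shown and is an assumption about how you would wire it up.

```cpp
#include <pthread.h>
#include <sched.h>

#include <cassert>
#include <thread>

// Return the lowest-numbered CPU the calling thread may run on, or -1 on error.
// Useful in containers, where CPU 0 is not always in the allowed set.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set)) return cpu;
    return -1;
}

// Restrict the calling thread to a single logical CPU. Returns true on success.
bool pin_current_thread(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

// Start a thread, pin it to `cpu`, and report which CPU it then observes
// itself running on (-1 if pinning failed).
int run_pinned(int cpu) {
    int observed = -1;
    std::thread t([&] {
        if (pin_current_thread(cpu)) observed = sched_getcpu();
    });
    t.join();
    return observed;
}
```

A comparison would then time the same workload twice, once with each worker calling `pin_current_thread` on a distinct CPU and once without, over enough repetitions to see both the mean and the run-to-run spread.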
I don't know much about the "already done legwork" part, but in the Microsoft Task Manager I regularly see single-threaded processes that exceed their time slice bouncing from HW thread to HW thread, so I can't vouch for the "OS does a pretty good job" claim. While initial assignment may be well distributed, subsequent migrations can seem chaotic. Good luck on your research; I'm sure you'll share anything you find interesting with the rest of us.