Why are CPU cores not evenly used by Intel TBB?

mickru · ‎10-01-2009

Hi,
We are using the commercial version of Intel TBB 2.2, released last August, for parallel processing of captured network packets. Our application is a C++ Linux service running on SLES10-SP2 on a quad-core Xeon CPU. Our understanding is that Intel TBB ensures that all CPU cores contribute evenly to processing our data, which is handled in a parallel_for loop.
A few days ago our hardware supplier invited us to test-run our service on a new IBM blade with two Nehalem CPUs. In theory we would have had 16 cores available to our service, but the output from top showed that only two cores were used, the same as on the quad-core Xeons we currently use. Even though the performance gain on Nehalem was huge, we assume it would have been even higher if TBB had used all 16 cores instead of just 2.
So my question is: how can we configure Intel TBB so that it really uses all available cores on the system? We thought that this was what Intel TBB is for: dynamically utilizing all available cores.

Any idea how we can force TBB to really use all available cores? We monitor media streams on a 1GB network connection at line rate, which means around 21000 concurrent media sessions on the network. I assume this is enough data to be shared among 16 cores...

Any idea?

Thanks.

TimP · ‎10-01-2009

The dual Nehalem CPU blade would have 8 cores total, with support for 16 logical processors when HyperThreading is enabled. The advantage of HyperThreading is highly dependent on your application; several Intel libraries are set by default to run 1 worker thread per core.
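To make the core versus logical-processor distinction concrete, here is a minimal sketch in standard C++11 (not TBB API). Both function names are hypothetical, and the threads-per-core value is an assumption the caller has to supply, since the standard library only reports the number of hardware threads.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: number of hardware threads (logical processors)
// reported by the C++ runtime. The standard allows this to be 0 when the
// value cannot be determined, so treat it as a hint only.
unsigned logical_processor_count() {
    return std::thread::hardware_concurrency();
}

// Hypothetical helper: estimates physical cores from a logical processor
// count, ASSUMING every core exposes the same number of hardware threads
// (2 when HyperThreading is enabled, 1 when it is disabled).
unsigned estimate_physical_cores(unsigned logical, unsigned threads_per_core) {
    assert(threads_per_core > 0);
    return logical / threads_per_core;
}
```

For the blade described above, estimate_physical_cores(16, 2) gives 8, which matches the 8 physical cores of two quad-core Nehalem CPUs. Tools such as top list logical processors, so they would show 16 entries on that machine.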
If your application is limited to 2 threads on a quad core, it is not surprising that this does not change automatically on a machine with more cores.
For questions specific to TBB, the TBB forum would be more likely to help you find answers.
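
One possible reason for a parallel_for loop staying at 2 threads, offered here only as an assumption since the thread does not show the actual code, is a grain size so large that the range splits into just two pieces. The sketch below is plain C++ and does not call TBB. The hypothetical function count_chunks models the rule that a tbb::blocked_range is divisible only while its size exceeds the grain size, and that each split halves the range. The 21000 comes from the session count in the question; the grain sizes are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of TBB: counts the pieces obtained when a
// range of `size` items is halved recursively until a piece is no larger
// than `grainsize`. This is the finest subdivision the range allows, so it
// is an upper bound on how many threads can work on the loop at once.
std::size_t count_chunks(std::size_t size, std::size_t grainsize) {
    assert(grainsize > 0);
    if (size <= grainsize) {
        return 1;  // not divisible any further: one chunk, one thread
    }
    std::size_t left = size / 2;  // split at the midpoint
    return count_chunks(left, grainsize) + count_chunks(size - left, grainsize);
}
```

Under this model, count_chunks(21000, 10500) is 2, so at most two threads could ever be busy, while count_chunks(21000, 1000) is 32, which leaves enough pieces for 16 logical processors. A partitioner may stop splitting earlier than this, so the real chunk count can be lower but not higher.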