I have implemented an application that uses the Intel TBB pipeline pattern for parallel processing. It runs on an Intel Xeon E5420 CPU @ 2.50GHz under RHEL 6.
The application consists of 8 pipelines. Each pipeline has one token, so there is effectively one thread per pipeline. Each pipeline receives data from an endpoint and processes it to completion. I ran the application and collected General Exploration analysis data with VTune Amplifier. The profiler reported a high CPI in the finish_task_switch function of the vmlinux module, which suggests that the kernel is spending a lot of time on context switching and that this is hurting the application's performance.
What I would like to understand is why the kernel is context switching so heavily. Will each pipeline be scheduled on the same CPU? Is there a way to assign CPU affinity to each pipeline? How can I reduce this performance-degrading behavior? Any optimization tips would be appreciated.
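To make the affinity question concrete, here is a minimal Linux-only sketch of the kind of pinning I have in mind. It is not taken from my application: it uses plain std::thread instead of TBB, assumes a C++11 compiler, and the names pin_current_thread_to_cpu, allowed_cpus, run_pipeline_placeholder and run_pinned_pipelines are made up for illustration. With TBB the worker threads are created by the scheduler, so the pinning call would presumably have to run inside each worker (perhaps from a tbb::task_scheduler_observer callback), which this sketch does not show.

```cpp
// Linux-only sketch: pin one thread per pipeline to its own CPU.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes the CPU_* macros and sched_setaffinity on glibc
#endif
#include <sched.h>

#include <cstddef>
#include <thread>
#include <vector>

// Restrict the calling thread to a single CPU. Returns true on success.
// Passing pid 0 to sched_setaffinity means "the calling thread".
bool pin_current_thread_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// List the CPUs the calling thread is currently allowed to run on.
std::vector<int> allowed_cpus() {
    std::vector<int> cpus;
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return cpus;  // empty on failure
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) {
            cpus.push_back(cpu);
        }
    }
    return cpus;
}

// Hypothetical stand-in for "receive data and process it to completion".
void run_pipeline_placeholder(int /*pipeline_id*/) {}

// Start one thread per pipeline and pin thread i to the i-th allowed CPU
// (round-robin if there are more pipelines than CPUs).
// Returns how many threads were pinned successfully.
int run_pinned_pipelines(int pipeline_count) {
    const std::vector<int> cpus = allowed_cpus();
    if (pipeline_count <= 0 || cpus.empty()) {
        return 0;
    }
    std::vector<int> pinned(static_cast<std::size_t>(pipeline_count), 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < pipeline_count; ++i) {
        threads.emplace_back([i, &cpus, &pinned] {
            const int cpu = cpus[static_cast<std::size_t>(i) % cpus.size()];
            if (pin_current_thread_to_cpu(cpu)) {
                pinned[static_cast<std::size_t>(i)] = 1;
            }
            run_pipeline_placeholder(i);
        });
    }
    for (std::thread& t : threads) {
        t.join();
    }
    int count = 0;
    for (int flag : pinned) {
        count += flag;
    }
    return count;
}
```

In my case I would call run_pinned_pipelines(8), so that on the 8 available cores each pipeline thread stays on its own core. Whether this actually reduces the time spent in finish_task_switch is exactly what I am unsure about.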