Performance lost due to context switching by the kernel
I am working on a C++ application that has to process data in real-time. The application uses Intel TBB's pipeline pattern for parallel processing of data. It has multiple pipelines, each running with a single token to process data.
I built this application with ICC and started performance measurements using vTune Amplifier. During the General Exploration analysis, I noticed that vTune Amplifier always reported a high CPI and flagged issues with the final_task_switch() function in the vmlinux module.
Does this mean that the kernel is spending too much time context switching? Can anyone offer suggestions on how to tackle this performance-reducing behaviour?
Hello Vishal, sorry I missed this entry. Are you still interested in following up? If final_task_switch() is part of a context switch, it may well be executing expensive instructions that have a high CPI. It doesn't necessarily mean that you are spending too much time in context switching.
I would probably back up a level and gather some general stats. See http://www.cyberciti.biz/tips/how-do-i-find-out-linux-cpu-utilization.html for "How do I Find Out Linux CPU Utilization?". It is very general, but if 'top', 'vmstat' and 'sar' show your app is using 90-99% of the CPU, then the context switches probably aren't hurting you. sar and pidstat (part of the great sysstat utilities) can report context switches per second per process (or, with pidstat -wt, per thread).
Then, if you REALLY want to dig into the details, you can do Linux kernel tracing with the context-switch event group, which will show you each context switch along with a reason for it. This can be tons of data.
The Linux kernel trace info will let you see each process/thread switching in/out on each CPU, but it won't tell you what instructions the thread was running on the CPU. Other Linux tracing events can tell you what kernel tasks the thread was doing (like allocations, IOs, system calls).
Hope this helps and sorry for the delay in response. Pat