My project runs on CentOS on an HP desktop. The software "system" consists of about five processes running simultaneously. I'll use VTune in GUI mode. I will most likely start the entire system, attach to a single process, detach, and then attach to a different process. There won't be any consistent pattern to when I attach to any particular process.
If I attach VTune to any single process, will the timing of that process be affected? If the timing is affected in that one process, then the inter-process communications might be altered, therefore changing the entire system timing.
The overhead that VTune introduces into an application run depends on the collection type, and hence on the analysis type, that you use. For sampling-based analysis types such as basic hotspots, advanced hotspots (without stacks), memory access (without memory object instrumentation), and general exploration, the overhead should not exceed 5-10% with the default sampling interval (10 ms for basic hotspots, 1 ms for other hardware-counter-based analysis types such as advanced hotspots or memory access). If you still see VTune perturbing your application's behavior, try a larger sampling interval, though this can reduce the statistical representativeness of the results.
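To get a feel for the trade-off between sampling interval and representativeness, here is a rough Python sketch. It is not anything VTune computes; it assumes a simple binomial model (one sample per interval on a single thread, samples treated as independent), and the 60-second run and 2% function share are hypothetical numbers chosen for illustration.

```python
import math


def expected_samples(run_seconds, interval_ms, cpu_share):
    """Expected number of samples landing in a function that uses
    `cpu_share` (0..1) of the run, assuming one sample per interval."""
    total_samples = run_seconds * 1000.0 / interval_ms
    return total_samples * cpu_share


def relative_error(run_seconds, interval_ms, cpu_share):
    """Approximate relative standard error of the measured share,
    under the binomial assumption: sqrt((1 - p) / (N * p))."""
    total_samples = run_seconds * 1000.0 / interval_ms
    return math.sqrt((1.0 - cpu_share) / (total_samples * cpu_share))


if __name__ == "__main__":
    # Hypothetical case: a 60 s run, a function taking 2% of CPU time.
    for interval_ms in (1, 10, 100):
        n = expected_samples(60, interval_ms, 0.02)
        err = relative_error(60, interval_ms, 0.02)
        print(f"{interval_ms:>4} ms: ~{n:.0f} samples, "
              f"relative error ~{err:.1%}")
```

Under these assumptions, every tenfold increase in the interval cuts the sample count by ten and raises the relative error by roughly a factor of three, so a larger interval usually needs a longer run to keep small functions visible.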
For analysis types that trace threading, I/O, and similar APIs, such as the concurrency and locks-and-waits analyses, the overhead varies with how intensively synchronization objects are used and can reach 100% of the application runtime. The same applies to memory access analysis with memory object (allocation and deallocation) tracing; in that case you can use the mem-object-size-min-thres knob to ignore small objects and lower the overhead.
Also, hardware-counter-based analysis with stacks (e.g. advanced hotspots with stacks) can add further overhead (up to 50%). See https://software.intel.com/en-us/node/596598 for how to use the -stack-size knob to reduce the overhead in this case.
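Since the concern is inter-process timing, one way to reason about the figures above is to check them against the timing slack of the process you attach to. The sketch below is only back-of-the-envelope arithmetic: the upper bounds are the ones quoted in this reply, the 4 ms handling time and 5 ms deadline are hypothetical, and it assumes the overhead applies uniformly to each unit of work, which a real run may not do.

```python
# Upper-bound overhead fractions quoted in this reply (not measured values).
OVERHEAD_UPPER_BOUND = {
    "sampling, no stacks": 0.10,
    "HW counters with stacks": 0.50,
    "threading/I/O API tracing": 1.00,
}


def worst_case_time(baseline, overhead_fraction):
    """Baseline time stretched by the given overhead fraction."""
    return baseline * (1.0 + overhead_fraction)


def fits_deadline(baseline, deadline, overhead_fraction):
    """True if the stretched time still meets the deadline."""
    return worst_case_time(baseline, overhead_fraction) <= deadline


if __name__ == "__main__":
    # Hypothetical process: handles one message in 4 ms, must reply in 5 ms.
    baseline_ms, deadline_ms = 4.0, 5.0
    for name, fraction in OVERHEAD_UPPER_BOUND.items():
        worst = worst_case_time(baseline_ms, fraction)
        verdict = "ok" if fits_deadline(baseline_ms, deadline_ms, fraction) else "too slow"
        print(f"{name}: worst case {worst:.1f} ms ({verdict})")
```

If the worst case for an analysis type does not fit the slack you have, that is a hint to start with the lighter sampling-based analyses on that process.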
Thanks & Regards, Dmitry