I am running VTune 9.1 build 406 and I am looking at the call graph. The application I'm profiling loads all of its input in the main thread, then creates a thread pool with 4 threads that do all of the work. Once the 4 other threads have been created and queued up, the main thread just waits for them all to finish.
Using one set of inputs and looking at the total time for each thread I see the following:
These results are as expected, and Thread_0's total time is greater than the others since it creates them and waits on them all to finish.
However using a different set of inputs I get the following results for total time:
Looking at those results, I don't understand how the total time for Thread_0 can be less than that of any of the 4 threads it creates. I tried re-running the profile with the same set of inputs, and the results were the same every time.
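For reference, the expectation behind the question can be sketched in a few lines. This is only a minimal stand-in, assuming plain Python threads and `time.sleep` in place of the real application's thread pool and workload; `run_pool` and its parameter are hypothetical names, and the wall-clock timing here is not how VTune's Call Graph collects its numbers. It shows why a creating thread that starts its workers and then joins them should measure at least as much elapsed time as any one worker.

```python
import threading
import time


def run_pool(work_seconds):
    """Start one worker per entry in work_seconds, wait for all of them,
    and return (main_elapsed, worker_elapsed) in seconds."""
    worker_elapsed = [0.0] * len(work_seconds)

    def worker(index, seconds):
        start = time.perf_counter()
        time.sleep(seconds)  # stand-in for the real work
        worker_elapsed[index] = time.perf_counter() - start

    # The main thread's clock starts before any worker exists...
    main_start = time.perf_counter()
    threads = [
        threading.Thread(target=worker, args=(i, s))
        for i, s in enumerate(work_seconds)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # ...and stops only after every worker has finished.
    main_elapsed = time.perf_counter() - main_start
    return main_elapsed, worker_elapsed


if __name__ == "__main__":
    main_elapsed, worker_elapsed = run_pool([0.1, 0.2, 0.3, 0.4])
    print("main thread:", main_elapsed)
    for i, elapsed in enumerate(worker_elapsed, start=1):
        print("worker", i, ":", elapsed)
```

Measured this way, the main thread's elapsed time cannot be smaller than any worker's, which is why a smaller Thread_0 total in the second run looks surprising.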
What exactly are the numbers you're displaying? They look like event counts, possibly from a clock-tick event or real time clock interrupt, not times. Probably not accumulated times. The variability from one run to the next may be an issue, but having thread 0 counts lower than those of the other threads by itself doesn't raise any warning flags for me. Can you share a little more detail?
The numbers are not event counts; they are the total times for each thread as reported by the Call Graph. I assumed the times were in microseconds because, read that way, the total time for Thread_0 would be 24 minutes, which is pretty close to the time the run took to complete during the Call Graph profile.
I will try to upload a screenshot showing the call graph window grouped by thread.
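The unit check described above is simple arithmetic and can be written out as follows. The value below is hypothetical, chosen only so that it works out to 24 minutes; it is not the number from the actual Call Graph output, and the microsecond unit is the assumption being tested, not something confirmed here.

```python
MICROSECONDS_PER_MINUTE = 60 * 1_000_000


def microseconds_to_minutes(total_us):
    """Convert a total time in microseconds to minutes."""
    return total_us / MICROSECONDS_PER_MINUTE


# Hypothetical total time, NOT taken from the profile in question.
hypothetical_total_us = 1_440_000_000
print(microseconds_to_minutes(hypothetical_total_us))  # 24.0
```

If the converted value roughly matches the observed wall-clock duration of the profiling run, that supports reading the column as microseconds rather than as event counts.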