This would be relevant for me too. I have a regression test (from https://libevent.org/) that executes in 91 sec of elapsed time on Linux, but ~5-10 sec of elapsed time on macOS and MSVC. VTune Amplifier shows me that CPU time on all platforms is ~1-2 sec. But it doesn't let me discover why the elapsed time is roughly 10 times higher on Linux than on the other platforms.
Is it possible to explore methods or the call stack based on elapsed time? I think it's very relevant for the runtime of an application where elapsed time is spent, not only CPU time. It's interesting that CPU time is ~1 sec, but more relevant that elapsed time accumulates to more than 1.5 minutes.
The traditional way to measure time spent in each function is to compile with "-pg", run the code, and then use "gprof" to process the output and generate a report. The call counts come from function-level instrumentation, which avoids the uncertainties of sampling (though gprof's per-function time is still gathered by periodic sampling). On the downside, this option may limit the compiler's ability to inline functions, and may cause significant overhead if your functions are very short.
The same type of information may also be available via the profile-guided-optimization (PGO) features in the Intel compiler.
Well, if CPU time on Linux in your case is the same as on the other platforms, the reason for such a long elapsed time is inactive waits in the application. You can use the "Threading" analysis type in VTune to investigate wait reasons. Look at the context switches on the timeline and explore the wait stacks for the long ones (click inside a context switch). You can also try the renovated "System Overview" analysis type in VTune Profiler 2020, which should help you explore any platform/OS issues affecting your application.