When performing VTune (16.0.3) runs of a few minutes (e.g. 3), the time to read in and finalize the sample data is excruciatingly long. Looking at the resource display of the System Monitor (Linux on KNL), it appears that very few threads, perhaps only 1, are involved in preparing the data for analysis display.
Can this be made more multi-threaded?
The point is valid, and the VTune team has been working on parallelizing the finalization step. Stay tuned on this.
In the meantime, there is one tip you can use: collect on KNL with the -no-auto-finalize option and then finalize the result on a Xeon host machine. Since single-thread performance is better there, the result will be finalized faster. One thing to note: on the host you must explicitly set the search directories for the binaries, with something like this:
>amplxe-cl -finalize -r <my_result_dir> -search-dir=<my_bin_dir> -search-dir=<openmp_runtime_dir>
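Put together, the two steps might look like the sketch below. It only prints the commands (a dry run), so nothing is collected or finalized until you run the printed lines yourself. The helper names (print_collect_cmd, build_finalize_cmd), the analysis type, and every path are hypothetical placeholders, and how you get the result directory from the KNL to the host (shared file system, scp, etc.) is left to you.

```shell
#!/bin/sh
# Dry-run sketch: prints the amplxe-cl commands instead of running them.
# Helper names and all values passed below are illustrative assumptions.

# Step 1 (run on the KNL): collect without finalizing.
# $1 = analysis type, $2 = result directory, $3 = application to profile
print_collect_cmd() {
    printf 'amplxe-cl -collect %s -no-auto-finalize -r %s -- %s\n' "$1" "$2" "$3"
}

# Step 2 (run on the Xeon host): finalize with explicit search directories.
# $1 = result directory, remaining arguments = directories to search.
# Note: paths containing spaces are not quoted in the printed command.
build_finalize_cmd() {
    result_dir=$1
    shift
    cmd="amplxe-cl -finalize -r $result_dir"
    for dir in "$@"; do
        cmd="$cmd -search-dir=$dir"
    done
    printf '%s\n' "$cmd"
}

# Example values only; substitute your own analysis type and paths.
print_collect_cmd advanced-hotspots my_result_dir ./my_app
build_finalize_cmd my_result_dir /path/to/my_bin_dir /path/to/openmp_runtime_dir
```

Passing one -search-dir per directory mirrors the command above; add more arguments to build_finalize_cmd if your application loads libraries from other locations.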
Thanks & Regards, Dmitry