When performing VTune (16.0.3) runs of a few minutes (e.g., 3), the time to read in and finalize the sample data is excruciatingly long. Looking at the resource display of the System Monitor (Linux on KNL), it appears that very few threads, perhaps only one, are involved in preparing the data for the analysis display.
Can this be made more multi-threaded?
Jim Dempsey
Hello Jim,
The point is valid, and the VTune team has been working on parallelizing the finalization step. Stay tuned.
In the meantime, here is one tip: collect on KNL with the -no-auto-finalize option, then finalize the results on a Xeon host machine. Because single-thread performance is better there, the results will be finalized faster. One thing to note: you should explicitly set the search directories for binaries on the host, with something like this:
>amplxe-cl -finalize -r <my_result_dir> -search-dir=<my_bin_dir> -search-dir=<openmp_runtime_dir>
Thanks & Regards, Dmitry