Hi All,
I have a few questions regarding Intel VTune Amplifier, which I plan to use on a Xeon Phi 7210:
- Since Intel VTune Amplifier is a GUI-based application, can anyone share how much overhead it adds to the system?
- How can I be sure that the data I get from Intel VTune Amplifier corresponds only to the application being profiled?
- Can this tool be used to analyze Intel Optimized Caffe?
- Can I collect data without the GUI, i.e. using a command-line version of Intel VTune Amplifier, if one is available?
Thanks.
Hello Chetan Arvind Patil,
Please find answers to your questions below:
- Since Intel VTune Amplifier is a GUI-based application, can anyone share how much overhead it adds to the system?
On Linux, the VTune Amplifier UI can occupy up to 20-25% of one core. So if you profile a high-throughput application, I would recommend performing the collection from the command line and then opening the result in the VTune GUI. This eliminates any side effects of the VTune UI sharing a core with the application being profiled.
Please also note that, because of the single-thread performance of Xeon Phi, the VTune GUI might feel a bit sluggish there. One recommended way is to collect results through the command line on the Xeon Phi target and then transfer them (or use a file share) to a client machine with better single-thread performance. You can also use remote collection from a client machine to the Xeon Phi. In that case VTune will automatically copy the traces and the files needed for symbol resolution to the client machine.
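To make the command-line route concrete, here is a minimal sketch that only builds the command line for such a collection. It assumes the VTune Amplifier command-line tool is `amplxe-cl` with its `-collect`, `-result-dir` and `-target-system=ssh:...` options; the function name, the result directory name `r001hs`, the application path and the host name are all hypothetical, so please check `amplxe-cl -help collect` for your installed version before relying on it.

```python
import shlex


def build_collect_command(app_argv, analysis="hotspots",
                          result_dir="r001hs", target=None):
    """Build an argv list for a command-line VTune Amplifier collection.

    Assumption: the CLI binary is `amplxe-cl` and accepts -collect,
    -result-dir and -target-system=ssh:user@host. Names such as
    `r001hs` are placeholders chosen for this sketch.
    """
    if not app_argv:
        raise ValueError("app_argv must name the application to profile")
    cmd = ["amplxe-cl", "-collect", analysis, "-result-dir", result_dir]
    if target is not None:
        # Remote collection: the client machine drives the Xeon Phi target.
        cmd.append(f"-target-system=ssh:{target}")
    # Everything after "--" is the profiled application and its arguments.
    cmd += ["--"] + list(app_argv)
    return cmd


if __name__ == "__main__":
    # Local collection on the target; open the result directory in the GUI later.
    print(shlex.join(build_collect_command(["./my_app", "--iters", "100"])))
    # Remote collection from a client machine (hypothetical user and host).
    print(shlex.join(build_collect_command(["./my_app"], target="user@phi-node")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where VTune Amplifier is installed; the result directory can then be copied to, and opened on, a client machine.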
- How can I be sure that the data I get from Intel VTune Amplifier corresponds only to the application being profiled?
If you use the "Launch Application" mode and specify your application as the profiling target, VTune will show performance information related to your application and its child processes (unless child-process analysis is explicitly switched off). However, some metrics are based on system-wide monitoring. Memory bandwidth, for example, is derived from uncore events and therefore includes everything that happened on the system.
The question about Caffe was addressed in another thread, I believe.
BTW, are you interested in algorithmic optimization only, or are you also going down to the micro-architecture level?
Thanks & Regards, Dmitry
Hi Dmitry,
I am interested in software optimization based on how a framework/application utilizes an architecture like Xeon Phi.
Thanks.
Hello Chetan Arvind Patil,
I recommend trying the "HPC Performance Characterization" analysis. It shows several important aspects of application performance on Xeon Phi at once: parallelism/CPU utilization with insight into the efficiency of parallel runtimes such as OpenMP, memory access efficiency, and some vectorization efficiency information.
VTune now also has a lightweight tool, Application Performance Snapshot, that gives a quick performance overview in the form of a command-line summary and an HTML report. The tool is in the <install_dir>/bin64 directory under the name "aps".
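As a rough illustration of the two suggestions above, the sketch below builds the two command lines. It assumes that the analysis type for "HPC Performance Characterization" is named `hpc-performance` in `amplxe-cl`, and that `aps` is launched simply by prefixing the application command; the function names, the result directory `r002hpc` and the application path are hypothetical.

```python
def hpc_characterization_command(app_argv, result_dir="r002hpc"):
    """argv for an HPC Performance Characterization collection.

    Assumption: the analysis type is spelled `hpc-performance`;
    `r002hpc` is a placeholder result directory name.
    """
    return ["amplxe-cl", "-collect", "hpc-performance",
            "-result-dir", result_dir, "--", *app_argv]


def aps_command(app_argv):
    """argv for Application Performance Snapshot.

    Assumption: `aps` from <install_dir>/bin64 is on PATH and takes the
    application command line as its arguments.
    """
    return ["aps", *app_argv]


if __name__ == "__main__":
    app = ["./my_app", "--iters", "100"]  # hypothetical application
    print(" ".join(hpc_characterization_command(app)))
    print(" ".join(aps_command(app)))
```

Running the snapshot first is the cheaper option, and the fuller HPC Performance Characterization collection can follow if the overview points at threading, memory or vectorization issues.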
Thanks & Regards, Dmitry
Hi Dmitry,
Can the "HPC Performance Characterization" analysis do thread-level analysis?
Thanks.