Intel VTune Amplifier on Xeon Phi

Hi All,

I have a few questions regarding Intel VTune Amplifier, which I plan to use on a Xeon Phi 7210:

  • Since Intel VTune Amplifier is a GUI-based application, can anyone share how much overhead it adds to the system?
  • How can I be sure that the data I get from Intel VTune Amplifier corresponds only to the application being profiled?
  • Can this tool be used to analyze Intel Optimized Caffe?
    • Can I collect data without the GUI, i.e. using a command-line version of Intel VTune Amplifier, if one is available?

Thanks.

Accepted Solutions

Hello Chetan Arvind Patil,

Please find answers to your questions below:

  • Since Intel VTune Amplifier is a GUI-based application, can anyone share how much overhead it adds to the system?

On Linux, we can observe the VTune Amplifier UI occupying up to 20-25% of one core. So if you are profiling a high-throughput application, I would recommend performing the collection from the command line and then opening the result in the VTune GUI, to eliminate any side effects of the VTune UI sharing a core with the application being profiled.

Please also note that because of the single-thread performance of Xeon Phi, the VTune GUI might be a bit sluggish there, so one recommended way is to collect results through the command line on the Xeon Phi target and then transfer them (or use a file share) to a client machine with better single-thread performance. You can also use remote collection from a client machine to the Xeon Phi; VTune will then automatically copy the traces and the files needed for symbol resolution to the client machine.
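To make the command-line route concrete, here is a minimal sketch in Python that only assembles the `amplxe-cl` command line for a GUI-less collection, optionally as a remote collection over SSH. The application path, result directory name, and SSH target are hypothetical placeholders, and the `hotspots` analysis type is just an example choice; please check `amplxe-cl -help collect` on your installation for the analysis types it actually offers.

```python
import shlex


def vtune_collect_cmd(app_argv, result_dir, analysis="hotspots", target=None):
    """Build an amplxe-cl command line that collects data without the GUI.

    app_argv   -- the application and its arguments (hypothetical here)
    result_dir -- directory where VTune stores the collected result
    analysis   -- analysis type passed to -collect (example default)
    target     -- optional "user@host" for remote collection over SSH
    """
    cmd = ["amplxe-cl", "-collect", analysis, "-result-dir", result_dir]
    if target is not None:
        # Remote collection from a client machine to the Xeon Phi target.
        cmd.append(f"-target-system=ssh:{target}")
    # Everything after "--" is the profiled application and its arguments.
    return cmd + ["--"] + list(app_argv)


if __name__ == "__main__":
    # Local collection on the target; "./my_app" and "r000hs" are placeholders.
    print(shlex.join(vtune_collect_cmd(["./my_app", "--iters", "100"], "r000hs")))
    # Remote collection driven from a client machine; the host is a placeholder.
    print(shlex.join(vtune_collect_cmd(["./my_app"], "r001hs", target="user@phi-host")))
```

The printed command can be run in a shell or passed to `subprocess.run`. The resulting directory can then be copied to the client machine and opened there with the GUI (for example `amplxe-gui r000hs`), so the UI never competes with the profiled application for a core.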

  • How can I be sure that the data I get from Intel VTune Amplifier corresponds only to the application being profiled?

If you use the "Launch Application" mode and provide your application to profile, VTune will show performance information related to your application and its child processes (unless following child processes is explicitly switched off). However, some metrics are based on system-wide monitoring, such as memory bandwidth derived from uncore events, and these will include everything that happened on the system.

The question on Caffe was addressed in another thread, I believe.

BTW, are you interested in algorithmic optimization, or are you also going down to the micro-architecture level?

Thanks & Regards, Dmitry

 

Hi Dmitry,

I am interested in software optimization based on how a framework or application utilizes an architecture such as Xeon Phi.

Thanks. 


Hello Chetan Arvind Patil,

Let me recommend trying the "HPC Performance Characterization" analysis, which shows several important aspects of application performance on Xeon Phi at once: parallelism and CPU utilization, with insight into the efficiency of parallel runtimes such as OpenMP; memory access efficiency; and some vectorization efficiency information.

Also, VTune now includes a lightweight tool, Application Performance Snapshot, that gives a quick performance overview in the form of a command-line summary and an HTML report. The tool is named "aps" and is located in the <install_dir>/bin64 directory.
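As a rough illustration of the two options above, the sketch below assembles the corresponding command lines in Python. It assumes `amplxe-cl` and `aps` are on the PATH, that `hpc-performance` is the command-line name of the HPC Performance Characterization analysis, and that `aps --report=<dir>` generates the report; the last one in particular is written from memory and should be checked against `aps --help`. The application and directory names are hypothetical.

```python
import shlex


def hpc_characterization_cmd(app_argv, result_dir):
    """amplxe-cl command for an HPC Performance Characterization run."""
    return ["amplxe-cl", "-collect", "hpc-performance",
            "-result-dir", result_dir, "--", *app_argv]


def aps_collect_cmd(app_argv):
    """Application Performance Snapshot run: aps followed by the application."""
    return ["aps", *app_argv]


def aps_report_cmd(aps_result_dir):
    """Report generation from an APS result directory (assumed flag spelling)."""
    return ["aps", f"--report={aps_result_dir}"]


if __name__ == "__main__":
    app = ["./my_app", "--threads", "64"]  # hypothetical application
    for cmd in (hpc_characterization_cmd(app, "r000hpc"),
                aps_collect_cmd(app),
                aps_report_cmd("aps_result_dir")):  # placeholder directory
        print(shlex.join(cmd))
```

A snapshot run is usually the cheaper first step; the full HPC Performance Characterization collection is worth running once the snapshot points at a parallelism, memory, or vectorization problem.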

Thanks & Regards, Dmitry

 


Hi Dmitry,

Can "HPC Performance Characterization" do thread-level analysis?

Thanks.
