Profiling offload applications on host and target simultaneously

Andrey_Vladimirov
New Contributor III

Is there a tool or feature in VTune for profiling offload applications, so that the programmer can view a timeline of both host and MIC activity? In other words, how does one profile applications that overlap CPU and MIC computation and/or overlap computation with data movement?

TimP
Honored Contributor III

I haven't heard anything recently on this subject. As far as I know, it's still necessary to run VTune profiling separately for the host and the MIC, since each requires a different collection category. I'd hope that the facilities for comparing runs would permit alignment of the timelines.

I'd be more hopeful of useful results for the case of computation on both host and MIC, which is definitely of interest for "symmetric" MPI.

Andrey_Vladimirov
New Contributor III

Thanks. For symmetric MPI, ITAC does exactly that, but we were interested in offload. For offload, we ended up just inserting timing/output statements into the code and collecting the timeline that way.
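For readers who want to try the same manual approach, here is a minimal sketch of what such timing statements might look like on the host side. It is not the code we used; all names (tl_reset, tl_begin, tl_end, tl_print, host_work, device_work) are hypothetical, and the offload pragmas appear only in a comment because they require the Intel compiler's offload support. The timestamps come from the C11 timespec_get call, which reports wall-clock time.

```c
#include <stdio.h>
#include <time.h>

#define TL_MAX_EVENTS 256

/* One timed interval on the host-side timeline (hypothetical helper type). */
typedef struct {
    const char *label;  /* e.g. "host compute", "offload launch", "offload wait" */
    double start;       /* seconds since tl_reset() */
    double end;         /* seconds since tl_reset(); negative while still open */
} tl_event;

tl_event tl_events[TL_MAX_EVENTS];
int tl_count = 0;
double tl_origin = 0.0;

/* Wall-clock time in seconds (C11 timespec_get; not guaranteed monotonic). */
double tl_now(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Clear the timeline and set time zero. */
void tl_reset(void) {
    tl_count = 0;
    tl_origin = tl_now();
}

/* Open an interval; returns its id, or -1 if the table is full. */
int tl_begin(const char *label) {
    if (tl_count >= TL_MAX_EVENTS) return -1;
    tl_events[tl_count].label = label;
    tl_events[tl_count].start = tl_now() - tl_origin;
    tl_events[tl_count].end = -1.0;
    return tl_count++;
}

/* Close an interval; invalid ids (such as -1) are ignored. */
void tl_end(int id) {
    if (id >= 0 && id < tl_count)
        tl_events[id].end = tl_now() - tl_origin;
}

/* Print one line per interval: label, start, end (seconds). */
void tl_print(FILE *out) {
    for (int i = 0; i < tl_count; i++)
        fprintf(out, "%-16s %12.6f %12.6f\n",
                tl_events[i].label, tl_events[i].start, tl_events[i].end);
}

/* Usage sketch for an asynchronous offload overlapped with host work
 * (assumes the Intel compiler; host_work/device_work are placeholders):
 *
 *   char flag;
 *   tl_reset();
 *   int launch = tl_begin("offload launch");
 *   #pragma offload target(mic:0) signal(&flag)
 *   { device_work(); }
 *   tl_end(launch);
 *
 *   int host = tl_begin("host compute");
 *   host_work();
 *   tl_end(host);
 *
 *   int wait = tl_begin("offload wait");
 *   #pragma offload_wait target(mic:0) wait(&flag)
 *   tl_end(wait);
 *
 *   tl_print(stdout);
 */
```

Note that with an asynchronous offload the "offload launch" interval covers only the launch itself. The coprocessor's busy time has to be inferred from the span between the launch and the end of the wait, so this gives a host-side view of the overlap and not a true device timeline.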

Sumedh_N_Intel
Employee

Unfortunately, VTune does not yet support profiling the host and the coprocessor simultaneously. However, if your application's performance does not vary significantly between runs, you could collect the profiling results on the host and on the coprocessor in two separate runs and then compare the two runs in VTune, as Tim suggested.

In addition, the OFFLOAD_REPORT functionality (https://software.intel.com/en-us/node/512835 or https://software.intel.com/en-us/node/512584) provided by the Intel Compiler can give some information about the data transfers and the coprocessor compute times.
