
Profiling offload applications on host and target simultaneously

Andrey_Vladimirov
New Contributor III
576 Views

Is there a tool or functionality in VTune to profile offload applications, so that the programmer can view the timeline of both the host and the MIC activity? In other words, how does one profile applications that overlap CPU and MIC computation and/or overlap computation with data movement?

 

3 Replies
TimP
Honored Contributor III

I haven't heard anything recently on this subject.  As far as I know, it's still necessary to run the VTune profiling separately, as different categories of collection must be selected.  I'd hope that the facilities for comparing runs would permit alignment of the timelines.

I'd be more hopeful of interesting results for the case of computation on both host and MIC, which is definitely of interest for "symmetric" MPI.

Andrey_Vladimirov
New Contributor III

Thanks. For symmetric MPI, ITAC does exactly that, but we were interested in offload. For offload we ended up just inserting timing/output statements into the code and collected the timeline in this way.
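For anyone who wants to try the same manual approach, here is a minimal sketch of what such timing statements could look like. It is not the code we actually used. All names (`timeline_begin`, `timeline_end`, `timeline_demo`, `placeholder_work`) are hypothetical, and the placeholder work stands in for the real offload region and host computation. Note that every timestamp is taken on the host, so an "offload" interval only shows when the host launched the offload and when it observed completion, not what the coprocessor was doing inside it.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

#define MAX_EVENTS 64

/* One timeline entry: a labelled interval, in seconds since timeline_init(). */
typedef struct {
    const char *label;
    double start;
    double end;   /* -1.0 while the interval is still open */
} timeline_event;

static timeline_event events[MAX_EVENTS];
static int n_events = 0;
static struct timespec t0;

/* Seconds elapsed since timeline_init(), read from a monotonic host clock. */
static double now_sec(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return (double)(t.tv_sec - t0.tv_sec)
         + 1e-9 * (double)(t.tv_nsec - t0.tv_nsec);
}

/* Reset the event table and set the time origin. */
static void timeline_init(void) {
    n_events = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
}

/* Open an interval; returns its index, or -1 if the table is full. */
static int timeline_begin(const char *label) {
    if (n_events >= MAX_EVENTS) return -1;
    events[n_events].label = label;
    events[n_events].start = now_sec();
    events[n_events].end = -1.0;
    return n_events++;
}

/* Close a previously opened interval; invalid ids are ignored. */
static void timeline_end(int id) {
    if (id >= 0 && id < n_events) events[id].end = now_sec();
}

/* Print one line per interval: label, start time, end time. */
static void timeline_print(FILE *out) {
    for (int i = 0; i < n_events; i++)
        fprintf(out, "%-16s %12.6f %12.6f\n",
                events[i].label, events[i].start, events[i].end);
}

/* Hypothetical stand-in for real work (offloaded kernel or host loop). */
static double placeholder_work(long n) {
    volatile double s = 0.0;
    for (long i = 0; i < n; i++) s += 0.5 * (double)i;
    return s;
}

/* Records two nested intervals. In a real code with an asynchronous
   offload, the "offload" interval would bracket the offload launch and
   the matching wait, and "host_compute" would bracket the host work
   done in between. Here both are only simulated on the host. */
static void timeline_demo(void) {
    timeline_init();
    int off = timeline_begin("offload");
    int host = timeline_begin("host_compute");
    placeholder_work(1000000);
    timeline_end(host);
    placeholder_work(1000000);   /* stands in for waiting on the coprocessor */
    timeline_end(off);
}
```

Calling `timeline_demo()` from `main` and then `timeline_print(stdout)` prints one line per interval. Overlap between host and offload activity shows up as intervals whose start and end times interleave.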

Sumedh_N_Intel
Employee

Unfortunately, VTune does not yet support simultaneously profiling both the host and the coprocessor. However, if your application performance does not vary significantly between runs, then you could collect the profiling results on the host and the coprocessor through two separate runs and then compare the two runs in VTune, just like Tim suggested. 

Also, the OFFLOAD_REPORT (https://software.intel.com/en-us/node/512835 or https://software.intel.com/en-us/node/512584) functionality provided by the Intel Compiler can give some information about the data transfers and the coprocessor compute times.
