Is there a tool or feature in VTune for profiling offload applications, so that the programmer can view the timeline of both host and MIC activity? In other words, how does one profile applications that overlap CPU and MIC computation and/or overlap computation with data movement?
I haven't heard anything recently on this subject. As far as I know, it's still necessary to run the VTune profiling separately for the host and for the MIC, as different categories of collection must be selected for each. I'd hope that the facilities for comparing runs would permit alignment of the timelines.
I'd be more hopeful of interesting results for the case of computation on both host and MIC, which is definitely of interest for "symmetric" MPI.
Thanks. For symmetric MPI, ITAC does exactly that, but we were interested in offload. For offload we ended up just inserting timing/output statements into the code and collecting the timeline that way.
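For anyone wanting to try the same manual approach, here is a minimal sketch of what such timing statements could look like. It is not our actual code: the helper names (`timeline_begin`, `timeline_end`, `timeline_print`, `busy_work`, `run_example`) are hypothetical, `busy_work` only stands in for real host or offloaded computation, and it assumes a POSIX system where `clock_gettime` with `CLOCK_MONOTONIC` is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_EVENTS 64

/* One timeline entry: a label plus start and end times in seconds. */
typedef struct {
    char label[32];
    double start;
    double end;
} timeline_event;

static timeline_event events[MAX_EVENTS];
static int event_count = 0;

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Open an event; returns its index, or -1 if the log is full. */
static int timeline_begin(const char *label) {
    assert(label != NULL);
    if (event_count >= MAX_EVENTS) return -1;
    timeline_event *e = &events[event_count];
    strncpy(e->label, label, sizeof e->label - 1);
    e->label[sizeof e->label - 1] = '\0';
    e->start = now_seconds();
    e->end = e->start;
    return event_count++;
}

/* Close a previously opened event; invalid indices are ignored. */
static void timeline_end(int id) {
    if (id >= 0 && id < event_count) events[id].end = now_seconds();
}

/* Print one line per event, with times relative to the first event. */
static void timeline_print(FILE *out) {
    if (event_count == 0) return;
    double t0 = events[0].start;
    for (int i = 0; i < event_count; i++)
        fprintf(out, "%-16s start=%10.6f end=%10.6f\n",
                events[i].label, events[i].start - t0, events[i].end - t0);
}

/* Hypothetical stand-in for real work (host loop or offloaded kernel). */
static double busy_work(int n) {
    volatile double s = 0.0;
    for (int i = 0; i < n; i++) s += i * 0.5;
    return s;
}

/* Example: time a host phase and a phase where the offload would go. */
static void run_example(void) {
    int h = timeline_begin("host_compute");
    busy_work(1000000);
    timeline_end(h);

    int o = timeline_begin("offload_region");
    /* In a real code, the offload pragma or call would be placed here. */
    busy_work(1000000);
    timeline_end(o);

    timeline_print(stdout);
}
```

Calling `run_example()` from `main` prints the two intervals one after the other. In this sketch the phases run back to back, so the intervals do not overlap; with an asynchronous offload you would call `timeline_begin` before launching it and `timeline_end` after waiting for it, and the overlap with host work would then show up as overlapping intervals in the output. Note that the timestamps are all taken on the host, so this only shows when the host sees an offload start and finish, not what happens inside the coprocessor.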
Unfortunately, VTune does not yet support simultaneously profiling both the host and the coprocessor. However, if your application performance does not vary significantly between runs, then you could collect the profiling results on the host and the coprocessor through two separate runs and then compare the two runs in VTune, just like Tim suggested.
Also, the OFFLOAD_REPORT functionality provided by the Intel Compiler (https://software.intel.com/en-us/node/512835 or https://software.intel.com/en-us/node/512584) can give you some information about the data transfers and the coprocessor compute times.
