
How to profile the communication between the host and the MIC in offload mode

振_徐_

Recently I needed to know exactly what data is transferred between the host and the MIC (how much, and when) while an offload application is running.

Unlike NVIDIA, Intel doesn't provide a tool like the NVIDIA profiler.

So I guessed that VTune Amplifier XE might do the job. Unfortunately, I found that when I analyze an offload application, Amplifier XE (amplxe) only analyzes the performance on the MIC side.

So I am turning to the forum for help. Can anyone help me?

Peter_W_Intel

VTune Amplifier XE 2016 cannot profile applications running on NVIDIA GPUs, but it can collect performance data on processors with Intel GPUs (Intel HD Graphics). CPU/GPU Concurrency analysis is also supported.

You can use VTune Amplifier to profile the offloaded code on the MIC, and you can also profile the offload program on the CPU side. However, the two cannot be collected at the same time; they require two separate sessions.

Dmitry_P_Intel1

Hello,

You may find it useful to set the following environment variable before you run your offload application:

>export OFFLOAD_REPORT=2

Then, for each offload construct, you will get a report in this form:

[Offload] [MIC 0] [File]            sampleC13.c
[Offload] [MIC 0] [Line]            101
[Offload] [MIC 0] [Tag]             Tag 48
[Offload] [HOST]  [Tag 48] [CPU Time]        6.382166(seconds)
[Offload] [MIC 0] [Tag 48] [CPU->MIC Data]   256000016 (bytes)
[Offload] [MIC 0] [Tag 48] [MIC Time]        8.176067(seconds)
[Offload] [MIC 0] [Tag 48] [MIC->CPU Data]   256000000 (bytes)

Thanks & Regards, Dmitry
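When an application runs many offload constructs, the per-direction byte counts in this report can be totalled mechanically. Below is a minimal C sketch, assuming the data lines look exactly like the `[CPU->MIC Data]` and `[MIC->CPU Data]` lines above. The struct and function names (`offload_totals`, `offload_report_add_line`, `offload_report_sum`) are hypothetical and are not part of any Intel tool.

```c
#include <stdio.h>
#include <string.h>

/* Running totals of the byte counts printed with OFFLOAD_REPORT=2. */
struct offload_totals {
    unsigned long long cpu_to_mic;   /* sum of [CPU->MIC Data] fields */
    unsigned long long mic_to_cpu;   /* sum of [MIC->CPU Data] fields */
};

/* Add one report line to the totals. Lines that are not data-transfer
   records (e.g. [File], [Line], [CPU Time]) are ignored.
   Returns 1 if the line was a data record, 0 otherwise. */
static int offload_report_add_line(struct offload_totals *t, const char *line)
{
    static const char in_key[]  = "[CPU->MIC Data]";
    static const char out_key[] = "[MIC->CPU Data]";
    const char *p;
    unsigned long long bytes;

    if ((p = strstr(line, in_key)) != NULL) {
        /* %llu skips the padding spaces before the number. */
        if (sscanf(p + sizeof in_key - 1, "%llu", &bytes) == 1) {
            t->cpu_to_mic += bytes;
            return 1;
        }
    } else if ((p = strstr(line, out_key)) != NULL) {
        if (sscanf(p + sizeof out_key - 1, "%llu", &bytes) == 1) {
            t->mic_to_cpu += bytes;
            return 1;
        }
    }
    return 0;
}

/* Read a whole report from a stream and total the transfers. */
static struct offload_totals offload_report_sum(FILE *in)
{
    struct offload_totals t = {0, 0};
    char line[512];

    while (fgets(line, sizeof line, in) != NULL)
        offload_report_add_line(&t, line);
    return t;
}
```

One could call `offload_report_sum(stdin)` from a small `main` and pipe the application's output into it, assuming the report lines end up on the stream being piped. This only answers "how much": the report lines shown above carry no timestamps, so they say nothing about exactly when each transfer happened.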

 

Peter_W_Intel

>[Offload] [MIC 0] [Tag 48] [CPU->MIC Data]   256000016 (bytes)
>[Offload] [MIC 0] [Tag 48] [MIC->CPU Data]   256000000 (bytes)

The data sizes above are useful, but sometimes we also need to know the transfer times:

>[Offload] [MIC 0] [Tag 48] [CPU->MIC Transaction Time]   ??? seconds
>[Offload] [MIC 0] [Tag 48] [MIC->CPU Transaction Time]   ??? seconds

振_徐_

Replying to Peter Wang (Intel):

That's right: VTune Amplifier XE can analyze either the CPU side or the MIC side, but not both at the same time.

Thanks!

振_徐_

Replying to dmitry-prohorov (Intel):

Thanks for your help!

OFFLOAD_REPORT can be set to a value from 1 to 3, each selecting a different report level.

What I actually want to do is analyze how MYO works, that is, how "_Cilk_shared" works in the background. Currently I'm rebuilding MYO with its debug function turned on, so that it prints a log while running. Of course, if you know another way to get insight into what MYO is doing, I'd be glad to hear it!
