I need to know exactly how much data is transferred between the host and the MIC, and when, while running an offload application.
Unlike Nvidia, Intel doesn't provide a tool like NV Profiler.
I guessed that VTune Amplifier XE might do the job, but I found that amplxe analyzes only the MIC's performance when I profile an offload application.
So I'm turning to the forum. Can anyone help?
VTune Amplifier XE 2016 cannot profile applications that run on Nvidia GPUs, but it can collect performance data on processors with Intel GPU (HD Graphics) support. CPU/GPU Concurrency analysis is supported.
You can use VTune Amplifier to profile a MIC offload application on the MIC side, and also to profile the offload program on the CPU side. However, the two cannot be collected at the same time; they require two separate sessions.
Hello,
You will probably find the following environment variable useful. Set it before you run your offload application:
>export OFFLOAD_REPORT=2
Then, for each offload construct, you will get a report like this:
[Offload] [MIC 0] [File] sampleC13.c
[Offload] [MIC 0] [Line] 101
[Offload] [MIC 0] [Tag] Tag 48
[Offload] [HOST] [Tag 48] [CPU Time] 6.382166(seconds)
[Offload] [MIC 0] [Tag 48] [CPU->MIC Data] 256000016 (bytes)
[Offload] [MIC 0] [Tag 48] [MIC Time] 8.176067(seconds)
[Offload] [MIC 0] [Tag 48] [MIC->CPU Data] 256000000 (bytes)
Thanks & Regards, Dmitry
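To turn a report like the one above into per-offload totals, the output can be post-processed with a small script. The following is a minimal sketch, assuming the report lines have exactly the format shown in this thread; the function name `sum_transfers` and the `SAMPLE` text are made up for illustration and are not part of any Intel tool.

```python
import re
from collections import defaultdict

# Matches report lines such as:
#   [Offload] [MIC 0] [Tag 48] [CPU->MIC Data] 256000016 (bytes)
# The pattern is an assumption based only on the sample output in this thread.
DATA_LINE = re.compile(
    r"\[Offload\]\s+\[MIC (?P<mic>\d+)\]\s+\[(?P<tag>Tag \d+)\]\s+"
    r"\[(?P<direction>CPU->MIC|MIC->CPU) Data\]\s+(?P<bytes>\d+)\s*\(bytes\)"
)


def sum_transfers(report_text):
    """Return {(mic_index, tag): {"CPU->MIC": bytes, "MIC->CPU": bytes}}."""
    totals = defaultdict(lambda: {"CPU->MIC": 0, "MIC->CPU": 0})
    for line in report_text.splitlines():
        match = DATA_LINE.search(line)
        if match:
            key = (int(match["mic"]), match["tag"])
            totals[key][match["direction"]] += int(match["bytes"])
    return {key: dict(value) for key, value in totals.items()}


# Sample report copied from the post above.
SAMPLE = """\
[Offload] [MIC 0] [File] sampleC13.c
[Offload] [MIC 0] [Line] 101
[Offload] [MIC 0] [Tag] Tag 48
[Offload] [HOST] [Tag 48] [CPU Time] 6.382166(seconds)
[Offload] [MIC 0] [Tag 48] [CPU->MIC Data] 256000016 (bytes)
[Offload] [MIC 0] [Tag 48] [MIC Time] 8.176067(seconds)
[Offload] [MIC 0] [Tag 48] [MIC->CPU Data] 256000000 (bytes)
"""

if __name__ == "__main__":
    for (mic, tag), sizes in sum_transfers(SAMPLE).items():
        print(f"MIC {mic} {tag}: to MIC {sizes['CPU->MIC']} bytes, "
              f"to CPU {sizes['MIC->CPU']} bytes")
```

In practice you would capture the application's output to a file and pass its contents to `sum_transfers` instead of `SAMPLE`. This only reports how much data moved per offload; it says nothing about when the transfers happened or how long they took.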
>[Offload] [MIC 0] [Tag 48] [CPU->MIC Data] 256000016 (bytes)
>[Offload] [MIC 0] [Tag 48] [MIC->CPU Data] 256000000 (bytes)
Those are useful, but sometimes one also needs to know the transfer times:
>[Offload] [MIC 0] [Tag 48] [CPU->MIC Transaction Time] ??? seconds
>[Offload] [MIC 0] [Tag 48] [MIC->CPU Transaction Time] ??? seconds
Peter Wang (Intel) wrote:
VTune Amplifier XE 2016 cannot profile applications that run on Nvidia GPUs, but it can collect performance data on processors with Intel GPU (HD Graphics) support. CPU/GPU Concurrency analysis is supported.
You can use VTune Amplifier to profile a MIC offload application on the MIC side, and also to profile the offload program on the CPU side. However, the two cannot be collected at the same time; they require two separate sessions.
That's true: VTune Amplifier XE can analyze the concurrency of either the CPU or the MIC, but not both at the same time.
Thanks!
dmitry-prohorov (Intel) wrote:
Hello,
You will probably find the following environment variable useful. Set it before you run your offload application:
>export OFFLOAD_REPORT=2
Then, for each offload construct, you will get a report like this:
[Offload] [MIC 0] [File] sampleC13.c
[Offload] [MIC 0] [Line] 101
[Offload] [MIC 0] [Tag] Tag 48
[Offload] [HOST] [Tag 48] [CPU Time] 6.382166(seconds)
[Offload] [MIC 0] [Tag 48] [CPU->MIC Data] 256000016 (bytes)
[Offload] [MIC 0] [Tag 48] [MIC Time] 8.176067(seconds)
[Offload] [MIC 0] [Tag 48] [MIC->CPU Data] 256000000 (bytes)
Thanks & Regards, Dmitry
Thanks for your help!
OFFLOAD_REPORT can be set to a value from 1 to 3, each corresponding to a different report level.
What I actually want to do is analyze how MYO works, that is, how "_Cilk_shared" works in the background. I'm currently rebuilding MYO with its debug function turned on, so that it prints a log while running. Of course, if you know another way for me to get insight into what MYO is doing, I look forward to hearing it!