Dear forum,
My code is written with the Intel Coprocessor Offload Infrastructure (COI). The "source" side code runs on the host CPU, and the "sink" side code runs on the MIC. I'm wondering whether it is currently possible to profile the offloaded sink-side code using VTune (version vtune_amplifier_xe_2015.3.0.403110), and how to do that. I haven't managed to glean any information from the manual yet. Thanks for any help.
You can use VTune(TM) Amplifier XE to profile offload code built with the Intel compiler and running on an Intel(R) Xeon Phi(TM) coprocessor. Please add the option "-target-system=mic-host-launch:" followed by the card number (0 in this example):
amplxe-cl -collect advanced-hotspots -target-system=mic-host-launch:0 -- ./your_offload_program
Note that code running on the host will not be profiled, only the MIC side.
Peter Wang (Intel) wrote:
You can use VTune(TM) Amplifier XE to profile offload code built with the Intel compiler and running on an Intel(R) Xeon Phi(TM) coprocessor. Please add the option "-target-system=mic-host-launch:" followed by the card number. For example, amplxe-cl -collect advanced-hotspots -target-system=mic-host-launch:0 -- ./your_offload_program
Note that code running on the host will not be profiled, only the MIC side.
Peter,
Thanks for your reply. I tried your mic-host-launch solution, but no profiling data was generated for the offloaded code.
My code does not use the regular offload pragmas but COI, where the sink-side code that runs on the MIC has its own main file.
Sorry, I am new to COI programming. After some quick reading, I understand that a COI application consists of two binaries: one executes on the Xeon host and the other on the MIC. The launcher runs only the host binary; the sink binary and its libraries are then sent from the host to the MIC and executed there.
VTune Amplifier provides only offload mode and native mode; I think native mode might work here:
1. Run your COI executable on the host, then
2. amplxe-cl -target-system=mic-native:0 -c advanced-hotspots -search-dir=$dir-with-sink-code -duration 20
Actually, this is system-wide profiling on the MIC. The -search-dir option should name the directory that contains, directly or in its subdirectories, the sink binary with symbol information, so that hot samples can be resolved to functions in the sink binary.
Hopefully we can collect data from the MIC this way.
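The two steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: SINK_DIR, HOST_APP, DURATION and DRY_RUN are hypothetical names introduced here, and the script starts the collector first and the host launcher second, which is the order suggested later in the thread. Only the amplxe-cl options are taken from the post. By default nothing is executed; the commands are just printed.

```shell
#!/bin/sh
# Sketch only: system-wide collection on the coprocessor while the COI launcher runs.
# SINK_DIR, HOST_APP, DURATION and DRY_RUN are hypothetical names, not VTune settings.
SINK_DIR=${SINK_DIR:-./sink_build}        # directory with the sink binary and symbols
HOST_APP=${HOST_APP:-./your_coi_launcher} # host-side (source) executable
DURATION=${DURATION:-20}                  # value passed to -duration
DRY_RUN=${DRY_RUN:-1}                     # 1 = only print the commands

# Build the collector command line from step 2 of the post.
native_cmd() {
    echo "amplxe-cl -target-system=mic-native:0 -c advanced-hotspots -search-dir=$1 -duration $2"
}

if [ "$DRY_RUN" = 1 ]; then
    echo "+ $(native_cmd "$SINK_DIR" "$DURATION") &"
    echo "+ $HOST_APP"
else
    # Unquoted on purpose so the string splits into words; paths with spaces would break.
    $(native_cmd "$SINK_DIR" "$DURATION") &
    collector=$!
    "$HOST_APP"          # launch the source side while the collector is sampling
    wait "$collector"    # wait for the collector to finish its -duration window
fi
```

Set DRY_RUN=0 to actually run the commands. The duration has to be long enough to cover the sink-side work, otherwise the collection ends before the offloaded code has run.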
Regards, Peter
Hello,
Actually the syntax:
amplxe-cl -collect advanced-hotspots -target-system=mic-host-launch:0 -- ./your_host_executable
should work fine for a COI-based application.
Does the elapsed time of the collection look reasonable? Do you at least see your module of interest in the VTune hotspots view?
And by the way, is the host Linux or Windows?
Thanks & Regards, Dmitry
Thanks.
I double-checked. The collection time appeared too short. According to the console output, vtune executed only the source-side code before the sink-side code was called. Also, adding the binary/symbol search directory didn't let the sink code be found.
PS: GNU/Linux 2.6.32-431.11.2.el6.x86_64, mpss 3.5.1
I used the GUI; the command-line equivalent is below. The collected data did not exceed the 2 GB limit.
[vtune_dir]/amplxe-cl -target-system mic-host-launch:0 -collect [custom_analysis] -data-limit=2048 --search-dir sym:p=[dir] --search-dir bin:p=[dir] -- [binary] [parameter_list]
One more check: does the host part launch the card part and then exit, or does it wait for the card part to complete?
If the latter, then most likely the app is not running correctly under VTune.
If the former, then we can run in host-launch mode, profile the target system, and stop the collection manually when the app has finished.
Thanks & Regards, Dmitry
@ King
You probably did only a *tiny* amount of work in the sink code, which is why I suggested system-wide profiling on the MIC with the option "-target-system=mic-native:0" (start VTune with a duration first, then launch the source-side binary manually). I don't know whether the option "-target-system=mic-host-launch:0" supports profiling your sink code (was the sink code launched on the MIC manually?).
I'm not familiar with writing work for a COIPipeline, which belongs in the sink code. I wanted to write a simple source-side example that gets a COI engine, uses the engine to create a process on the sink, waits a while, and then destroys the process on the MIC. Unfortunately, I could not find the associated .a file to link against when building. See:
$icc -g -O3 -I/opt/mpss/3.2.3/sysroots/k1om-mpss-linux/usr/src/debug/mpss-coi-3.2.3-1/mpss-coi-3.2.3/src/include test.c
/tmp/iccwTqyaB.o: In function `main':
/home/peter/tmp/test.c:17: undefined reference to `COIEngineGetCount'
...
There is only libcoi_device.so under /opt/mpss/3.2.3/sysroots/k1om-mpss-linux/usr/lib64; no static library is available. Any ideas?
Maybe you could send me both the "source" code and the "sink" code so I can investigate?