King_Crimson
Beginner
59 Views

Can VTune be used to profile Intel COI code on Xeon Phi?

Dear forum,

My code is written with the Intel Coprocessor Offload Infrastructure (COI). The "source" side code runs on the host CPU, and the "sink" side code runs on the MIC. Is it currently possible to profile the offloaded sink-side code using VTune (version vtune_amplifier_xe_2015.3.0.403110), and if so, how? I haven't managed to glean any information from the manual yet. Thanks for any help.

7 Replies
Peter_W_Intel
Employee

You can use VTune(TM) Amplifier XE to profile offload code that was built with the Intel compiler and runs on an Intel(R) Xeon Phi(TM) coprocessor. Please add the option "-target-system=mic-host-launch:". For example,

amplxe-cl -collect advanced-hotspots -target-system=mic-host-launch:0  -- ./your_offload_program

Note that code running on the host is not profiled; only the MIC side is.

King_Crimson
Beginner

Peter,

Thanks for your reply. I tried your mic-host-launch solution, but profiling data for the offloaded code was still not generated.

My code does not use the regular offload pragmas; it uses COI, where the sink-side code that runs on the MIC has its own main file.

Peter_W_Intel
Employee

Sorry, I am new to COI programming. After a quick read, I understand that a COI application consists of two binaries: one runs on the Xeon host and the other on the MIC. The launcher only runs the host binary; the sink binary and its libraries are then sent from the host to the MIC and executed there.

VTune Amplifier provides only offload mode and native mode, so I think native mode might work here:

1. Run your COI executable on the host, then

2. amplxe-cl -target-system=mic-native:0 -c advanced-hotspots -search-dir=$dir-with-sink-code -duration 20

This is actually system-wide profiling on the MIC. The search directory (or one of its subdirectories) should contain the sink binary with symbol information, so that hot samples can be resolved to functions in the sink binary.
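A minimal sketch of how the two steps above might be scripted. It starts the system-wide collection first and then launches the host executable, so that the sink process runs inside the collection window; that ordering is a choice made here, not something the tool requires. The function names `native_collect_cmd` and `profile_sink`, the variables `SINK_DIR`, `HOST_BIN`, `DURATION` and `DRY_RUN`, and the default paths are all hypothetical. Only the amplxe-cl options quoted above are used. With `DRY_RUN=1` (the default in this sketch) nothing is executed and the commands are only printed.

```shell
#!/bin/sh
# Hypothetical placeholders, not taken from the thread: adjust to your setup.
SINK_DIR="${SINK_DIR:-./sink_build}"            # directory holding the sink binary with symbols
HOST_BIN="${HOST_BIN:-./your_coi_host_program}" # source-side executable
DURATION="${DURATION:-20}"                      # collection length in seconds
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print commands only

# Build the system-wide collection command for card 0,
# using only the options quoted in the post above.
native_collect_cmd() {
    printf '%s\n' "amplxe-cl -target-system=mic-native:0 -c advanced-hotspots -search-dir=$1 -duration $2"
}

profile_sink() {
    cmd=$(native_collect_cmd "$SINK_DIR" "$DURATION")
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd &"
        echo "$HOST_BIN"
        echo "wait"
        return 0
    fi
    # Word splitting of $cmd is intentional; it assumes SINK_DIR has no spaces.
    $cmd &              # start system-wide collection on the card
    collector=$!
    "$HOST_BIN"         # launch the source side, which starts the sink process
    wait "$collector"   # the collection ends when the duration expires
}
```

Calling `profile_sink` prints the three commands; running with `DRY_RUN=0 DURATION=60` would execute them, assuming amplxe-cl is on the PATH and the sink work fits inside the chosen duration.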

Hopefully we can collect data from MIC.

Regards, Peter 

 

Dmitry_P_Intel1
Employee

Hello,

Actually the syntax:

amplxe-cl -collect advanced-hotspots -target-system=mic-host-launch:0  -- ./your_host_executable

should work fine for a COI-based application.

Does the elapsed time of the collection look reasonable? Do you at least see your module of interest in the VTune hotspots view?

And BTW, is the host Linux or Windows?

Thanks & Regards, Dmitry

 

King_Crimson
Beginner

Thanks.

I double-checked. The collection time appeared too short. According to the console output, VTune executed only the source-side code up to the point where the sink-side code was called. Also, adding the binary/symbol search directory didn't help VTune find the sink code.

PS: GNU/Linux 2.6.32-431.11.2.el6.x86_64, mpss 3.5.1

I used the GUI; below is the command-line equivalent. The collected data did not exceed the 2 GB limit.

[vtune_dir]/amplxe-cl -target-system mic-host-launch:0 -collect [custom_analysis] -data-limit=2048 --search-dir sym:p=[dir] --search-dir bin:p=[dir] -- [binary] [parameter_list]

 

Dmitry_P_Intel1
Employee

One more check: does the host part launch the card part and then exit, or does it wait for the card part to complete?

If the second, then most likely the app is not running correctly under VTune.

If the first, then we can profile the whole target system in host-launch mode and stop the collection manually when the app has finished.

Thanks & Regards, Dmitry

Peter_W_Intel
Employee

@ King

You probably did only a *tiny* amount of work in the sink code, which is why I suggested system-wide profiling on the MIC with the option "-target-system=mic-native:0" (start VTune with a duration first, then launch the source-side binary manually). I don't know whether the option "-target-system=mic-host-launch:0" supports profiling your sink code (was the sink code launched on the MIC manually?).

I'm not familiar with writing work for a COIPipeline, which belongs in the sink code. I wanted to write a simple source-side example that gets a COI engine, uses the engine to create a process on the sink, waits a while, and then destroys the process on the MIC. Unfortunately, I could not find the associated .a file to link against when building. See:

$icc -g -O3 -I/opt/mpss/3.2.3/sysroots/k1om-mpss-linux/usr/src/debug/mpss-coi-3.2.3-1/mpss-coi-3.2.3/src/include  test.c
/tmp/iccwTqyaB.o: In function `main':
/home/peter/tmp/test.c:17: undefined reference to `COIEngineGetCount'

...

There is only libcoi_device.so under /opt/mpss/3.2.3/sysroots/k1om-mpss-linux/usr/lib64, no static library available. Any idea?
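One possible explanation, offered as an assumption rather than a confirmed fix: libcoi_device.so is the sink-side library, while source-side calls such as COIEngineGetCount are normally resolved by a separate host-side shared library, libcoi_host.so, that the host MPSS packages install (typically on the default linker path). If that holds for this installation, adding `-lcoi_host` to the build line may resolve the undefined reference. The sketch below only prints a candidate command; the function name `coi_host_link_cmd`, the `COI_INCLUDE` variable and its placeholder value are hypothetical.

```shell
#!/bin/sh
# Assumption: a host-side libcoi_host.so is available on the default linker path.
COI_INCLUDE="${COI_INCLUDE:-/path/to/coi/include}"   # hypothetical placeholder for the COI header directory

# Print a candidate source-side build line: same flags as the failing
# command, plus -lcoi_host after the source file.
coi_host_link_cmd() {
    printf '%s\n' "icc -g -O3 -I$1 $2 -lcoi_host"
}

coi_host_link_cmd "$COI_INCLUDE" test.c
```

The library flag is placed after the source file because the linker resolves symbols left to right.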

Could you send me both the "source" code and the "sink" code so I can investigate?
