
Getting call stack info in VTune when profiling native application on MIC

Victor_L_2
Beginner
1,881 Views

I apologize if this was answered somewhere else, but I couldn't find any answer in the VTune tutorials or on this forum ...

I'm trying to profile a native application running on the Phi using VTune Amplifier. I'm following the suggestions in Hands-on Lab: Optimizing Monte Carlo on Intel Phi. I've compiled my application with the flags "-g -shared-intel -shared-libgcc -debug inline-debug-info". In the VTune project properties I've specified Application=ssh and Application Parameters=<name of the script on mic0 to execute>.

The application runs fine and VTune collects data, but in the Bottom-up view I can't get call stack information for my application (see attached screenshot). I'm using the latest VTune Amplifier XE 2013 and Intel compiler version 13.1.3.

Any suggestions for getting call stack information in VTune, or any other techniques I should use for profiling applications on the Phi?

11 Replies
Sumedh_N_Intel
Employee

Last I checked (as of Intel VTune Amplifier XE 2013 Update 11), collecting call stack information on the coprocessor is not supported by Intel VTune Amplifier XE. I could submit a feature request for this to the developers on your behalf, if you would like. However, I cannot make any promises as to whether or when this feature will be implemented.

Victor_L_2
Beginner

Having call stack information available will be extremely helpful when profiling complex applications. Please submit this feature request. Thank you!

Sumedh_N_Intel
Employee

Hi Victor, 

I have filed a feature request for Intel VTune Amplifier XE: 6000025555

On further investigation, I found that ITAC (Intel Trace Analyzer and Collector) can be used to trace any application you can recompile from source. Here are the basic steps to use ITAC to view the call stack:

- Install ITAC on the machine. ITAC is not available as a stand-alone package; it ships as part of Intel Cluster Studio XE.
- Source itacvars.sh from the <install_dir>/itac/<version>/bin/ directory; this will set your environment
- Recompile your application using the -tcollect and -mmic switches: this compiles for the coprocessor in native mode and links in the Trace Collector libraries
- Make sure the Trace Collector libraries are available on the card
  - Mainly, the files under <install_dir>/itac/<version>/mic/slib/* should be copied to the card, under /lib64
- Now, run your application on the card. The trace collector will create a few files with the name <exe_name>.stf*
- Now, those files will be created in the same location where you ran your executable; go ahead and transfer them over to the host
- You can view those *.stf* files in the GUI by typing “traceanalyzer <exe_name>.stf”; this opens an X application, so make sure your DISPLAY is set correctly
- Once the GUI is open, on the front page, you’ll see a blue area called “Application”; just right-click and select “Ungroup Application” and that will show you your routines. 

I hope this helps.

Victor_L_2
Beginner

Thanks! Can I get ITAC if I have an Intel C++ Studio XE license?

Sumedh_N_Intel
Employee

ITAC is provided with the Intel Cluster Studio XE 2013 (the latest version available here if you want to get it):   http://software.intel.com/en-us/intel-cluster-studio-xe/

This document should help you visualize how Intel packages the various products:  http://software.intel.com/en-us/articles/intel-tools-reference-guides-user-guides-bkms-getting-support

As you will be able to see, the Intel Parallel Studio XE license does not cover the ITAC installation.

Victor_L_2
Beginner

I was able to get and install ITAC on my host, but now I'm running into linker errors (command "icpc ..... -mmic -pthread -tcollect"):
x86_64-k1om-linux-ld: skipping incompatible /opt/intel/itac/8.1.2.033/intel64/itac/lib_impi4/libVT.a when searching for -lVT
x86_64-k1om-linux-ld: cannot find -lVT
x86_64-k1om-linux-ld: skipping incompatible /opt/intel/itac/8.1.2.033/intel64/itac/lib_impi4/libdwarf.a when searching for -ldwarf
x86_64-k1om-linux-ld: cannot find -ldwarf
x86_64-k1om-linux-ld: skipping incompatible /opt/intel/itac/8.1.2.033/intel64/itac/lib_impi4/libelf.a when searching for -lelf
x86_64-k1om-linux-ld: skipping incompatible /opt/intel/itac/8.1.2.033/intel64/itac/lib_impi4/libvtunwind.a when searching for -lvtunwind
x86_64-k1om-linux-ld: cannot find -lvtunwind

Victor_L_2
Beginner

P.S. to my earlier comment....

Since there is no file "<install_dir>/itac/<version>/bin/itacvars.sh" on my host as per instructions above, I've run command "source <install_dir>/itac/<version>/intel64/bin/itacvars.sh" instead.

Sumedh_N_Intel
Employee

Hi Victor, 

Sorry for the delayed response. To compile the application correctly, use the following compile line:

[bash]mpiicpc -tcollect=VTcs -mmic hello.cpp [/bash]

The above command line uses VTcs instead of the default VT library. VTcs is the library designed to work with non-MPI programs. You will also need to initialize and finalize the collection manually, since for MPI programs this normally happens in MPI_Init and MPI_Finalize. To do this, add the following calls:

To initialize:

[cpp]int VT_initialize(int *argc, char ***argv)[/cpp]

To finalize:

[cpp]int VT_finalize(void)[/cpp]

For the most complete collection, I recommend putting these calls at the very beginning and the very end of the program. You will need to include VT.h wherever you use them.

Lastly, you will need to source the MIC version of the tools, i.e. <install_dir>/itac/<version>/mic/itacvars.sh

Let me know if you have any more questions.

Victor_L_2
Beginner

Sumedh,

Thank you for the instructions. I was able to compile my program without errors and got it running on the Phi (albeit extremely slowly). But in the middle of the run the program terminated unexpectedly with the following messages:

[0] Intel(R) Trace Collector INFO: 46.06MB trace data in RAM + 454.00MB trace data flushed = 500.06MB total
[0] Intel(R) Trace Collector INFO: 26.88MB trace data in RAM + 973.25MB trace data flushed = 1000.12MB total
[0] Intel(R) Trace Collector INFO: 8.06MB trace data in RAM + 1492.12MB trace data flushed = 1500.19MB total
[0] Intel(R) Trace Collector INFO: 54.25MB trace data in RAM + 1946.00MB trace data flushed = 2000.25MB total
Killed

And it didn't produce any *.stf files. My executable runs fine when it's compiled without Trace Analyzer.

Sumedh_N_Intel
Employee

Hi Victor, 

Is the trace file being written to a shared filesystem or the coprocessor's filesystem? It is possible that the available space is being filled up. The flushed data is by default written to /tmp, which is normally not a shared filesystem. You can change this by setting VT_FLUSH_PREFIX to point to a shared filesystem.

Victor_L_2
Beginner

Hi Sumedh,

No files appeared on the Phi's local filesystem under /tmp or /root. It is possible that the Phi ran out of memory: my program uses almost all of the available RAM when it runs. How large should I expect the *.stf files to be?
