I am trying to profile my code using the command line version of VTune (version 2013 update 17) on Linux. I need to get the hot loops and not just the hot functions. The report when using -loop-mode=loop-only or -loop-mode=loop-and-function shows most of the runtime as [Outside any loop] in the module [unknown]. There are also some loops from other libraries. For example:
Function                          Function Stack      Module                CPU Time:Self
--------------------------------  ------------------  --------------------  -------------
[Outside any loop]                                    [Unknown]             0.029
parse_line                                            libnss_files-2.12.so  0.004
_nss_files_parse_pwent                                libc-2.12.so          0.004
____strtoull_l_internal                               libc-2.12.so          0.003
func@0x3eba4ae460                                     libcrypto.so.1.0.1e   0.001
__strchr_sse2                                         libc-2.12.so          0.001
_nss_files_getpwuid_r                                 libnss_files-2.12.so  0.001
[Loop@0x4a9010 in func@0x4a9010]                      bash                  0.001
                                  [Outside any loop]  [Unknown]             0.001
I have also tested the same code with the 2015 GUI version of VTune on my laptop (Windows), and I can see the loops with line numbers, so I am not sure if my problem is related to the older version or the fact that I am using command line. How can I see the individual loop line numbers with 2013 amplxe-cl?
Hi Diana:
It would seem that you are missing debug info. Are any of your own modules (i.e., modules that you developed and built) listed in the results? I see some libraries, but they appear to be system or standard libraries. Is your binary built with debug info and not stripped? (Run 'file <executable>' to check whether it has been stripped.)
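The `file` check can be scripted when several binaries need checking. The following is only a minimal sketch: `symbol_status` is a hypothetical helper name, and it simply looks for the phrases "not stripped" and "stripped" in the description that `file` prints. Note that "not stripped" only says the symbol table is still present; the binary still has to be compiled with -g to carry line-number information.

```shell
#!/bin/sh
# symbol_status is a hypothetical helper, not part of VTune or of `file`.
# It classifies the description text that `file <executable>` prints.
symbol_status() {
  case "$1" in
    *"not stripped"*) echo "not stripped" ;;  # symbol table still present
    *stripped*)       echo "stripped" ;;      # symbols were removed
    *)                echo "unknown" ;;       # neither phrase found
  esac
}

# Example call (testapp.x stands in for your own binary):
# symbol_status "$(file ./testapp.x)"
```

If the helper reports "stripped", rebuilding without the strip step is the first thing to try before collecting again.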
Also, what does the report look like without the "loop-mode" option? What module and function has the most time?
Finally, if you build and run the tachyon sample included in the product (see <installation-directory>/samples/en/C++), you can see what the output *should* look like. For example, here is the output using -loop-mode=loop-only:
Function                                                 Module                     CPU Time:Self  CPU Time:Idle:Self  CPU Time:Poor:Self  CPU Time:Ok:Self  CPU Time:Ideal:Self  CPU Time:Over:Self  Overhead Time:Self  Spin Time:Self
-------------------------------------------------------  -------------------------  -------------  ------------------  ------------------  ----------------  -------------------  ------------------  ------------------  --------------
[Loop at line 580 in grid_intersect]                     tachyon_find_hotspots.bak  4.980          0                   2.410               0.130             2.440                0                   0                   0
[Loop at line 144 in render_one_pixel]                   tachyon_find_hotspots.bak  0.951          0                   0.180               0.030             0.741                0                   0                   0
[Outside any loop]                                       [Unknown]                  0.900          0.460               0.440               0                 0                    0                   0                   0.890
[Loop at line 559 in grid_intersect]                     tachyon_find_hotspots.bak  0.320          0                   0.170               0.020             0.130                0                   0                   0
[Loop@0x3560e1ec69 in __libc_start_main]                 libc-2.12.so               0.270          0.020               0.220               0.030             0                    0                   0                   0.260
[Loop at line 561 in grid_intersect]                     tachyon_find_hotspots.bak  0.229          0                   0.110               0                 0.119                0                   0                   0
[Loop at line 634 in grid_intersect]                     tachyon_find_hotspots.bak  0.200          0                   0.130               0.010             0.060                0                   0                   0
[Loop at line 113 in intersect_objects]                  tachyon_find_hotspots.bak  0.140          0                   0.030               0                 0.110                0                   0                   0
[Loop at line 111 in shader]                             tachyon_find_hotspots.bak  0.070          0                   0.020               0                 0.050                0                   0                   0
[Loop at line 178 in tachyon_video::on_process]          tachyon_find_hotspots.bak  0.050          0.050               0                   0                 0                    0                   0                   0
[Loop at line 202 in thread_trace$omp$parallel_for@197]  tachyon_find_hotspots.bak  0.030          0                   0.010               0.010             0.010                0
Hi MrAnderson,
Thanks for your help.
I have recompiled the example code with -g:
icc -g testapp.c -o testapp.x
When I profile this on Xeon, I do see the correct output for the loop profile:
amplxe-cl -collect advanced-hotspots -r test1 ./testapp.x
amplxe-cl -R callstacks -r test1 -loop-mode=loop-only

Function                    Function Stack              Module     CPU Time:Self
--------------------------  --------------------------  ---------  -------------
[Loop at line 37 in func2]                              testapp.x  2.566
                            [Loop at line 36 in func2]  testapp.x  2.566
                            [Loop at line 34 in func2]  testapp.x  0
                            [Outside any loop]          [Unknown]  0
[Loop at line 36 in func2]                              testapp.x  0.012
                            [Loop at line 34 in func2]  testapp.x  0.012
                            [Outside any loop]          [Unknown]  0
[Outside any loop]                                      [Unknown]  0.005
[Loop at line 14 in func1]                              testapp.x  0.004
                            [Loop at line 12 in func1]  testapp.x  0.004
                            [Outside any loop]          [Unknown]  0
However, when I try to run the same code on Xeon Phi (native mode), I am still seeing [Outside any loop] as the main/only contributor.
icc -g -mmic testapp.c -o testapp.mic
amplxe-cl -collect knc-hotspots -r test1 -- ssh mic0 /tmp/testapp.mic
amplxe-cl -R callstacks -r test1 -loop-mode=loop-only

Function            Function Stack  Module     CPU Time:Self
------------------  --------------  ---------  -------------
[Outside any loop]                  [Unknown]  91.647
The function-only report is:
Function                   Module                   CPU Time:Self
-------------------------  -----------------------  -------------
[testapp.mic]              testapp.mic              46.175
[vmlinux]                  vmlinux                  44.653
[sep3_15]                  sep3_15                  0.385
[libc-2.14.90.so]          libc-2.14.90.so          0.296
[libcrypto.so.1.0.0]       libcrypto.so.1.0.0       0.067
[libnss_files-2.14.90.so]  libnss_files-2.14.90.so  0.040
[micscif]                  micscif                  0.007
[coi_daemon]               coi_daemon               0.005
[ld-2.14.90.so]            ld-2.14.90.so            0.005
[libpthread-2.14.90.so]    libpthread-2.14.90.so    0.005
[dma_module]               dma_module               0.002
[libmessage_mic.so]        libmessage_mic.so        0.002
[libpam.so.0.83.1]         libpam.so.0.83.1         0.002
[sep_mic_server3.15]       sep_mic_server3.15       0.002
I am not sure why the Xeon Phi version treats the entire program as one function. Adding -fno-inline did not seem to make a difference either.
Hi Diana:
I apologize for the delay. I was checking with the team to make sure I was accurate in my understanding of the issues.
Currently, there are several issues blocking loop analysis on Xeon Phi systems. It is a combination of incorrect debug information and the algorithm used to analyze the info. We expect a fix for both shortly. I will try to remember to update this thread when the fixes are available. Until then, you will be limited to function-hotspot analysis on Xeon Phi.
FYI, VTune Amplifier XE 2015 Update 1, which was released about two weeks ago, addresses the algorithm part of this issue, so loop analysis on Xeon Phi should improve with that release.
I don't have any info on changes to the debug info, but will look into it.
Thanks for getting back to me. I will look into whether we can use the 2015 version on the Xeon Phi system.
Diana G. wrote:
[testapp.mic] testapp.mic 46.175
A function name in brackets, such as [testapp.mic], can indicate that the testapp.mic module was not found during finalization. If that is the case, the finalization output should contain a message about testapp.mic. I suggest specifying the search path for this module and refinalizing the result: amplxe-cl -finalize -r test1 -search-dir=<path to directory where testapp.mic is located>.
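To spot every module affected this way, the function report can be filtered for rows whose function name is just the module name in brackets. The sketch below is illustrative only: `unresolved_modules` and `refinalize_cmd` are hypothetical helper names, the filter assumes the three whitespace-separated columns shown in the function-only report earlier in this thread (Function, Module, CPU Time:Self), and the second helper only prints the refinalize command quoted above so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical helpers; neither is part of VTune.

# Read a function-level report on stdin and print each module whose
# "function" is only the module name in brackets, e.g.
#   [testapp.mic]  testapp.mic  46.175
unresolved_modules() {
  awk 'NF == 3 && $1 == ("[" $2 "]") { print $2 }'
}

# Print (do not run) the refinalize command for a result directory ($1)
# and a directory that holds the missing module ($2).
refinalize_cmd() {
  printf 'amplxe-cl -finalize -r %s -search-dir=%s\n' "$1" "$2"
}

# Example, assuming the report text was saved to report.txt:
# unresolved_modules < report.txt
# refinalize_cmd test1 /path/to/dir/with/testapp.mic
```

System modules on the card, such as vmlinux, will usually show up in the list as well; the one that matters for loop analysis is your own binary.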
