ymost
New Contributor I

What are all those functions starting with "kmp_" ?

Hi,

I'm using VTune to analyze a heavy application that uses OpenMP. I've used the sampling and call-graph collectors, but I'm having a hard time understanding the output. The most time-consuming functions appear to be ones that are not really part of the application: kmp_launch_thread, kmp_fork_barrier, kmp_wait_sleep, kmp_suspend, kmpc_invoke_task_func, kmp_invoke_microtask. From their names I'm guessing they are internal functions of the OpenMP implementation, but in the call graph most of them are disconnected from the real functions of the application. How can I optimize my application if I don't know where the computation time is spent?

I would appreciate any helpful suggestions,
thanks.
3 Replies
TimP
Black Belt

If you run against the profiling version of the Intel OpenMP library, it will report several categories of time spent in each parallel region. If you compiled with ifort or icc, each parallel region is labeled with the corresponding location in the source code. Thread Profiler should substitute the profiling library automatically if you used the default dynamic link. You can also run without Thread Profiler and read the text file summary; that is no longer "supported," although /Qopenmp-profile remains as an option to force linking against the profiling library.
If you are spending more time starting up threaded regions than executing them, the regions may be too small for the number of threads you chose. Excessive time spent in some of those kmp_ functions would show up as work imbalance; you will see how much time each thread spends running or waiting.
The profiling library itself adds overhead, and that overhead becomes significant if your parallel regions are short.
kmp apparently stands for Kuck multi-processing, as David Kuck founded the company which originally supported this OpenMP implementation.
ymost
New Contributor I

Thank you for your reply.
I am indeed using ifort, with the flag -openmp-profile. However, I don't see how the parallel sections are marked according to the corresponding region in the source code. Where is this marking?
I would gladly have used Thread Profiler, but unfortunately it is available only for Windows, and I am working on a Linux machine. I do have the text file summary (I assume you mean "guide.gvs"), but it is of little use to me. It contains very little information: only a general summary of the running times of the different threads, with no details to help me figure out where in the source code the problems lie.
I should also say that, for profiling, I am using a test case of my application with a very short runtime, so the overhead may appear larger than it will be in a real run.
TimP
Black Belt


===============================================================================
                     Intel Thread Profiler 3.1 for Linux* 

                                  Release Notes
===============================================================================

Contents
--------

  - Overview
  - Product Contents
  - What's New
  - System Requirements
  - Known Issues and Limitations
  - Technical Support
  - Related Products

Overview
--------

Intel Thread Profiler for Linux* is a performance tuning tool for parallel 
programs that use POSIX* or OpenMP* or custom synchronization. 


......
I agree that if your test doesn't put any results for your parallel regions into guide.gvs, it may not be worthwhile to install Thread Profiler.

You may be thinking of the ability to make plots from guide.gvs, which is not present in the Linux version of VTune, only in the Windows version.
For an ifort compilation, the parallel regions in guide.gvs are identified by source function and beginning line number, provided they are actually executed.
