Hi,
I'm only able to get CPU time, but I need the elapsed (wall-clock) time for each function. The application is parallelized using OpenMP.
Ideally, this could be obtained through the command-line interface.
regards,
A step in that direction is to filter by thread and look for the views showing the maximum time in each function. I'd hate to attempt that from the command line.
Let me understand the need better. Are the functions you want to measure called inside a parallel region by the OpenMP worker threads, or are they more like global application phases?
Second scenario: global application phases.
void phase( void )
{
    struct timeval start, end;
    gettimeofday( &start, NULL );
    pre_process();
    #pragma omp parallel
    {
        /* the main work goes here */
    }
    post_process();
    gettimeofday( &end, NULL );
}
So, I'm able to get the aggregated time the threads have spent in the parallel region (CPU time), but what I need is end - start.
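For reference, the end - start measurement described above can be sketched in plain C as follows. This is only a minimal sketch: `elapsed_seconds` and `time_phase` are hypothetical helper names introduced for illustration, and they are not part of VTune or OpenMP.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: difference between two gettimeofday() samples,
 * in seconds. */
double elapsed_seconds(const struct timeval *start, const struct timeval *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_usec - start->tv_usec) / 1e6;
}

/* Hypothetical helper: run one application phase and return its
 * wall-clock duration. phase_body stands for pre_process(), the
 * parallel region and post_process() taken together. */
double time_phase(void (*phase_body)(void))
{
    struct timeval start, end;

    gettimeofday(&start, NULL);
    phase_body();
    gettimeofday(&end, NULL);

    return elapsed_seconds(&start, &end);
}
```

Because the timer is read outside the parallel region, the result is the elapsed time of the whole phase, not the sum of the per-thread CPU times.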
There is the Frame API in ITT (https://software.intel.com/en-us/node/496605), which VTune supports. It lets you mark frames in your code and then use the frame-based groupings in the grid to explore elapsed time and other per-frame information. A frame is essentially a global time region with a begin, an end, and a name.
Also note that, starting with VTune Amplifier XE 2015 and Intel Compiler 14, OpenMP regions can be annotated automatically, so you can explore OpenMP efficiency as described here: https://software.intel.com/en-us/node/529272
Regards, Dmitry
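As an illustration of the Frame API mentioned above, a phase could be wrapped in a frame roughly like this. This is a hedged sketch, not a tested VTune setup: the `USE_ITT` macro, the domain name "example.phases", and the helpers `run_phase_as_frame` and `example_phase` are assumptions made for the example. Only `__itt_domain_create`, `__itt_frame_begin_v3` and `__itt_frame_end_v3` come from `ittnotify.h`.

```c
#include <stddef.h>

#ifdef USE_ITT               /* assumption: build with -DUSE_ITT and link ittnotify */
#include <ittnotify.h>
static __itt_domain *phase_domain = NULL;
#endif

/* Counts how often the example phase ran (illustration only). */
int phases_run = 0;

/* Hypothetical stand-in for pre_process + parallel region + post_process. */
void example_phase(void)
{
    phases_run++;
}

/* Hypothetical helper: mark one call of phase_body as one frame.
 * Without USE_ITT it simply calls the phase. */
void run_phase_as_frame(void (*phase_body)(void))
{
#ifdef USE_ITT
    if (phase_domain == NULL)
        phase_domain = __itt_domain_create("example.phases");
    __itt_frame_begin_v3(phase_domain, NULL);   /* frame begins */
#endif

    phase_body();

#ifdef USE_ITT
    __itt_frame_end_v3(phase_domain, NULL);     /* frame ends */
#endif
}
```

Calling `run_phase_as_frame(example_phase)` once per phase should then give one frame per call, whose duration is the elapsed time of that phase.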
I think I started asking for built-in instrumentation of OpenMP parallel regions before the standard was even officially launched in 1997. :-(
Instead of waiting, I decided to just get in the habit of building my own. Inside that parallel region, every thread begins by reading a timer (typically RDTSC on recent processors with the constant_tsc attribute, but gettimeofday() is fine on most systems) and saving it in a "start time" array (indexed by thread number). When each thread is finished with its work (but before it enters any implicit barriers), it reads the timer again and saves it in an "end time" array. Then I can look at the variation in start times, the variation in elapsed times, the variation in end time, etc.
Mildly labor-intensive, but valuable -- and once you have done it once, it is pretty easy to get in the habit of including such instrumentation any time you write OpenMP code.
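The per-thread instrumentation described above might look roughly like the following. It is a minimal sketch under stated assumptions: it uses gettimeofday() rather than RDTSC, `MAX_THREADS` is an arbitrary limit, and `now_seconds`, `timed_parallel_region`, `slowest_thread_seconds` and `example_work` are hypothetical names. The serial fallback only exists so the sketch also builds without OpenMP enabled.

```c
#include <stddef.h>
#include <sys/time.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

#define MAX_THREADS 256          /* assumption: upper bound on team size */

double start_time[MAX_THREADS];  /* indexed by thread number */
double end_time[MAX_THREADS];

double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical workload: a short busy loop. */
void example_work(int tid)
{
    volatile double sink = 0.0;
    for (int i = 0; i < 100000; i++)
        sink += (double)(i + tid);
}

/* Runs work(tid) on every thread and records per-thread start and end
 * times. Returns the number of threads whose times were recorded. */
int timed_parallel_region(void (*work)(int))
{
    int recorded = 0;

    #pragma omp parallel reduction(+:recorded)
    {
        int tid = omp_get_thread_num();
        double t0 = now_seconds();       /* first thing each thread does */
        work(tid);
        double t1 = now_seconds();       /* before the implicit barrier */

        if (tid < MAX_THREADS) {
            start_time[tid] = t0;
            end_time[tid] = t1;
            recorded += 1;
        }
    }
    return recorded;
}

/* Largest per-thread elapsed time among the first n recorded threads. */
double slowest_thread_seconds(int n)
{
    double worst = 0.0;
    for (int i = 0; i < n; i++) {
        double dt = end_time[i] - start_time[i];
        if (dt > worst)
            worst = dt;
    }
    return worst;
}
```

After a call such as `timed_parallel_region(example_work)`, the two arrays can be inspected for the spread in start times, elapsed times and end times.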
Generally speaking, you can get performance data thread by thread. For example:
#amplxe-cl -collect concurrency -- ./program
#amplxe-cl -report hotspots -group-by thread
This displays the performance data for each thread.
If you want to know the CPU time for a specific OpenMP* region, use VTune's pause/resume API: resume collection before the OpenMP code region and pause it after the region. The elapsed time then runs from the first thread's creation to the last thread's termination.
If you want to know the aggregated CPU time for a specific function (pthread or Windows thread) that many threads use as their entry function, insert the resume API call before the first thread's creation and the pause API call after the last thread's termination. The elapsed time is then what you want: the specific function's lifetime across the threads. (Note: you may start VTune in start-paused mode.)
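The resume/pause placement described above could be sketched like this. It is an assumption-laden example, not a verified recipe: the `USE_ITT` macro and the names `collect_region`, `example_region` and `regions_collected` are hypothetical. Only `__itt_resume()` and `__itt_pause()` are taken from `ittnotify.h`.

```c
#include <stddef.h>

#ifdef USE_ITT               /* assumption: build with -DUSE_ITT and link ittnotify */
#include <ittnotify.h>
#define COLLECTION_RESUME() __itt_resume()
#define COLLECTION_PAUSE()  __itt_pause()
#else
/* No-op fallbacks so the sketch builds without the ITT library. */
#define COLLECTION_RESUME() ((void)0)
#define COLLECTION_PAUSE()  ((void)0)
#endif

/* Counts regions that ran between resume and pause (illustration only). */
int regions_collected = 0;

/* Hypothetical stand-in for the code that creates the threads, runs the
 * function of interest in them, and waits for them to terminate. */
void example_region(void)
{
    /* thread creation, work and joins would go here */
}

/* Hypothetical helper: collect data only while region() is running.
 * Intended for a run where collection starts out paused. */
void collect_region(void (*region)(void))
{
    COLLECTION_RESUME();     /* before the first thread is created */
    region();
    COLLECTION_PAUSE();      /* after the last thread has terminated */
    regions_collected++;
}
```

With collection started in paused mode, only the interval between the resume and pause calls should end up in the result.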
Thanks for all the replies, but it seems impossible to do this easily with VTune.
In the end, I chose Extrae/Paraver, which uses Dyninst to automatically instrument the entries and exits of a given set of functions. From there it is easy to extract elapsed time and hardware counters.
Thanks again.
I was going to recommend compiling with the "profile-functions" option, but looking at the documentation for the Intel 14 compiler I noticed that it says
This option inserts instrumentation calls at a function's entry and exit points within a single-threaded application to collect the cycles spent within the function to produce reports that can help in identifying code hotspots.
I have not tested this to see if it is ignored when compiling for an OpenMP target.