ds534486
Beginner
65 Views

nested OpenMP and Thread Profiler

Hello,
I have a question: does the Intel Thread Profiler for Linux support nested OpenMP?
I compile my program with -openmp-profile, but when I start it with 2 threads on the outer level and 4 threads on the inner level, I see only 4 or 5 threads in the profiler view. When I look at top while the program is running, I see it running with 8 threads, which is of course what I expected. I would like to see these 8 threads in the profiler as well.

Best Regards,
Dirk

7 Replies
jimdempseyatthecove
Black Belt
65 Views


Dirk,

What happens when you have a simple test application without nested parallel regions using 8 threads without profiling, then run with profiling? Does the thread count diminish as it does for nested parallel regions?

If so, then the thread(s) used by the profiler are somehow being taken into consideration. This would be odd, but the test would confirm it.

Jim Dempsey
ds534486
Beginner
65 Views

Hello,

When I run an application without nested regions, I see the results I expected: when I run the application with 8 threads, I see 8 bars in the thread view.

Dirk
jimdempseyatthecove
Black Belt
65 Views


When you use nested levels, you typically should not write code that uses (nor expect to use) all HW threads at all levels. Barring the use of num_threads(n) and/or nowait, the OpenMP scheduler may schedule an arbitrary number of threads, using a best guess based on the availability of threads at that moment. That is, the threads running the other sections of the outer level may be busy at the time the inner level you are interested in is scheduled to run, and therefore only a subset is taken.

When you know that the execution times of the threads of the outer level are unbalanced (some finish much earlier than others), and you want these threads to partake in an inner level of a different section of the outer level, then use NOWAIT on the section that finishes first and specify an appropriate num_threads() on the longer-running nested level.

Jim Dempsey


ds534486
Beginner
65 Views

The number of running threads is not my problem. I ran 2x4 threads, so in the inner region there should be 8 threads running. The machine has 24 cores in total, so this is no problem. I see the threads running when I use top, and when I ask for the number of threads with omp_get_num_threads(), I get 4 threads for every inner team. So I think the threads are running as expected.

My problem is that when I compile the program with -openmp-profile, I get a guide.gvs file, and when I open this file and go to the threads view, I do not see the 8 threads.

I looked at a smaller example program and it looks like I just get the threads of the outer team. Is it possible to get the details for the inner teams?

Best Regards,
Dirk
Alain_D_Intel
Employee
65 Views

Hello Dirk,

I'm not sure (at all) that profiling tools work with nested parallelism.
I have the same kind of issues with mixed MPI/OpenMP programs.
Try a specific "Tools" forum for more information.
Let us know if you find answers... :=) :=):=)

Cheers.

TimP
Black Belt
65 Views

For the case of MPI FUNNELED OpenMP usage (but not OMP_NESTED), in principle, if each MPI process is connected to its own file system, openmp_profile may be used to save the guide.gvs for each instance of OpenMP.
In the case where multiple instances of OpenMP write guide.gvs into the same file system, if the execution completes normally, only the profiling results from a single OpenMP instance will remain. I have used it successfully this way, on the assumption that the instance which writes guide.gvs last is the one which determines critical path and thus the one of interest.
I don't know whether this gives any clues for the OMP_NESTED situation. It would appear that the results from the inner instances of OpenMP might be discarded, unless a way can be found to give each its own working directory.
Under Linux, the same effect as building with openmp_profile may be achieved with the default shared OpenMP library link, by using LD_PRELOAD to substitute the profiling library at run time, but this is "unsupported." Profiling might be enabled for a single instance of OpenMP, if the LD_PRELOAD substitution could be restricted to that instance.
The Intel OpenMP profiling library, when linked into an MSVC build, gives fairly complete information, except that the identities of the parallel regions are "unknown." When linked into a build made with a GNU compiler, the overall OpenMP threading summary appears to be valid, but information on individual parallel regions is lost.
The current "supported" method is to run with openmp_profile linkage under Thread Profiler. Speaking for myself, I haven't had success with this method; the old methods continue to work for now (including with the Intel 11.0 and 11.1 compilers), although they are no longer "supported" and are subject to change in a future major compiler release.
An Intel paper on thread profiling has been promised, but it is unlikely to cover nested parallelism.
jimdempseyatthecove
Black Belt
65 Views


Dirk,

If timer-based sampling is suitable, you might try AMD's CodeAnalyst.

Jim Dempsey