I have one question: does the Intel Thread Profiler for Linux support nested OpenMP?
I compile my program with -openmp-profile, but when I start the program with 2 threads on the outer level and 4 threads on the inner level, I only see 4 or 5 threads in the profiler view. But when I look at top while the program is running, I see it running with 8 threads, which is of course what I expected to see. I would also like to see these 8 threads in the profiler.
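For reference, here is a minimal sketch of the 2 outer / 4 inner setup (my own reconstruction, not the original poster's program). It requests 2 threads on the outer level and 4 on the inner level, then counts how many inner-level bodies actually run. It assumes an OpenMP 3.0 or later runtime, where omp_set_max_active_levels(2) plays the role of OMP_NESTED=true; the function name count_nested_workers is hypothetical. num_threads() is only a request, so the count can come out below 8, and a build without OpenMP support ignores the pragmas and reports 1.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: runs a two-level nested parallel region and
   returns how many inner-level thread bodies actually executed. */
int count_nested_workers(int outer, int inner)
{
    int count = 0;
#ifdef _OPENMP
    omp_set_dynamic(0);            /* ask the runtime not to shrink teams */
    omp_set_max_active_levels(2);  /* allow two active parallel levels */
#endif
    #pragma omp parallel num_threads(outer)
    {
        #pragma omp parallel num_threads(inner)
        {
            #pragma omp atomic
            count++;
#ifdef _OPENMP
            /* Show which outer thread spawned which inner thread. */
            #pragma omp critical
            printf("outer thread %d, inner thread %d\n",
                   omp_get_ancestor_thread_num(1), omp_get_thread_num());
#endif
        }
    }
    return count;
}
```

Calling count_nested_workers(2, 4) in a build with OpenMP enabled and nesting permitted should normally report 8, matching what top shows.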
What happens when you take a simple test application without nested parallel regions that uses 8 threads, and run it first without profiling and then with profiling? Does the thread count diminish as it does for nested parallel regions?
If so, then the thread(s) used by the profiler are somehow being taken into account. This would be odd; the test would confirm it.
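A minimal sketch of such a flat (non-nested) test, assuming any OpenMP runtime; count_flat_workers is a hypothetical name. Build it once normally and once with -openmp-profile, and compare the reported count and the profiler's thread view. If the plain build reports 8 but the profiled run shows fewer, that would point at the profiler rather than at nesting.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: runs one flat parallel region and returns how
   many threads actually entered it. */
int count_flat_workers(int nthreads)
{
    int count = 0;
#ifdef _OPENMP
    omp_set_dynamic(0);  /* ask the runtime to honor num_threads exactly */
#endif
    #pragma omp parallel num_threads(nthreads)
    {
        #pragma omp atomic
        count++;
    }
    return count;  /* 1 when compiled without OpenMP support */
}
```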
When you use nested levels, you typically should not write code that uses (or expects to use) all hardware threads at all levels. Barring the use of directives (num_threads(n) and/or nowait), the OpenMP scheduler may schedule an arbitrary number of threads, using a best guess based on the threads available at that moment. For example, the threads running the other sections of the outer level may be busy at the time the inner level of interest is scheduled to run, and therefore only a subset is taken.
When you know the execution times of the threads of the outer level are unbalanced (some finish much earlier than others) .AND. IF you want these threads to partake in an inner level of a different section of the outer level, THEN make use of NOWAIT on the section that finishes first AND specify an appropriate num_threads() on the longer-running nested level.
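A minimal sketch of where those clauses go in C, assuming an OpenMP 3.0 or later runtime. Note that in C, nowait is a clause on the whole sections construct rather than on an individual section (in Fortran it goes on END SECTIONS). short_task, long_task and run_sections are hypothetical stand-ins for the real workloads. Whether the thread freed by the short section is actually reused by the inner team depends on the runtime, since it still waits at the implicit barrier at the end of the outer parallel region.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical short-running section body. */
static double short_task(void)
{
    double s = 0.0;
    for (int i = 0; i < 10; i++)
        s += (double)i;
    return s;  /* 0 + 1 + ... + 9 */
}

/* Hypothetical long-running section body with its own nested team. */
static double long_task(int inner_threads, int n)
{
    double sum = 0.0;
    #pragma omp parallel for num_threads(inner_threads) reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += (double)i;
    return sum;
}

/* Outer team of 2: one thread per section; the long section requests
   an explicit inner team size with num_threads(). */
double run_sections(int inner_threads, int n, double *short_result)
{
    double long_result = 0.0;
#ifdef _OPENMP
    omp_set_max_active_levels(2);  /* permit the nested team */
#endif
    #pragma omp parallel num_threads(2)
    {
        #pragma omp sections nowait
        {
            #pragma omp section
            *short_result = short_task();

            #pragma omp section
            long_result = long_task(inner_threads, n);
        }
    }
    return long_result;
}
```

For example, run_sections(4, 1000, &s) sums 0..999 on the nested team and stores the short section's result in s.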
My problem is that when I compile the program with -openmp-profile, I get a guide.gvs file, and when I open this file and go to the threads view, I do not see the 8 threads.
I looked at a smaller example program and it looks like I just get the threads of the outer team. Is it possible to get the details for the inner teams?
I'm not sure (at all) that profiling tools work with nested parallelism.
I have the same kind of issues with a mixed MPI/OpenMP program.
Try a specific "Tools" forum for more information.
Let us know if you find answers... :=) :=):=)
In the case where multiple instances of OpenMP write guide.gvs into the same file system, if the execution completes normally, only the profiling results from a single OpenMP instance will remain. I have used it successfully this way, on the assumption that the instance which writes guide.gvs last is the one which determines critical path and thus the one of interest.
I don't know whether this gives any clues for the OMP_NESTED situation. It would appear that the results from the inner instances of OpenMP might be discarded, unless a way can be found to give each its own working directory.
Under Linux, the same effect as building with -openmp-profile may be achieved with the default shared OpenMP library link, by using LD_PRELOAD to substitute the profiling library at run time, but this is "unsupported." Profiling might be enabled for a single instance of OpenMP if the LD_PRELOAD substitution could be restricted to that instance.
The Intel OpenMP profiling library, when linked into an MSVC build, gives fairly complete information, except that the identities of the parallel regions are "unknown." When linked into a GNU-compiled build, the overall OpenMP threading summary appears to be valid, but information on individual parallel regions is lost.
The currently "supported" method is to run with -openmp-profile linkage under Thread Profiler. Speaking for myself, I haven't had success with this method; the old methods continue to work for now (including the Intel 11.0 and 11.1 compilers), although they are no longer "supported" and are subject to change in a future major compiler release.
An Intel paper on thread profiling has been promised, but it is unlikely to cover nested parallelism.