Hello,
I am currently trying to parallelize a time-dependent Fortran code that basically consists of several loops and DGEMM calls, e.g.:
DO time = 1, endtime
   DO i = 1, end
      (calculations)
   END DO
   CALL DGEMM ( )
   CALL DGEMM ( )
   DO i = 1, end
      (calculations)
   END DO
END DO
I am wondering if someone can offer advice on how to parallelize this piece of code in a way that makes the best use of the parallelization already built into the matrix-multiply routines (DGEMM). Essentially, I would like to do something like this:
DO time = 1, endtime
   !$OMP PARALLEL
   !$OMP DO
   DO i = 1, end
      (calculations)
   END DO
   !$OMP END DO
   CALL DGEMM ( )
   CALL DGEMM ( )
   !$OMP DO
   DO i = 1, end
      (calculations)
   END DO
   !$OMP END DO
   !$OMP END PARALLEL
END DO
However, I am not certain what to do, in terms of OpenMP directives, with the section of code that contains the DGEMM calls. Should I just have one thread execute this section, or is there a better way to exploit the parallelism of the DGEMM routines within OpenMP? Does anyone have some advice on this?
Thanks,
Mandrew
3 Replies
I guess you would either set OMP_NESTED, or terminate your PARALLEL region before the DGEMM calls, and link the mkl_thread library if you intend DGEMM to start its own threads. If the two DGEMM invocations are independent and roughly equal in time consumption, you could put them in separate OMP SECTIONs (a usage I haven't seen myself). If you don't need DGEMM inside your single parallel region, I doubt you would lose much with two separate parallel regions: DGEMM can then use the team of threads that persists from your first parallel loop, and your second parallel region would take back the same thread team.
You could learn more and help us give advice if you would link the OpenMP profiling library and show the profiling result.
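For illustration, here is a minimal sketch of the two-separate-regions approach. The array names, dimensions, and DGEMM arguments are placeholders, not taken from the original post:

```fortran
! Sketch: two parallel regions per time step, with the DGEMM calls
! between them outside any parallel region. Link a threaded MKL
! (mkl_thread) so each DGEMM spawns its own thread team.
DO time = 1, endtime
   !$OMP PARALLEL DO
   DO i = 1, n
      ! (first set of calculations)
   END DO
   !$OMP END PARALLEL DO

   ! Outside the parallel region a threaded DGEMM can reuse the
   ! now-idle team. Placeholder arguments: C := alpha*A*B + beta*C
   CALL DGEMM('N', 'N', m, n, k, alpha, A, m, B, k, beta, C, m)
   CALL DGEMM('N', 'N', m, n, k, alpha, C, m, D, k, beta, E, m)

   !$OMP PARALLEL DO
   DO i = 1, n
      ! (second set of calculations)
   END DO
   !$OMP END PARALLEL DO
END DO
```

With this structure there is no nested parallelism, so OMP_NESTED is not required; the alternative of wrapping the two DGEMM calls in !$OMP SECTIONS only pays off if the calls are truly independent and each section's DGEMM still gets enough threads.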
Quoting - tim18
I guess you would either set OMP_NESTED, or terminate your PARALLEL region before the DGEMM calls, and link the mkl_thread library if you intend DGEMM to start its own threads. If the two DGEMM invocations are independent and roughly equal in time consumption, you could put them in separate OMP SECTIONs (a usage I haven't seen myself). If you don't need DGEMM inside your single parallel region, I doubt you would lose much with two separate parallel regions: DGEMM can then use the team of threads that persists from your first parallel loop, and your second parallel region would take back the same thread team.
You could learn more and help us give advice if you would link the OpenMP profiling library and show the profiling result.
Thanks for the advice. I am not familiar with the OpenMP profiling library - is this discussed in the Intel compiler documentation?
You should find the openmp-profile link option mentioned in the Intel compiler docs. Beyond that, I don't find the documentation adequate.
On Linux, if you did a default .so link to the OpenMP library, you can use LD_PRELOAD to substitute the profiling library without re-linking.
If you simply run normally with the profiling library, it writes a file named guide.gvs, which you can read in a text editor or plot in VTune on Windows. It shows performance statistics for each parallel region in code compiled by the Intel compiler, or in MKL.
All this is said to be subject to change next year, which may explain the sketchy documentation.
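As a command-line sketch of the two routes just described (the profiling library's exact filename and install path vary with the compiler version, so the names below are placeholders - check your compiler's lib directory):

```
# Route 1: re-link with the profiling OpenMP library via the compiler driver
ifort -openmp-profile myprog.f90 -o myprog

# Route 2: no re-link; preload the profiling library over the default one
# (path and library name are placeholders)
LD_PRELOAD=/opt/intel/lib/libiompprof5.so ./myprog

# After the run, inspect the per-parallel-region statistics
less guide.gvs
```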