Hello,
I am currently trying to parallelize a time-dependent Fortran code that basically consists of several loops and DGEMM calls, e.g.:
DO time = 1, endtime
   DO i = 1, end
      (calculations)
   END DO
   CALL DGEMM ( )
   CALL DGEMM ( )
   DO i = 1, end
      (calculations)
   END DO
END DO
I am wondering if someone can offer advice on how to parallelize this piece of code in a way that makes the best use of the parallelization already built into the matrix-multiply routines (DGEMM). Essentially, I would like to do something like this:
DO time = 1, endtime
   !$OMP PARALLEL
   !$OMP DO
   DO i = 1, end
      (calculations)
   END DO
   !$OMP END DO
   CALL DGEMM ( )
   CALL DGEMM ( )
   !$OMP DO
   DO i = 1, end
      (calculations)
   END DO
   !$OMP END DO
   !$OMP END PARALLEL
END DO
However, I am not certain what to do, in terms of OpenMP directives, with the section of code that contains the DGEMM calls. Should I just have one thread execute this section, or is there a better way to exploit the parallelism of the DGEMM routines within OpenMP? Does anyone have some advice on this?
Thanks,
Mandrew
3 Replies
I guess you would either set OMP_NESTED, or terminate your PARALLEL region before the DGEMM calls, and link the mkl_thread library if you intend DGEMM to start its own threads. If the two DGEMM invocations are independent and roughly equal in time consumption, you could put them in separate OMP SECTIONs (a usage I haven't seen myself). If you don't need DGEMM inside your single parallel region, I doubt you would lose much with two separate parallel regions: DGEMM can then use the team of threads that persists from your first parallel loop, and your second parallel region would take back the same thread team.
You could learn more and help us give advice if you would link the OpenMP profiling library and show the profiling result.
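For illustration, here is a minimal sketch of the two-separate-regions approach. The array names, dimensions, and DGEMM arguments are placeholders, not taken from the original post:

```fortran
! Sketch: two parallel regions per time step, with the DGEMM calls
! between them outside any parallel region. Link a threaded MKL
! (mkl_thread) so each DGEMM spawns its own thread team.
DO time = 1, endtime
   !$OMP PARALLEL DO
   DO i = 1, n
      ! (first set of calculations)
   END DO
   !$OMP END PARALLEL DO

   ! Outside the parallel region a threaded DGEMM can reuse the
   ! now-idle team. Placeholder arguments: C := alpha*A*B + beta*C
   CALL DGEMM('N', 'N', m, n, k, alpha, A, m, B, k, beta, C, m)
   CALL DGEMM('N', 'N', m, n, k, alpha, C, m, D, k, beta, E, m)

   !$OMP PARALLEL DO
   DO i = 1, n
      ! (second set of calculations)
   END DO
   !$OMP END PARALLEL DO
END DO
```

With this structure there is no nested parallelism, so OMP_NESTED is not required; the alternative of wrapping the two DGEMM calls in !$OMP SECTIONS only pays off if the calls are truly independent and each section's DGEMM still gets enough threads.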
Quoting - tim18
I guess you would either set OMP_NESTED, or terminate your PARALLEL region before the DGEMM calls, and link the mkl_thread library if you intend DGEMM to start its own threads. If the two DGEMM invocations are independent and roughly equal in time consumption, you could put them in separate OMP SECTIONs (a usage I haven't seen myself). If you don't need DGEMM inside your single parallel region, I doubt you would lose much with two separate parallel regions: DGEMM can then use the team of threads that persists from your first parallel loop, and your second parallel region would take back the same thread team.
You could learn more and help us give advice if you would link the OpenMP profiling library and show the profiling result.
Thanks for the advice. I am not familiar with the OpenMP profiling library - is this discussed in the Intel compiler documentation?
You should find the openmp-profile link option mentioned in the Intel compiler docs. Beyond that, I don't find the documentation adequate.
On Linux, if you did a default .so link to the OpenMP library, you can use LD_PRELOAD to substitute the profiling library without re-linking.
If you simply run normally with the profiling library, it writes a file named guide.gvs, which you can read in a text editor or plot in VTune on Windows. It shows performance statistics for each parallel region in code compiled by the Intel compiler, or in MKL.
All this is said to be subject to change next year, which may explain the sketchy documentation.
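As a command-line sketch of the two routes just described (the profiling library's exact filename and install path vary with the compiler version, so the names below are placeholders - check your compiler's lib directory):

```
# Route 1: re-link with the profiling OpenMP library via the compiler driver
ifort -openmp-profile myprog.f90 -o myprog

# Route 2: no re-link; preload the profiling library over the default one
# (path and library name are placeholders)
LD_PRELOAD=/opt/intel/lib/libiompprof5.so ./myprog

# After the run, inspect the per-parallel-region statistics
less guide.gvs
```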