Hello everyone,
My serial code currently takes about 16 seconds per time step. I'm looking at parallelizing it using either OpenMP or MPI. From my reading, MPI seems to give better performance in most cases, especially if I want to reduce the time to less than 1 second per time step. In your experience, is it practically possible (not just theoretically, per Amdahl's law) for OpenMP to decrease the time from 16 seconds to 1 second?
Since porting serial code to MPI takes considerable effort, what resources would you recommend for getting this done?
Thank you.
6 Replies
Amdahl's law is both a practical and a theoretical limit. If you're telling us your application is limited by Amdahl's law to less speedup than you desire, why ask this question?
Quoting - tim18
This is not true. There are many factors that make Amdahl's theoretical limit unattainable in practice.
My question is whether MPI could achieve a 16:1 speedup where OpenMP cannot, and why. I tried OpenMP and got at most a 2:1 speedup with 6 processors. I understand that both OpenMP and MPI have Amdahl's law as a theoretical limit, but I'm looking for the practical characteristics of MPI that cause it to outperform OpenMP in most cases.
In other words, why is there a better chance with MPI of getting close to 100% parallel efficiency than with OpenMP?
Thank you.
Quoting - ash1
Several factors dictate differences in code performance. Maybe your OpenMP implementation is not good enough to get the desired speedup? MPI forces you to explicitly think "local", while OpenMP gives you more slack. Maybe that's the key to your problem?
Quoting - rreis
I agree with Ricardo; you mostly need to adjust your OpenMP implementation.
OpenMP is easier to implement than MPI, but you should know that OpenMP code can suffer from a serious lack of data locality, which needs careful coding.
You'll find it useful (at least I did) to use tools like PerfSuite to monitor the performance of the code and to identify the time-consuming subroutines when you are doing OpenMP parallelization.
Quoting - Amit
Also, keep in mind that the scalability of your problem and the available hardware affect this choice too. While you can use MPI on SMPs, you cannot use OpenMP to communicate between nodes of a distributed-memory system. If your problem scales well, you foresee it growing in size (domain, etc.), and you have access to a distributed-memory system, MPI might be the better choice, since it will run on both shared-memory and distributed-memory systems. Obviously data locality and all the criteria mentioned above are important factors too.
If you are experiencing a 2x speedup with 6 processors and OpenMP (assuming 6 processors were scheduled), then it is likely there are other extenuating circumstances we have not been told about.
Your code may have a memory bandwidth problem (e.g. there is little or no computation, just memory copying). In that case, a system with higher memory interleaving and a NUMA architecture will improve performance. Going to MPI will help _only_ if you can slice up the working data set _and_ keep the slices local to the systems they were sliced to. If you have a lot of data movement, then MPI will be less effective.
Your parallel code may be unintentionally passing through a serializing section. An example is a high frequency of malloc/free (new/delete), or most random number generators. If this is the situation, then you need an alternative that does not require serialization (e.g. a multi-thread-scalable allocator and/or a multi-thread random number generator).
Use a profiler to find out why your code scales so poorly.
Jim Dempsey
