Hello everyone,
My serial code currently takes about 16 seconds per time step. I'm looking at parallelizing it using either OpenMP or MPI. From my reading, MPI seems to give better performance in most cases, especially if I want to reduce the time to less than 1 second per time step. In your experience, is it practically possible (not just theoretically, per Amdahl's law) for OpenMP to decrease the time from 16 seconds to 1 second?
Since porting serial code to MPI takes considerable effort, what resources would you recommend for getting this done?
Thank you.
6 Replies
Amdahl's law is both a practical and a theoretical limit. If you're telling us your application is limited by Amdahl's law to less speedup than you desire, why ask this question?
Quoting - tim18
This is not true. There are many factors that make Amdahl's theoretical limit unattainable in practice.
My question is whether MPI could achieve a 16:1 speedup where OpenMP cannot, and why. I tried OpenMP and got at most a 2:1 speedup with 6 processors. I understand that both OpenMP and MPI have Amdahl's law as a theoretical limit, but I'm looking for the practical characteristics of MPI that cause it to outperform OpenMP in most cases.
In other words, why is there a better chance with MPI of getting close to 100% parallel efficiency than with OpenMP?
Thank you.
Quoting - ash1
Several factors dictate differences in code performance. Maybe your OpenMP implementation is not good enough to get the desired speedup? MPI forces you to explicitly think "local", while OpenMP gives you more slack. Maybe that's the key to your problem?
Quoting - rreis
I agree with Ricardo; you mostly need to adjust your OpenMP implementation.
OpenMP is easier to implement than MPI, but you should know that OpenMP code can suffer from a serious lack of data locality, which needs careful coding.
You'll find it useful (at least I did) to use tools like PerfSuite to monitor the performance of the code and to identify the time-consuming subroutines when you are doing OpenMP parallelization.
Quoting - Amit
Also, keep in mind that the scalability of your problem and the available hardware affect this choice too. While you can use MPI on SMPs, you cannot use OpenMP to communicate between nodes of a distributed-memory system. If your problem scales well, you foresee it growing in size (domain, etc.), and you have access to a distributed-memory system, MPI might be the better choice, since it will run on both shared-memory and distributed-memory systems. Obviously data locality and all the criteria mentioned above are important factors too.
If you are experiencing a 2x speedup with 6 processors and OpenMP (assuming 6 processors were scheduled), then it is likely there are other extenuating circumstances we have not been told about.
Your code may have a memory bandwidth problem (e.g. there is little or no computation, just memory copying). In that case, a system with higher memory interleaving and a NUMA architecture will improve performance. Going to MPI will help _only_ if you can slice up the working data set _and_ keep the slices local to the systems they were sliced to. If you have a lot of data movement, then MPI will be less effective.
Your parallel code may be unintentionally passing through a serializing section. An example is a high frequency of malloc/free (new/delete), or most random number generators. If this is the situation, then you need an alternative that does not require serialization (e.g. a multi-thread-scalable allocator and/or a multi-thread random number generator).
Use a profiler to find out why your code scales so poorly.
Jim Dempsey
