Hi,
I have some confusion regarding OpenMP, MPI and dual/quad core processors, and I hope the experts out there can help me answer these questions. By the way, I'm using Intel Fortran to write my own CFD code.
1. If I have a dual/quad core processor, does it mean that using OpenMP or MPI in my code can make it run faster, e.g. through using multiple threads in loops?
2. Is OpenMP better (faster/easier) than MPI on multi-core processors, especially if it's only a dual or quad core on a single chip? ifort has direct support for OpenMP, and there's also an auto-parallelizing feature to speed up my code; is that so?
Thank you very much
I don't have much experience with OpenMP... but I can share what I do have :-)
We just migrated from CVF to IVF, and so I tried to tweak my current project (see my other thread). Auto-parallelization brought a lot of performance improvement without editing the code. OpenMP gave me an additional 10% on my single-core CPU! Now my project needs only 50% of the former CVF calculation time.
Auto-parallelization and OpenMP use threads for your do loops, for example. With auto-parallelization you don't have to rewrite your code or worry about race conditions or deadlocks, because the compiler only parallelizes loops it can prove safe; with hand-written OpenMP directives you do still have to check your loops for races yourself. Either way, it's a simple and effective way of adding multithreading to your app.
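As a minimal sketch of the kind of directive meant here (free-form Fortran, names made up for illustration): one comment-style line before the loop, plus the compiler's OpenMP switch, is all it takes to spread the iterations over the cores.

```fortran
program omp_demo
   implicit none
   integer, parameter :: n = 1000000
   real :: a(n), b(n)
   integer :: i

   b = 1.0
   ! When compiled with OpenMP enabled, ifort splits these iterations
   ! across the cores; without it, the !$omp line is just a comment
   ! and the loop runs serially.
   !$omp parallel do
   do i = 1, n
      a(i) = 2.0*b(i) + 1.0
   end do
   !$omp end parallel do

   print *, 'a(1) =', a(1)
end program omp_demo
```

Because each iteration here writes only its own `a(i)`, the loop has no races and the result is the same on any thread count.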
But I've had some trouble with OpenMP. In my project there are a lot of old Fortran subroutines and functions my boss coded decades ago. He is (present tense!) a big friend of goto, and of local variables that were never given the SAVE attribute but still "remembered" their values in later calls. So I had to fix "some" errors from the past.
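A hypothetical example of the kind of legacy trap described above: a local variable that "remembers" its value only because old compilers happened to give locals static storage. Declaring it with SAVE makes the behavior well-defined, but note that a saved variable is shared by all threads, so the routine is still not safe inside an OpenMP parallel region without extra protection.

```fortran
subroutine count_calls(ncalls)
   implicit none
   integer, intent(out) :: ncalls
   ! Pre-F90 code often omitted SAVE and relied on the compiler
   ! keeping locals in static storage. SAVE makes the retention
   ! well-defined, but the single saved copy is shared by every
   ! thread, so calling this from a parallel region is a race.
   integer, save :: counter = 0
   counter = counter + 1
   ncalls  = counter
end subroutine count_calls
```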
Markus
http://books.google.com/books?id=2z5ipEZQOZEC&pg=RA1-PA134&lpg=RA1-PA134&dq=concurrent+outer+vector+inner&source=web&ots=XIha1ermV9&sig=Du7YYfO2vzsfhRj3GCqFjfq8kZI#PRA1-PA133,M1
An easy route may be to start with auto-parallelization to confirm opportunities, and refine with OpenMP. MPI can give further scaling, at the cost of greater development effort. It isn't usually undertaken for a single node. See
http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=11387
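In practice the two approaches suggested above map to compiler switches roughly like this (switch names are from the classic ifort of that era; recent releases spell them -qopenmp and -qopt-report instead):

```shell
# Auto-parallelization: the compiler threads the loops it can prove safe,
# and the report shows which loops it parallelized and why others failed.
ifort -parallel -par-report1 cfd.f90 -o cfd

# OpenMP: honor the !$omp directives you added by hand.
ifort -openmp cfd.f90 -o cfd
```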
Quarkz,
If you have a single application (executable) that runs on one system, then use OpenMP. MPI is a message-passing scheme between processes; each process has a different virtual address space, and the processes may be distributed across multiple processors, whether in an SMP system, a cluster, or even spread across a LAN or the Internet.
If your application runs on a single system (single, dual or quad core), and you have a 32-bit application whose maximum application size (2 or 3 GB) is smaller than physical memory, then you could conceivably use MPI between processes on the same system. MPI can also be used effectively on one system when your old legacy code is not suitable for OpenMP (i.e. not thread friendly). A code wrapper using MPI may be more effective than a batch file for launching and coordinating multiple processes.
For a single system and a single executable, consider OpenMP.
Also, if your code is pre-F90/F95, reexamine your data flow to see whether you can make better use of pointers or references to pass data, instead of the common practice of block-moving data into a workspace, calling the subroutine, and moving the block of data back out of the workspace.
Jim Dempsey
Hi, thank you all. I think I'll try OpenMP. By the way, my code is F95. However, JimDempseyAtTheCove mentioned:
"reexamine your data flow to see if you can make better use of pointers or references to pass data instead of the common practice of block moving data into a work space, calling the subroutine, moving block data out of working space."
Does it mean that when I call a subroutine, e.g. cal_vec(x,y), where
subroutine cal_vec(x,y)
real, intent(in) :: x(1000)
real, intent(out) :: y(1000)
y=200.*x
end subroutine
is this what you mean by moving data in/out, and is it not recommended? I have quite a lot of subroutines and I wonder if they're inefficient. Is there a place where I can find a more detailed explanation of what you are referring to?
Thanks!
No, that call does not move data.
What I mean is that during the evolution of your application it may have been starved for memory, so you may initially have developed it to read data from a file into COMMON arrays, process the data, then write it out. Later, when you moved on to larger-memory systems, a few enhancements may have enlarged the number of items (objects) that the old program would process. Your modified old code may have looked like:
CALL READBUNCHODATA
DO I=1, NOBJECTS
  CALL COPY2WORKSPACE(I)
  CALL OLDPROCESSDATA
  CALL COPYFROMWORKSPACE(I)
END DO
CALL WRITEBUNCHODATA
where OLDPROCESSDATA is your old code, written to work on only one object at a time in a fixed COMMON data area. If you take the time to modify OLDPROCESSDATA so that it is passed a reference (or pointer) to the object's data instead of working in COMMON blocks, the program will run much faster. Depending on the circumstances, fixing the code can bring dramatic performance improvements: some old code I modified a couple of years back had a 10x performance improvement _prior_ to making use of OpenMP.
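As a sketch of that refactor (all names hypothetical): instead of copying object I into the COMMON workspace and back out, pass the object's storage to the routine as a dummy argument and let it work in place.

```fortran
! After the refactor, the driver loop needs no copy-in/copy-out:
!
!    do i = 1, nobjects
!       call processdata(alldata(:, i), nitems)
!    end do
!
! and, since each call touches only its own object's storage,
! an OpenMP thread can safely work on a different object.
subroutine processdata(obj, n)
   implicit none
   integer, intent(in)    :: n
   real,    intent(inout) :: obj(n)
   obj = obj * 2.0      ! stand-in for the real per-object work
end subroutine processdata
```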
Running VTune or another profiler first may locate the sections of your code that are copying data. If you can rework that code to use a reference or pointer instead of a predefined area in COMMON, you may find a good candidate for improvement.
Jim Dempsey
