Intel® MPI Library

Distributing a single OpenMP task on multiple cores

rincent__antoine
Beginner

Hi! I've tried reading up on this forum and throughout the documentation without success, and I was hoping you could help me. I'm trying to run a piece of software called LAMMPS on my personal computer (using an AMD Ryzen 7 1700X processor; could that be the source of my issues?). My goal is to make use of its 8 physical cores to accelerate the execution time of a single LAMMPS instance.

I use a command in the software's input script which sets the number of OpenMP threads for each MPI process to 1, as increasing this value drastically slows down the software. This is how I call mpiexec: mpiexec -ppn 1 -n 8 -localonly -genv OMP_NUM_THREADS 8 lmp_mpi -in Morse_Temp_1.in

I use the 2019 version of Intel Parallel Studio XE with a student license, if that helps :)

No matter what combination of values I put for -n or -ppn, my LAMMPS executable either runs only once, using a single processor, or runs 8 times in parallel, i.e. performing the same calculation 8 times simultaneously, with the same result (and performance) as a single processor run. What I want is for a single LAMMPS task to be divided across the 8 cores / 16 threads available.

Thanks a lot :)

Anatoliy_R_Intel
Employee

Hello,

It seems something is wrong in the LAMMPS application. Maybe you need to pass some options to the application for it to work in parallel?

You can run `mpiexec -ppn 1 -n 8 -localonly test.exe`, where test.exe is a simple MPI Hello World, and you will see 8 launched ranks, as you wanted.
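For example, a minimal test.c could look like this (a sketch; build it with an MPI compiler wrapper such as mpicc, or mpiicc from Parallel Studio):

/* test.c - minimal MPI Hello World: each rank announces itself. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);                  /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of ranks */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

If all 8 ranks print, the MPI installation is working and the issue is on the LAMMPS side.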

--

Best regards, Anatoliy.

jimdempseyatthecove
Honored Contributor III

>>(using an AMD Ryzen 7 1700X processor)
>>my goal is to make use of its 8 physical cores to accelerate the execution time of a single LAMMPS instance
>>mpiexec -ppn 1 -n 8 -localonly -genv OMP_NUM_THREADS 8

Your "cluster" consists of a single host (with 8 cores, 16 threads)
The -n 8 specifies the total number of processes to create on the available nodes (a host may have one or more nodes, typically this can be the number of sockets).
The -ppn 1 specifies 1 process per node.
The  OMP_NUM_THREADS 8 says, that each process created is to use 8 OpenMP threads... *** within the process constricted affinities or physical limit.

Think of the mpiexec shell as a card dealer in a card game in which -n 8 is the total number of cards in the deck.
Think of the number of available nodes as the number of players.
Think of -ppn 1 as the number of cards dealt at a time to each player on each round of the round robin.
The cards are then dealt in round-robin manner until all cards are dealt.
Once each player (node) has received all of its cards (processes) for the current play, the total number of available hardware threads on the host is partitioned amongst the processes (there are advanced ways to control this; ignore that for the moment).

Note, "partitioned amongst each process" means should the node contain more than one process (copy of application), then each process is affinity restricted to a different subset of the available hardware threads (typically all available hardware threads but not always).

deal     node
1 of 8    0    (you have only one node)
2 of 8    0
3 of 8    0
...
8 of 8    0

Node 0 now has 8 processes. The 16 hardware threads are affinity-partitioned amongst the processes such that each process has 2 hardware threads to work with.

Now, with each process constricted to 2 hardware threads, each process will nonetheless instantiate and work with 8 OpenMP threads.

IOW, each process has 4x as many software threads as it has hardware threads, which is why increasing the thread count slowed things down.
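You can check this for yourself. A small hybrid diagnostic along these lines (an illustrative sketch, not part of LAMMPS; build it with an MPI wrapper plus OpenMP, e.g. mpicc -fopenmp or mpiicc -qopenmp) prints what each rank actually sees:

/* Sketch: each MPI rank reports its OpenMP thread budget versus the
   logical processors visible to it. Whether omp_get_num_procs()
   reflects the rank's affinity mask is implementation-dependent,
   but with Intel runtimes it typically does. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: OpenMP threads = %d, logical processors visible = %d\n",
           rank, omp_get_max_threads(), omp_get_num_procs());
    MPI_Finalize();
    return 0;
}

If a rank reports more OpenMP threads than visible processors, it is oversubscribed. (Intel MPI also exposes one of the "advanced ways" mentioned above: -genv I_MPI_PIN_DOMAIN omp sizes each rank's affinity domain to match OMP_NUM_THREADS.)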

For your system, likely candidates:

mpiexec -ppn 1 -n 8 -localonly -genv OMP_NUM_THREADS 2 lmp_mpi -in Morse_Temp_1.in
mpiexec -ppn 1 -n 4 -localonly -genv OMP_NUM_THREADS 4 lmp_mpi -in Morse_Temp_1.in
mpiexec -ppn 1 -n 2 -localonly -genv OMP_NUM_THREADS 8 lmp_mpi -in Morse_Temp_1.in
or (equivalent, since your "cluster" is a single host)
mpiexec -n 8 -localonly -genv OMP_NUM_THREADS 2 lmp_mpi -in Morse_Temp_1.in
mpiexec -n 4 -localonly -genv OMP_NUM_THREADS 4 lmp_mpi -in Morse_Temp_1.in
mpiexec -n 2 -localonly -genv OMP_NUM_THREADS 8 lmp_mpi -in Morse_Temp_1.in
or
remove OMP_NUM_THREADS and run the non-MPI, OpenMP-only variant of LAMMPS
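(If your lmp_mpi binary was built with LAMMPS's OpenMP package, you may also be able to enable its threaded styles explicitly. Assuming the standard LAMMPS command-line switches, that would look something like:

mpiexec -n 2 -localonly lmp_mpi -sf omp -pk omp 8 -in Morse_Temp_1.in

where -sf omp selects the OpenMP-suffixed styles and -pk omp 8 requests 8 threads per rank; check the LAMMPS documentation for the switches your build actually supports.)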

Jim Dempsey
