Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Distribute processes across nodes

Vishnu
Novice
Hi! I'm trying to run a software package called LAMMPS across nodes. As recommended on its page, I'm using 2 OpenMP threads and enough MPI processes to fill all cores: https://lammps.sandia.gov/doc/Speed_intel.html

This works fine when I run on a single node. But when I use, say, 4 nodes, the job only uses 2 of them. How do I distribute it across all the nodes? This is what I've tried:

mpirun -machinefile $PBS_NODEFILE -n 64 -ppn 16 \
    -genv OMP_NUM_THREADS=2 -genv I_MPI_PIN_DOMAIN=omp \
    lmp -in in.lammps -suffix hybrid intel omp -package intel 0 omp 2

There are 32 cores per node, so I'm trying to assign 16 MPI processes per node, each of which may spawn 2 OpenMP threads. `lmp` is the LAMMPS executable. What am I doing wrong?
2 Replies
Yury_K_Intel
Employee

Hello Vishnu,

The -machinefile option should not be used together with the -n and -ppn options. Please replace -machinefile with -f in your command line and try again.

There are two ways to set process placement across the nodes:

1. Use -machinefile, where each line of the machine file specifies <node_name>:<num_processes>.

or

2. Use -f <hostfile>, where the hostfile contains a plain list of nodes; then use -n to specify the total number of MPI processes and -ppn to specify the number of processes per node (see the sketch after this list).
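For illustration, here is a minimal sketch of both file formats and command lines, following the description above. The node names (node01 ... node04), file names, and per-node counts are placeholders, not values taken from this thread:

# Way 1: machine file with explicit <node_name>:<num_processes> entries
$ cat machines.txt
node01:16
node02:16
node03:16
node04:16
$ mpirun -machinefile machines.txt \
    -genv OMP_NUM_THREADS=2 -genv I_MPI_PIN_DOMAIN=omp \
    lmp -in in.lammps -suffix intel -package intel 0 omp 2

# Way 2: plain hostfile, with the counts given by -n and -ppn
$ cat hosts.txt
node01
node02
node03
node04
$ mpirun -f hosts.txt -n 64 -ppn 16 \
    -genv OMP_NUM_THREADS=2 -genv I_MPI_PIN_DOMAIN=omp \
    lmp -in in.lammps -suffix intel -package intel 0 omp 2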

--

Best regards, Yury

Vishnu
Novice

Hey Yury!

Before you replied, I tried this, and it works for me, even with the machinefile:

#!/bin/bash
# Request 4 nodes with 32 cores each and 16 MPI ranks per node
#PBS -l select=4:ncpus=32:mpiprocs=16
#PBS -N bench
#PBS -q cpuq

NODES=4

cd $PBS_O_WORKDIR

# 16 ranks per node, 64 ranks in total, 2 OpenMP threads per rank
mpirun -machinefile $PBS_NODEFILE -n $((16*${NODES})) -ppn 16 \
    -genv OMP_NUM_THREADS=2 -genv I_MPI_PIN_DOMAIN=omp \
    lmp -in in.lammps -suffix intel -package intel 0 omp 2

The primary difference is that earlier I was using the following PBS line to request the nodes (a sketch of what this changes in $PBS_NODEFILE follows below):

#PBS -l nodes=4:ppn=32
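To illustrate why this matters, here is my understanding, assuming typical PBS behavior; the node names below are placeholders and the listings are a sketch, not output from my cluster. With nodes=4:ppn=32, $PBS_NODEFILE lists each node 32 times, and since the machine file appears to take precedence over -ppn, the first 64 entries put all 64 ranks on just the first two nodes:

$ sort $PBS_NODEFILE | uniq -c
     32 node01
     32 node02
     32 node03
     32 node04

With select=4:ncpus=32:mpiprocs=16, each node appears only 16 times, so the 64 ranks are spread 16 per node across all four nodes:

$ sort $PBS_NODEFILE | uniq -c
     16 node01
     16 node02
     16 node03
     16 node04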

The application now scales well across nodes.
