Intel® MPI Library

Need to press "Enter" between mpirun jobs?

dingjun_chencmgl_ca

Hi, Everyone,

I am running hybrid MPI/OpenMP jobs on a 3-node InfiniBand Linux cluster. Each node runs one MPI process with 15 OpenMP threads, so each job uses 3 MPI processes with 15 threads each.

The hosts.txt file is as follows:

coflowrhc4-5:1
coflowrhc4-6:1
coflowrhc4-7:1

I wrote the following batch file:

/************** batch file******************/

export CMG_LIC_HOST=rlmserv
export exe=/cmg/dingjun/imexLocal/imex_xsamg_dave.exe
export LD_LIBRARY_PATH=/cmg/dingjun/imexLocal/linux_x64/lib
export OMP_SCHEDULE=static,1
export KMP_AFFINITY=compact,0

export datadir=/cmg/dingjun/imexdatasets/7testproblems/mx1041_rb
cd /cmg/dingjun/imexdatasets/7testproblems/mx1041_rb
mpirun -machinefile hosts.txt ${exe} -fgmres -f ${datadir}/mx1041x105x10loa2_rb_xsamg.dat -log -jacdoms 16 -parasol 16 -o mx1041x105x10loa2_rb_xsamg_3MPI15threads_run7

export datadir=/cmg/dingjun/imexdatasets/7testproblems/mx521_rb
cd /cmg/dingjun/imexdatasets/7testproblems/mx521_rb
mpirun -machinefile hosts.txt ${exe} -fgmres -f ${datadir}/mx521x469x20_rb_xsamg.dat -log -jacdoms 16 -parasol 16 -o mx5211x469x20_rb_xsamg_3MPI15threads_run1


export datadir=/cmg/dingjun/imexdatasets/7testproblems/spe10_rb
cd /cmg/dingjun/imexdatasets/7testproblems/spe10_rb
mpirun -machinefile hosts.txt ${exe} -fgmres -f ${datadir}/spe10_rb_xsamg.dat -log -jacdoms 16 -parasol 16 -o spe10_rb_xsamg_3MPI15threads_run1

/************** end of batch file******************/

The installed Intel MPI version is lmpi5.0.3.048, and the problem is as follows:

Each time mpirun finishes, I have to press the "Enter" key before the next mpirun starts, which makes it inconvenient to run jobs in batch mode. For example:

mpirun -machinefile hosts.txt ${exe} -fgmres -f ${datadir}/mx1041x105x10loa2_rb_xsamg.dat -log -jacdoms 16 -parasol 16 -o mx1041x105x10loa2_rb_xsamg_3MPI15threads_run7

When the above job finishes on the 3 nodes, I have to press the "Enter" key on the keyboard, and only then the next job:

mpirun -machinefile hosts.txt ${exe} -fgmres -f ${datadir}/mx521x469x20_rb_xsamg.dat -log -jacdoms 16 -parasol 16 -o mx5211x469x20_rb_xsamg_3MPI15threads_run1

begins to run. Otherwise, the cluster hangs and the second job never starts.

Could you tell me what causes this problem? Thanks in advance.

I am looking forward to hearing from you.

1 Reply
Gergana_S_Intel
Employee

Hi,

Thanks for getting in touch.

The Intel MPI Library doesn't have a requirement to press "Enter" or any other key in order to return.  Could this be coming from your executable?

For example, what happens when you just run the executable outside of the mpirun script?  It'll be something like this:

export datadir=/cmg/dingjun/imexdatasets/7testproblems/mx521_rb
cd /cmg/dingjun/imexdatasets/7testproblems/mx521_rb
/cmg/dingjun/imexLocal/imex_xsamg_dave.exe -fgmres -f ${datadir}/mx521x469x20_rb_xsamg.dat -log -jacdoms 16 -parasol 16 -o mx5211x469x20_rb_xsamg_3MPI15threads_run1

Does that require you to press "Enter" before completing execution?
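
If it does, one possible check is to run the command with standard input redirected from /dev/null, so that any read from stdin returns end-of-file immediately instead of waiting on the terminal. The sketch below only illustrates that mechanism. The helper name `run_noninteractive` is hypothetical, the `sh -c` command is a stand-in for the real executable, and nothing in this thread confirms that a stdin read is the actual cause.

```shell
#!/bin/sh
# Hypothetical helper: run a command with stdin attached to /dev/null,
# so a program that reads from stdin gets EOF instead of waiting for a key.
run_noninteractive() {
    "$@" < /dev/null
}

# Stand-in for a program that waits for "Enter" before it exits.
# With stdin on /dev/null, the read fails at once and the step completes.
run_noninteractive sh -c 'read line; echo "step finished"'

# On the cluster, the same wrapper could be tried around one real step
# (assumption: the program tolerates EOF on stdin), for example:
#   run_noninteractive mpirun -machinefile hosts.txt ${exe} ...
```

If the job then returns without a key press, that would point to something in the run reading from the terminal; if it still hangs, the cause is elsewhere.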

How about trying a different MPI program?  We provide simple Hello World examples in the <impi_install_dir>/test directory.  Can you compile and run those instead of your imex_xsamg_dave.exe file?  Does that work as expected (meaning returns right away)?

Let me know,
~Gergana
