_________________
mpirun -n 4 python addNumbers.py &
sleep 10
mpirun -n 8 python addNumbers.py
_________________
Without letting the first instance of mpirun finish, I started another mpirun. I am confused about whether the second mpirun uses new compute nodes or the same nodes as the first one.
How do I get the names of the nodes involved (e.g. s001-n036, s001-n035, etc.)? I want the total of 12 (4 + 8) to be different nodes.
Also, please note that I used "&" at the end of the first mpirun intentionally.
Thanks for reaching out to us.
If you have two commands in a single job file, the second one will run on the same nodes as the first, and only after the first has completed.
Please note that '-n' in the mpirun command specifies the number of processes, not the number of nodes. Also, make sure your code is written to make use of MPI; otherwise it will simply run that many identical copies to no benefit.
To get the node names, you can add the line below to your job file.
cat $PBS_NODEFILE
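If you also want each rank to report its node from inside the Python code, here is a minimal mpi4py sketch (assuming mpi4py is available in your environment), launched with mpirun as usual:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()              # index of this process
size = comm.Get_size()              # total number of processes started by mpirun -n
node = MPI.Get_processor_name()     # hostname of the node running this rank

print("Rank {} of {} is running on {}".format(rank, size, node))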
Hope this helps.
Could you please confirm whether the details provided were helpful?
Please note that the thread will be closed within 2 business days if the solution provided was helpful.
I'm busy with something important at the moment, so please don't close this thread. I will get back to you shortly.
Could you please tell us a time frame for checking on this?
If it takes more than a week, we suggest you open a new thread.
Please confirm.
I need two days. Please don't close this thread.
Sure. Will wait.
--------------------------------------distrJob1--------------------------------------
#PBS -l nodes=2:ppn=2
cd $PBS_O_WORKDIR
mpirun -n 4 python addNumbers.py
cat $PBS_NODEFILE
--------------------------------------distrJob1--------------------------------------
--------------------------------------distrJob2--------------------------------------
#PBS -l nodes=4:ppn=2
cd $PBS_O_WORKDIR
mpirun -n 8 python addNumbers.py
cat $PBS_NODEFILE
--------------------------------------distrJob2--------------------------------------
--------------------------------------scriptForMPI--------------------------------------
n_0=4
n_1=$((12 - n_0))
t_1=10
qsub distrJob1 &
sleep $t_1
qsub distrJob2
--------------------------------------scriptForMPI--------------------------------------
I started with the script named scriptForMPI, which submits the two jobs distrJob1 and distrJob2. My objective is to start n_0=4 processes immediately (at second zero) using MPI, wait t_1=10 seconds, and then start distrJob2 with another n_1=8 processes. Note that I used "&" at the end of qsub distrJob1 in scriptForMPI so that the script does not wait for the first MPI instance to finish.
scriptForMPI created two jobs with IDs distrJob1.356789 and distrJob2.356790. The corresponding output and error files are shown below, followed by a short sketch for checking whether the two node lists overlap.
------------------------
distrJob1.e356789
[2] DAPL startup: RLIMIT_MEMLOCK too small
[3] DAPL startup: RLIMIT_MEMLOCK too small
------------------------
distrJob1.o356789
########################################################################
# Date: Sun Sep 29 04:27:08 PDT 2019
# Job ID: 356789.v-qsvr-1.aidevcloud
# User: uXXXX
# Resources: neednodes=2:ppn=2,nodes=2:ppn=2,walltime=06:00:00
########################################################################
s001-n093
s001-n093
s001-n097
s001-n097
########################################################################
# End of output for job 356789.v-qsvr-1.aidevcloud
# Date: Sun Sep 29 04:27:15 PDT 2019
########################################################################
------------------------
distrJob2.e356790
------------------------
distrJob2.o356790
########################################################################
# Date: Sun Sep 29 04:27:20 PDT 2019
# Job ID: 356790.v-qsvr-1.aidevcloud
# User: uXXXX
# Resources: neednodes=4:ppn=2,nodes=4:ppn=2,walltime=06:00:00
########################################################################
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 9074 RUNNING AT s001-n006
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 9075 RUNNING AT s001-n006
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 6 PID 7539 RUNNING AT s001-n008
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 7 PID 7540 RUNNING AT s001-n008
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 4 PID 5617 RUNNING AT s001-n051
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 5 PID 5618 RUNNING AT s001-n051
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 2 PID 25166 RUNNING AT s001-n007
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 3 PID 25167 RUNNING AT s001-n007
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
s001-n006
s001-n006
s001-n007
s001-n007
s001-n051
s001-n051
s001-n008
s001-n008
########################################################################
# End of output for job 356790.v-qsvr-1.aidevcloud
# Date: Sun Sep 29 04:27:33 PDT 2019
########################################################################
------------------------
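To check whether the two jobs shared any nodes, a minimal sketch in Python (the file names nodes_job1.txt and nodes_job2.txt are hypothetical; they assume each job file redirects cat $PBS_NODEFILE into its own file):

# compare the node lists written by the two jobs (file names are assumptions)
with open('nodes_job1.txt') as f:
    nodes1 = set(line.strip() for line in f if line.strip())
with open('nodes_job2.txt') as f:
    nodes2 = set(line.strip() for line in f if line.strip())

overlap = nodes1 & nodes2
if overlap:
    print("Jobs shared these nodes:", sorted(overlap))
else:
    print("The two jobs ran on completely different nodes.")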
Could you share the workload so that we can try it out from our end (if it is not confidential)?
import time
start_time = time.time()

#from mpi4py import MPI
#rank = MPI.COMM_WORLD.Get_rank()
#size = MPI.COMM_WORLD.Get_size()
#name = MPI.Get_processor_name()
#print ("Hello, World! " "I am process {} of {} on {}".format(rank, size, name))

# note: range(1, 1, 1) is empty, so the loop body below never executes as written
for round in range(1, 1, 1):
    for i in range(1, int(1e4), 1):
        num1 = 15
        num2 = 12
        sum = num1 + num2

end_time = time.time()

import sys
# append the elapsed time to a log file
sys.stdout = open('TimingsOfMPI.txt', 'a')
print("%s" % (end_time - start_time))
sys.stdout.close()
-----------------------------------------------------
I am using this simple code for my research. I don't have any real workload.
Hi,
We tried the code from our end and it works fine (no bad termination error).
Could you please try it once again? How did you submit your script file?
I submitted the code as a script file with the content below:
t_1=10
qsub distrJob1 &
sleep $t_1
qsub distrJob2
and submitted the script as 'sh scriptForMPI.sh'.
Please note that mpirun will not provide any benefit if your code is not written to make use of MPI.
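For illustration only, a minimal sketch (assuming mpi4py is installed) of how the loop in addNumbers.py could be split across the ranks started by mpirun, so that each process does a different share of the work:

from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

start_time = time.time()

# each rank takes every size-th iteration, so the work is divided across processes
partial = 0
for i in range(rank, int(1e4), size):
    partial += 15 + 12

# combine the partial sums on rank 0
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print("sum = {}, elapsed = {:.6f} s".format(total, time.time() - start_time))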
I have submitted the code as 'sh scriptForMPI.sh', but I am still getting the same error.
We will continue this discussion over email.
