#PBS -l select=1:ncpus=4:mem=1200mb
This way, each processor will calculate 3 elements of the gradient.
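To illustrate the split described above, here is a minimal sketch of a block partition. It is not the code from my application: the helper name `block_range` is hypothetical, and it is plain Python so it runs without an MPI installation. In a real MPI program, `rank` and `n_ranks` would come from the communicator.

```python
def block_range(n_items, n_ranks, rank):
    """Return the half-open [start, stop) index range owned by `rank`.

    Items are split as evenly as possible; if n_items is not divisible
    by n_ranks, the first (n_items % n_ranks) ranks get one extra item.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    n_items = 12  # length of the gradient vector in this thread
    for n_ranks in (4, 12):  # one node with 4 CPUs vs. three nodes with 4 CPUs each
        ranges = [block_range(n_items, n_ranks, r) for r in range(n_ranks)]
        print(n_ranks, "ranks:", ranges)
```

With 4 ranks each rank owns 3 consecutive elements, and with 12 ranks each rank owns exactly one.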
Now I want to use 12 processors to calculate all 12 elements of the gradient vector in one go, so I tried to request 3 nodes by using:
#PBS -l select=3:ncpus=4:mem=1200mb
>>> rank 7 in job 1 cx1-5-3-2.cx1.hpc.ic.ac.uk_49216 caused collective abort of all ranks exit status of rank 7: return code 29
I am new to MPI. Is there anything I should be aware of when requesting multiple nodes? Many thanks for reading my thread.
Welcome to the Intel HPC forums!
Your PBS script looks fine, but the "caused collective abort of all ranks" error is fairly generic: it mostly means that your application failed. It would help if you could provide your full PBS script, including your mpirun/mpiexec command line. Any information about your cluster would also be useful, for example the OS version, whether you are using InfiniBand or Ethernet, the MPI library version, and which math library and version you use (MKL or something else).
Looking forward to hearing back.
This seems to have solved my problem.