I am running Intel MPI's IMB benchmark to test the connectivity and bandwidth of my cluster (CentOS 6.4, Mellanox InfiniBand).
CMD: $ mpiexec.hydra -genv I_MPI_DEBUG 5 -genv I_MPI_FABRICS dapl -machinefile machines -ppn 1 -n (# of Procs) IMB-RMA
When (# of Procs) is between 2 and 15, performance is close to the hardware limit.
However, when I test with (# of Procs) set to 16, performance drops by roughly 3000x!
Even if I change the placement of the MPI processes across hosts, the problem remains.
Has anyone seen a related issue?
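One thing worth double-checking: in the command above, the fabric variable must be spelled exactly `I_MPI_FABRICS`; a misspelled `-genv` variable is silently ignored and Intel MPI falls back to its default fabric selection, which can explain a sudden bandwidth cliff. A minimal sketch of how I would verify which fabric is actually chosen (the hostnames in `machines` and the process count are placeholders for my setup):

```shell
# Re-run with the fabric variable spelled correctly and debug output on.
# I_MPI_DEBUG=5 makes the library print the selected fabric/provider at startup.
mpiexec.hydra -genv I_MPI_DEBUG 5 -genv I_MPI_FABRICS dapl \
    -machinefile machines -ppn 1 -n 16 IMB-RMA 2>&1 | tee imb_rma_16.log

# The debug banner should report DAPL (not tcp/shm fallback); grep for it:
grep -i "fabric" imb_rma_16.log
```

If the log shows a TCP or shared-memory fallback on some ranks, that points at a fabric-selection problem on those nodes rather than the benchmark itself.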