[ramos@node0001 ~]$ I_MPI_DEBUG=10 mpirun -n 2 IMB-MPI1
[0] MPI startup(): Intel(R) MPI Library, Version 2021.6 Build 20220227 (id: 28877f3f32)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.13.2rc1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): File "" not found
[0] MPI startup(): Load tuning file: "/software8/depot/intel/oneAPI/mpi/2021.6.0/etc/tuning_generic_shm-ofi.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 30 (TAG_UB value: 1073741823)
[0] MPI startup(): source bits available: 2 (Maximal number of rank: 3)
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       1580495  node0001   {0}
[0] MPI startup(): 1       1580496  node0001   {0}
[0] MPI startup(): I_MPI_ROOT=/software8/depot/intel/oneAPI/mpi/2021.6.0
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_RMK=pbs
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_HYDRA_IFACE=ib0
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=pbs
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=10
#----------------------------------------------------------------
# Intel(R) MPI Benchmarks 2021.4, MPI-1 part
#----------------------------------------------------------------
# Date                  : Fri Jun 24 09:05:30 2022
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-348.23.1.el8_5.x86_64
# Version               : #1 SMP Tue Apr 12 11:20:32 EDT 2022
# MPI Version           : 3.1
# MPI Thread Environment:
# Calling sequence was:
# IMB-MPI1
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype                : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op                      : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
# PingPing
# Sendrecv
# Exchange
# Allreduce
# Reduce
# Reduce_local
# Reduce_scatter
# Reduce_scatter_block
# Allgather
# Allgatherv
# Gather
# Gatherv
# Scatter
# Scatterv
# Alltoall
# Alltoallv
# Bcast
# Barrier
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
  #bytes #repetitions      t[usec]   Mbytes/sec
       0          384     13016.95         0.00
       1          384     13000.02         0.00
       2          384     12983.09         0.00
       4          380     13034.23         0.00
       8          380     13017.12         0.00
      16          380     12998.70         0.00
      32          372     12982.54         0.00
      64          372     12979.86         0.00
     128          372     13017.49         0.01
     256          372     12982.54         0.02
     512          372     12982.55         0.04
    1024          372     13000.02         0.08
    2048          372     13012.12         0.16
    4096          372     13069.91         0.31
    8192          372     12982.55         0.63
   16384          372     13052.44         1.26
   32768          372     13017.50         2.52
   65536          372     13000.03         5.04
  131072 time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit.
#---------------------------------------------------
# Benchmarking PingPing
# #processes = 2
#---------------------------------------------------
  #bytes #repetitions      t[usec]   Mbytes/sec
       0          770     13000.02         0.00
       1          753     13017.29         0.00
       2          753     13000.02         0.00
       4          753     13001.19         0.00
       8          753     13055.80         0.00
      16          753     13063.77         0.00
      32          753     13069.08         0.00
      64          753     12993.38         0.00
     128          753     13086.34         0.01
     256          753     12982.75         0.02
     512          753     13023.92         0.04
    1024          753     13017.28         0.08
    2048          753     13017.28         0.16
    4096          751     13207.74         0.31
    8192          750     13029.36         0.63
   16384          385     25979.27         0.63
   32768          385     26000.04         1.26
   65536          380     26142.15         2.51
  131072 time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit.
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
#-----------------------------------------------------------------------------
  #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
       0          770     12983.12     12983.13     12983.12         0.00
       1          753     12982.73     13017.28     13000.01         0.00
       2          753     13017.26     13051.81     13034.53         0.00
       4          753     13000.00     13034.54     13017.27         0.00
       8          731     13071.13     13106.72     13088.93         0.00
      16          731     13142.27     13177.86     13160.06         0.00
      32          731     13071.13     13106.72     13088.93         0.00
      64          731     13017.78     13053.37     13035.57         0.01
     128          731     13000.00     13035.58     13017.79         0.02
     256          731     13000.01     13035.59     13017.80         0.04
     512          731     13053.35     13088.94     13071.15         0.08
    1024          731     12982.22     13017.80     13000.01         0.16
    2048          731     12982.22     13017.80     13000.01         0.31
    4096          731     13053.35     13101.25     13077.30         0.63
    8192          731     13019.15     13054.73     13036.94         1.26
   16384          385     26132.45     26132.50     26132.47         1.25
   32768          385     25966.22     25966.26     25966.24         2.52
   65536          385     26070.11     26070.15     26070.13         5.03
  131072          320     25975.00     25975.06     25975.03        10.09
  262144 time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit.
#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 2
#-----------------------------------------------------------------------------
  #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
       0          751     12982.69     13017.33     13000.01         0.00
       1          751     12965.36     13000.02     12982.69         0.00
       2          751     13011.98     13046.62     13029.30         0.00
       4          751     12965.38     13000.02     12982.70         0.00
       8          751     13000.00     13034.64     13017.32         0.00
      16          751     13034.62     13069.26     13051.94         0.00
      32          751     12981.36     13017.33     12999.35         0.01
      64          751     12982.69     13017.33     13000.01         0.02
     128          751     12965.38     13000.02     12982.70         0.04
     256          751     12982.69     13017.33     13000.01         0.08
     512          751     12948.07     12982.71     12965.39         0.16
    1024          751     12948.07     12982.71     12965.39         0.32
    2048          751     12913.45     12948.09     12930.77         0.63
    4096          751     13022.64     13057.28     13039.96         1.25
    8192          751     13086.55     13121.19     13103.87         2.50
   16384          385     26101.30     26101.35     26101.32         2.51
   32768          385     26000.00     26000.04     26000.02         5.04
   65536          382     26000.00     26000.05     26000.02        10.08
  131072 time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit.
#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 2
#----------------------------------------------------------------
  #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       0         1000         0.06         0.07         0.06
       4          382     25949.41     26021.44     25985.43
       8          382     25805.45     26065.33     25935.39
      16          381     25838.41     26001.04     25919.72
      32          380     25895.99     25992.18     25944.08
      64          375         0.99     25999.67     13000.33
     128          375         1.15     25965.01     12983.08
     256          375         1.07     26103.67     13052.37
     512          375         1.10     26138.33     13069.71
    1024          375         1.19     26068.96     13035.07
    2048 time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit.
#----------------------------------------------------------------
# Benchmarking Reduce
# #processes = 2
#----------------------------------------------------------------
  #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       0         1000         0.07         0.07         0.07
       4          387     25934.07     25986.83     25960.45
       8          382     25972.84     26139.99     26056.42
      16          382         0.41     25931.59     12966.00
      32          381     25947.66     26073.83     26010.75
      64          381         0.42     26136.14     13068.28
     128          381         0.54     26117.79     13059.17
     256          381         0.52     25999.66     13000.09
     512          381     25992.72     26039.29     26016.00
    1024          381     25802.45     25848.08     25825.27
    2048 time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit.
#----------------------------------------------------------------
# Benchmarking Reduce_local
# #processes = 2
#----------------------------------------------------------------
  #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       0         1000         0.05         0.05         0.05
       4         1000         0.07         0.07         0.07
       8         1000         0.06         0.07         0.07
      16         1000         0.06         0.07         0.07
      32 time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit.
#----------------------------------------------------------------
# Benchmarking Reduce_scatter
# #processes = 2
#----------------------------------------------------------------
  #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       0         1000         0.23         0.26         0.24
       4          385         1.05     25999.68     13000.37
       8          385         1.01     26088.00     13044.51
      16          385         1.01     25999.69     13000.35
      32          385         1.04     26168.52     13084.78
      64          385         1.08     26051.64     13026.36
     128          380         1.17     26307.58     13154.38
     256          380         1.22     26065.49     13033.36
     512          380         1.31     26170.71     13086.01
    1024          380         1.60     26068.10     13034.85
    2048          380         2.01     25999.68     13000.85
    4096          380         2.22     26033.88     13018.05
    8192          380         2.88     26033.86     13018.37
   16384          380     25999.91     26073.10     26036.51
   32768          373     26125.20     26210.54     26167.87
   65536 time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit.
#----------------------------------------------------------------
# Benchmarking Reduce_scatter_block
# #processes = 2
#----------------------------------------------------------------
  #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       0         1000         0.12         0.15         0.14
       4          381         0.95     26033.81     13017.38
       8          381         0.98     26067.92     13034.45
      16          381         0.92     26020.69     13010.80
      32          381         0.94     26067.91     13034.42
      64          381         0.98     26170.28     13085.63
     128          381         1.14     26067.93     13034.53
     256          381         1.14     26067.93     13034.53
     512          381         1.24     25863.22     12932.23
    1024          381         1.49     26238.59     13120.04
    2048          380         1.81     25991.86     12996.84
    4096          380         2.03     25999.75     13000.89
    8192          380         2.75     25991.78     12997.26
   16384          380     26103.85     26305.66     26204.75
   32768 time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit.
#----------------------------------------------------------------
# Benchmarking Allgather
# #processes = 2
#----------------------------------------------------------------
  #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       0         1000         0.15         0.18         0.16
       1          385         0.77     26033.49     13017.13
       2          380         0.74     25999.74     13000.24
       4          380         0.74     25999.73     13000.23
       8          380         0.75     25897.09     12948.92
      16          380         0.72     25931.27     12966.00
      32          380         0.73     25794.45     12897.59
      64          380         0.74     25983.94     12992.34
     128          380         0.95     25965.48     12983.21
     256          380         1.03     26273.41     13137.22
     512          380         1.03     26033.93     13017.48
    1024          380         1.17     26102.34     13051.75
    2048          380         1.35     26033.91     13017.63
    4096          380         1.56     25999.71     13000.63
    8192 time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit.
#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 2
#----------------------------------------------------------------
  #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       0         1000         0.17         0.18         0.17
       1          381         0.82     25970.85     12985.83
       2          381         0.89     25965.60     12983.25
       4          380         0.93     25999.72     13000.33
       8          380         0.82     26094.47     13047.65
      16          380         0.84     26136.57     13068.71
      32          380         0.87     26033.93     13017.40
      64          380         0.85     26033.94     13017.40
     128          380         1.05     26168.14     13084.60
     256          380         1.07     26068.17     13034.62
     512          371         1.12     26193.46     13097.29
    1024          371         1.42     26069.38     13035.40
    2048          371         1.76     26013.26     13007.51
    4096          371         1.84     26139.89     13070.87
    8192 time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit.
#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 2
#----------------------------------------------------------------
  #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       0         1000         0.13         0.14         0.13
       1          385         0.38     25999.74     13000.06
       2          385         0.41     25999.73     13000.07
       4          385         0.40     25999.77     13000.08
       8          385         0.40     25932.26     12966.33
      16          385         0.39     25966.03     12983.21
      32          385         0.39     26166.03     13083.21
      64          384         0.39     25965.91     12983.15
     128          384         0.53     26338.31     13169.42
     256          384         0.55     26083.10     13041.82
     512          384         0.44     26101.33     13050.88
    1024          384         0.53     25999.78     13000.15
    2048          381         0.91     25965.64     12983.28
    4096          381         1.10     25931.53     12966.31
    8192 time-out.; Time limit (secs_per_sample * msg_sizes_list_len) is over; use "-time X" or SECS_PER_SAMPLE=X (IMB_settings.h) to increase time limit.
#----------------------------------------------------------------
# Benchmarking Gatherv
# #processes = 2
#----------------------------------------------------------------
  #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       0         1000         0.17         0.27         0.22