[0] MPI startup(): Intel(R) MPI Library, Version 2021.8 Build 20221129 (id: 339ec755a1)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.13.2rc1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): File "/projects/site/gred/smpg/software/oneAPI/2023/mpi/2021.8.0/etc/tuning_icx_shm-ofi_mlx_10.dat" not found
[0] MPI startup(): Load tuning file: "/projects/site/gred/smpg/software/oneAPI/2023/mpi/2021.8.0/etc/tuning_icx_shm-ofi_mlx.dat"
[0] MPI startup(): threading: mode: direct
[0] MPI startup(): threading: vcis: 1
[0] MPI startup(): threading: app_threads: -1
[0] MPI startup(): threading: runtime: generic
[0] MPI startup(): threading: progress_threads: 0
[0] MPI startup(): threading: async_progress: 0
[0] MPI startup(): threading: lock_level: global
[0] MPI startup(): tag bits available: 20 (TAG_UB value: 1048575)
[0] MPI startup(): source bits available: 21 (Maximal number of rank: 2097151)
[0] MPI startup(): Rank    Pid      Node name     Pin cpu
[0] MPI startup(): 0       2021707  sc1nc068is12  {32}
[0] MPI startup(): 1       2021708  sc1nc068is12  {33}
[0] MPI startup(): 2       2021709  sc1nc068is12  {34}
[0] MPI startup(): 3       2021710  sc1nc068is12  {35}
[0] MPI startup(): 4       2021711  sc1nc068is12  {36}
[0] MPI startup(): 5       2021712  sc1nc068is12  {37}
[0] MPI startup(): 6       2021713  sc1nc068is12  {38}
[0] MPI startup(): 7       2021714  sc1nc068is12  {39}
[0] MPI startup(): 8       2021715  sc1nc068is12  {40}
[0] MPI startup(): 9       2021716  sc1nc068is12  {41}
[0] MPI startup(): 10      2021717  sc1nc068is12  {42}
[0] MPI startup(): 11      2021718  sc1nc068is12  {43}
[0] MPI startup(): 12      2021719  sc1nc068is12  {44}
[0] MPI startup(): 13      2021720  sc1nc068is12  {45}
[0] MPI startup(): 14      2021721  sc1nc068is12  {46}
[0] MPI startup(): 15      2021722  sc1nc068is12  {47}
[0] MPI startup(): 16      2021723  sc1nc068is12  {48}
[0] MPI startup(): 17      2021724  sc1nc068is12  {49}
[0] MPI startup(): 18      2021725  sc1nc068is12  {50}
[0] MPI startup(): 19      2021726  sc1nc068is12  {51}
[0] MPI startup(): 20      2021727  sc1nc068is12  {52}
[0] MPI startup(): 21      2021728  sc1nc068is12  {53}
[0] MPI startup(): 22      2021729  sc1nc068is12  {54}
[0] MPI startup(): 23      2021730  sc1nc068is12  {55}
[0] MPI startup(): 24      2021731  sc1nc068is12  {56}
[0] MPI startup(): 25      2021732  sc1nc068is12  {57}
[0] MPI startup(): 26      2021733  sc1nc068is12  {58}
[0] MPI startup(): 27      2021734  sc1nc068is12  {59}
[0] MPI startup(): 28      2021735  sc1nc068is12  {60}
[0] MPI startup(): 29      2021736  sc1nc068is12  {61}
[0] MPI startup(): 30      2021737  sc1nc068is12  {62}
[0] MPI startup(): 31      2021738  sc1nc068is12  {63}
[0] MPI startup(): I_MPI_ROOT=/projects/site/gred/smpg/software/oneAPI/2023/mpi/2021.8.0
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_RMK=lsf
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_HYDRA_BOOTSTRAP=lsf
[0] MPI startup(): I_MPI_PIN_CELL=core
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_SHM_HEAP=1
[0] MPI startup(): I_MPI_DEBUG=6
#----------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2021.4, MPI-1 part
#----------------------------------------------------------------
# Date                  : Wed Mar 15 09:37:57 2023
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-372.32.1.el8_6.x86_64
# Version               : #1 SMP Fri Oct 7 12:35:10 EDT 2022
# MPI Version           : 3.1
# MPI Thread Environment:

# Calling sequence was:

# IMB-MPI1

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# PingPong
# PingPing
# Sendrecv
# Exchange
# Allreduce
# Reduce
# Reduce_local
# Reduce_scatter
# Reduce_scatter_block
# Allgather
# Allgatherv
# Gather
# Gatherv
# Scatter
# Scatterv
# Alltoall
# Alltoallv
# Bcast
# Barrier

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.24         0.00
            1         1000         0.24         4.19
            2         1000         0.23         8.73
            4         1000         0.23        17.46
            8         1000         0.23        34.89
           16         1000         0.23        69.97
           32         1000         0.23       139.42
           64         1000         0.23       272.68
          128         1000         0.24       531.72
          256         1000         0.27       962.42
          512         1000         0.37      1400.65
         1024         1000         0.45      2296.32
         2048         1000         0.54      3802.04
         4096         1000         0.78      5238.59
         8192         1000         0.52     15793.45
        16384         1000         0.56     29222.68
        32768         1000         1.22     26772.49
        65536          640         1.96     33442.23
       131072          320         3.44     38116.51
       262144          160         6.41     40918.11
       524288           80        14.39     36426.95
      1048576           40        64.60     16232.62
      2097152           20       114.52     18312.10
      4194304           10       231.18     18142.78

#---------------------------------------------------
# Benchmarking PingPing
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         0.33         0.00
            1         1000         0.33         3.01
            2         1000         0.33         6.02
            4         1000         0.33        12.20
            8         1000         0.33        24.36
           16         1000         0.33        48.30
           32         1000         0.33        96.90
           64         1000         0.34       190.04
          128         1000         0.34       371.25
          256         1000         0.42       606.48
          512         1000         0.48      1073.23
         1024         1000         0.55      1874.00
         2048         1000         0.72      2825.89
         4096         1000         0.91      4519.09
         8192         1000         0.75     10857.77
        16384         1000         0.71     23239.41
        32768         1000         1.45     22545.77
        65536          640         2.14     30581.04
       131072          320         3.63     36064.47
       262144          160         6.92     37899.85
       524288           80        14.69     35692.70
      1048576           40        70.81     14807.56
      2097152           20       122.09     17177.03
      4194304           10       245.93     17054.61

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.30         0.30         0.30         0.00
            1         1000         0.30         0.30         0.30         6.62
            2         1000         0.34         0.34         0.34        11.66
            4         1000         0.30         0.30         0.30        26.41
            8         1000         0.30         0.30         0.30        52.87
           16         1000         0.30         0.30         0.30       105.28
           32         1000         0.30         0.30         0.30       211.18
           64         1000         0.31         0.31         0.31       409.82
          128         1000         0.34         0.34         0.34       749.12
          256         1000         0.34         0.34         0.34      1506.41
          512         1000         0.43         0.43         0.43      2362.37
         1024         1000         0.49         0.49         0.49      4164.71
         2048         1000         0.65         0.65         0.65      6304.85
         4096         1000         0.85         0.85         0.85      9602.14
         8192         1000         0.68         0.68         0.68     24098.90
        16384         1000         0.71         0.71         0.71     46005.00
        32768         1000         1.42         1.42         1.42     46151.55
        65536          640         2.13         2.13         2.13     61410.07
       131072          320         3.62         3.62         3.62     72393.24
       262144          160         6.81         6.81         6.81     77015.09
       524288           80        16.79        16.79        16.79     62437.43
      1048576           40        66.78        66.78        66.78     31405.48
      2097152           20       122.29       122.30       122.29     34295.03
      4194304           10       246.53       246.57       246.55     34021.48

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.31         0.31         0.31         0.00
            1         1000         0.31         0.31         0.31         6.51
            2         1000         0.31         0.31         0.31        13.03
            4         1000         0.30         0.30         0.30        26.26
            8         1000         0.30         0.30         0.30        52.71
           16         1000         0.30         0.30         0.30       105.47
           32         1000         0.31         0.31         0.31       208.26
           64         1000         0.32         0.32         0.32       405.75
          128         1000         0.32         0.32         0.32       794.74
          256         1000         0.36         0.36         0.36      1411.42
          512         1000         0.47         0.47         0.47      2200.54
         1024         1000         0.49         0.49         0.49      4139.68
         2048         1000         0.62         0.62         0.62      6646.21
         4096         1000         0.88         0.88         0.88      9278.53
         8192         1000         0.68         0.68         0.68     23951.39
        16384         1000         0.71         0.71         0.71     46376.32
        32768         1000         1.41         1.41         1.41     46476.42
        65536          640         2.15         2.15         2.15     60896.12
       131072          320         3.73         3.73         3.73     70189.81
       262144          160         6.63         6.63         6.63     79092.14
       524288           80        16.19        16.22        16.21     64642.43
      1048576           40       113.82       114.63       114.26     18294.55
      2097152           20       159.72       160.57       160.21     26121.58
      4194304           10       325.30       328.35       326.99     25547.84

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.30         0.30         0.30         0.00
            1         1000         0.30         0.30         0.30         6.64
            2         1000         0.30         0.30         0.30        13.28
            4         1000         0.30         0.30         0.30        26.43
            8         1000         0.30         0.30         0.30        52.99
           16         1000         0.30         0.30         0.30       106.03
           32         1000         0.35         0.35         0.35       182.91
           64         1000         0.32         0.32         0.32       400.00
          128         1000         0.32         0.32         0.32       795.68
          256         1000         0.37         0.37         0.37      1399.19
          512         1000         0.44         0.44         0.44      2344.21
         1024         1000         0.55         0.55         0.55      3693.61
         2048         1000         0.63         0.63         0.63      6520.47
         4096         1000         0.87         0.88         0.88      9360.23
         8192         1000         0.68         0.68         0.68     23941.46
        16384         1000         0.71         0.71         0.71     45969.21
        32768         1000         1.42         1.42         1.42     46204.00
        65536          640         2.21         2.21         2.21     59310.74
       131072          320         3.63         3.64         3.63     72114.91
       262144          160         6.62         6.62         6.62     79156.16
       524288           80        16.87        16.95        16.91     61844.86
      1048576           40        95.14        96.90        96.22     21642.22
      2097152           20       160.09       163.20       161.92     25700.64
      4194304           10       332.52       346.24       340.70     24227.66

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.31         0.31         0.31         0.00
            1         1000         0.30         0.30         0.30         6.65
            2         1000         0.30         0.30         0.30        13.26
            4         1000         0.30         0.30         0.30        26.52
            8         1000         0.30         0.30         0.30        52.85
           16         1000         0.30         0.30         0.30       105.88
           32         1000         0.30         0.30         0.30       211.76
           64         1000         0.37         0.37         0.37       341.82
          128         1000         0.32         0.32         0.32       795.87
          256         1000         0.37         0.37         0.37      1400.07
          512         1000         0.43         0.44         0.44      2349.58
         1024         1000         0.50         0.50         0.50      4101.69
         2048         1000         0.63         0.63         0.63      6513.87
         4096         1000         0.88         0.88         0.88      9326.83
         8192         1000         0.74         0.74         0.74     22247.40
        16384         1000         0.71         0.71         0.71     46160.45
        32768         1000         1.44         1.45         1.44     45351.73
        65536          640         2.20         2.20         2.20     59481.49
       131072          320         3.63         3.63         3.63     72164.46
       262144          160         6.80         6.82         6.81     76893.39
       524288           80        17.75        17.99        17.88     58298.04
      1048576           40       109.47       114.92       112.80     18248.13
      2097152           20       216.19       234.70       227.36     17870.58
      4194304           10       572.22       586.77       580.93     14296.18

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 32
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.31         0.31         0.31         0.00
            1         1000         0.30         0.30         0.30         6.56
            2         1000         0.30         0.30         0.30        13.12
            4         1000         0.31         0.31         0.31        25.65
            8         1000         0.31         0.31         0.31        51.78
           16         1000         0.31         0.31         0.31       103.40
           32         1000         0.36         0.36         0.36       179.14
           64         1000         0.32         0.32         0.32       395.11
          128         1000         0.33         0.33         0.33       769.51
          256         1000         0.38         0.38         0.38      1358.40
          512         1000         0.45         0.45         0.45      2275.91
         1024         1000         0.52         0.52         0.52      3928.03
         2048         1000         0.65         0.65         0.65      6334.10
         4096         1000         0.95         0.95         0.95      8606.57
         8192         1000         0.71         0.71         0.71     23059.24
        16384         1000         0.77         0.77         0.77     42475.61
        32768         1000         1.40         1.40         1.40     46749.06
        65536          640         2.15         2.15         2.15     60861.41
       131072          320         3.64         3.64         3.64     72003.91
       262144          160         7.22         7.25         7.24     72313.29
       524288           80        18.87        19.56        19.24     53607.02
      1048576           40       147.98       166.23       157.60     12615.64
      2097152           20       474.31       482.94       479.15      8684.90
      4194304           10      1367.24      1400.17      1382.39      5991.16

#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.63         0.63         0.63         0.00
            1         1000         0.63         0.63         0.63         6.34
            2         1000         0.70         0.70         0.70        11.48
            4         1000         0.63         0.63         0.63        25.33
            8         1000         0.63         0.63         0.63        50.62
           16         1000         0.68         0.68         0.68        94.70
           32         1000         0.63         0.63         0.63       202.26
           64         1000         0.65         0.65         0.65       396.14
          128         1000         0.73         0.73         0.73       703.90
          256         1000         0.71         0.71         0.71      1452.09
          512         1000         0.96         0.96         0.96      2128.68
         1024         1000         1.04         1.04         1.04      3949.07
         2048         1000         1.33         1.33         1.33      6166.18
         4096         1000         1.77         1.77         1.77      9248.84
         8192         1000         1.30         1.30         1.30     25175.38
        16384         1000         1.46         1.46         1.46     44968.06
        32768         1000         2.79         2.79         2.79     46998.96
        65536          640         4.28         4.28         4.28     61288.52
       131072          320         7.25         7.25         7.25     72342.57
       262144          160        13.30        13.30        13.30     78865.35
       524288           80        42.67        42.67        42.67     49146.08
      1048576           40       144.07       144.08       144.08     29110.78
      2097152           20       246.67       246.68       246.68     34005.87
      4194304           10       492.69       492.73       492.71     34049.82

#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.75         0.75         0.75         0.00
            1         1000         0.76         0.76         0.76         5.29
            2         1000         0.76         0.76         0.76        10.53
            4         1000         0.76         0.76         0.76        21.06
            8         1000         0.80         0.80         0.80        40.02
           16         1000         0.76         0.76         0.76        84.37
           32         1000         0.76         0.76         0.76       169.32
           64         1000         0.78         0.78         0.78       326.56
          128         1000         0.81         0.81         0.81       634.62
          256         1000         0.91         0.91         0.91      1127.92
          512         1000         1.04         1.04         1.04      1969.14
         1024         1000         1.18         1.18         1.18      3470.26
         2048         1000         1.51         1.51         1.51      5430.29
         4096         1000         2.13         2.14         2.13      7673.55
         8192         1000         1.39         1.39         1.39     23646.54
        16384         1000         1.53         1.53         1.53     42772.02
        32768         1000         2.92         2.92         2.92     44947.84
        65536          640         4.38         4.38         4.38     59787.85
       131072          320         7.36         7.36         7.36     71238.34
       262144          160        13.62        13.62        13.62     76974.05
       524288           80        43.76        43.76        43.76     47920.31
      1048576           40       204.11       205.41       204.84     20419.66
      2097152           20       328.80       329.51       329.15     25457.93
      4194304           10       661.38       664.12       662.75     25262.20

#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.76         0.76         0.76         0.00
            1         1000         0.77         0.77         0.77         5.21
            2         1000         0.77         0.77         0.77        10.42
            4         1000         0.77         0.77         0.77        20.81
            8         1000         0.81         0.81         0.81        39.57
           16         1000         0.77         0.77         0.77        83.23
           32         1000         0.77         0.77         0.77       165.28
           64         1000         0.80         0.80         0.80       320.86
          128         1000         0.82         0.82         0.82       626.14
          256         1000         0.92         0.92         0.92      1118.01
          512         1000         1.05         1.05         1.05      1949.29
         1024         1000         1.19         1.19         1.19      3435.58
         2048         1000         1.56         1.56         1.56      5255.99
         4096         1000         2.22         2.22         2.22      7392.10
         8192         1000         1.38         1.38         1.38     23693.86
        16384         1000         1.53         1.53         1.53     42714.06
        32768         1000         2.87         2.87         2.87     45645.90
        65536          640         4.39         4.40         4.40     59612.46
       131072          320         7.32         7.32         7.32     71635.03
       262144          160        13.95        13.97        13.96     75078.56
       524288           80        43.98        44.10        44.05     47555.69
      1048576           40       199.82       202.39       201.35     20723.76
      2097152           20       335.26       339.34       337.39     24720.22
      4194304           10       817.84       833.46       826.94     20129.69

#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.76         0.76         0.76         0.00
            1         1000         0.77         0.77         0.77         5.18
            2         1000         0.77         0.78         0.77        10.32
            4         1000         0.77         0.77         0.77        20.76
            8         1000         0.82         0.82         0.82        39.07
           16         1000         0.77         0.78         0.77        82.51
           32         1000         0.77         0.77         0.77       165.86
           64         1000         0.81         0.81         0.81       317.44
          128         1000         0.82         0.82         0.82       622.91
          256         1000         0.88         0.88         0.88      1162.17
          512         1000         1.05         1.05         1.05      1952.25
         1024         1000         1.19         1.20         1.19      3425.41
         2048         1000         1.55         1.55         1.55      5290.90
         4096         1000         2.22         2.22         2.22      7377.84
         8192         1000         1.38         1.38         1.38     23703.20
        16384         1000         1.56         1.56         1.56     41982.60
        32768         1000         2.93         2.93         2.93     44721.30
        65536          640         4.36         4.37         4.37     60000.32
       131072          320         7.33         7.34         7.33     71426.44
       262144          160        14.99        15.08        15.04     69549.41
       524288           80        62.74        63.76        63.35     32889.37
      1048576           40       236.40       245.49       241.57     17085.30
      2097152           20       574.62       580.73       577.87     14444.97
      4194304           10      1519.90      1527.75      1523.91     10981.67

#-----------------------------------------------------------------------------
# Benchmarking Exchange
# #processes = 32
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.77         0.77         0.77         0.00
            1         1000         0.79         0.79         0.79         5.05
            2         1000         0.79         0.79         0.79        10.14
            4         1000         0.79         0.79         0.79        20.22
            8         1000         0.82         0.82         0.82        38.92
           16         1000         0.78         0.78         0.78        82.07
           32         1000         0.82         0.82         0.82       156.15
           64         1000         0.81         0.82         0.81       313.94
          128         1000         0.84         0.84         0.84       611.13
          256         1000         0.89         0.89         0.89      1148.44
          512         1000         1.13         1.13         1.13      1808.06
         1024         1000         1.21         1.21         1.21      3374.46
         2048         1000         1.54         1.55         1.54      5294.81
         4096         1000         2.18         2.19         2.18      7497.24
         8192         1000         1.36         1.37         1.37     23951.93
        16384         1000         1.57         1.58         1.58     41513.60
        32768         1000         2.88         2.88         2.88     45444.27
        65536          640         4.50         4.51         4.51     58074.31
       131072          320         7.36         7.38         7.37     71005.84
       262144          160        14.96        15.15        15.05     69202.33
       524288           80        91.08        94.69        93.21     22147.43
      1048576           40       357.02       397.26       383.32     10558.07
      2097152           20      1377.67      1386.63      1383.22      6049.66
      4194304           10      3113.89      3126.25      3121.11      5366.56

#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         0.42         0.47         0.45
            8         1000         0.51         0.52         0.51
           16         1000         0.52         0.53         0.53
           32         1000         0.49         0.51         0.50
           64         1000         0.56         0.56         0.56
          128         1000         0.54         0.57         0.55
          256         1000         0.57         0.60         0.59
          512         1000         0.75         0.82         0.79
         1024         1000         0.94         1.01         0.98
         2048         1000         0.87         0.88         0.88
         4096         1000         1.20         1.20         1.20
         8192         1000         2.01         2.04         2.02
        16384         1000         3.41         3.45         3.43
        32768         1000         4.53         5.69         5.11
        65536          640         6.35         6.40         6.38
       131072          320        10.81        10.84        10.83
       262144          160        19.90        19.90        19.90
       524288           80        42.89        42.93        42.91
      1048576           40       110.51       110.60       110.56
      2097152           20       220.99       221.07       221.03
      4194304           10       438.41       438.54       438.47

#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         0.45         0.47         0.46
            8         1000         0.44         0.46         0.45
           16         1000         0.88         0.96         0.91
           32         1000         0.88         0.96         0.90
           64         1000         0.91         1.04         0.96
          128         1000         0.99         1.07         1.01
          256         1000         0.82         1.13         0.93
          512         1000         0.92         1.21         1.00
         1024         1000         1.27         1.57         1.36
         2048         1000         1.74         2.06         1.83
         4096         1000         2.19         2.33         2.23
         8192         1000         4.75         4.90         4.85
        16384         1000         4.87         5.03         4.94
        32768         1000         7.63         7.80         7.70
        65536          640        11.93        12.12        12.04
       131072          320        18.36        18.62        18.47
       262144          160        34.33        34.60        34.43
       524288           80        76.59        77.59        77.05
      1048576           40       179.12       179.86       179.61
      2097152           20       383.48       387.63       386.30
      4194304           10       912.95       935.79       922.98

#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.05         0.03
            4         1000         0.75         0.77         0.76
            8         1000         0.73         0.76         0.74
           16         1000         0.75         0.80         0.77
           32         1000         0.79         0.83         0.81
           64         1000         0.73         1.29         0.92
          128         1000         0.91         1.39         1.00
          256         1000         0.79         1.27         0.93
          512         1000         1.22         1.69         1.31
         1024         1000         1.58         2.07         1.70
         2048         1000         2.79         3.27         2.93
         4096         1000         3.21         3.64         3.34
         8192         1000         4.54         4.82         4.67
        16384         1000         6.23         6.59         6.40
        32768         1000         9.52         9.92         9.70
        65536          640        15.44        15.99        15.74
       131072          320        24.56        24.91        24.72
       262144          160        42.68        43.66        43.22
       524288           80        92.82        95.46        94.19
      1048576           40       314.83       324.25       318.40
      2097152           20       611.37       626.79       616.85
      4194304           10      1241.45      1277.11      1253.48

#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         0.74         1.13         0.98
            8         1000         0.74         1.13         0.98
           16         1000         0.72         1.12         0.96
           32         1000         0.73         1.12         0.97
           64         1000         1.24         2.04         1.40
          128         1000         1.31         2.15         1.48
          256         1000         1.50         2.41         1.69
          512         1000         1.14         1.97         1.44
         1024         1000         1.59         2.41         1.88
         2048         1000         2.43         3.27         2.73
         4096         1000         3.73         4.65         4.06
         8192         1000         5.59         6.25         5.90
        16384         1000         7.47         8.14         7.82
        32768         1000        11.16        11.88        11.50
        65536          640        18.08        19.04        18.56
       131072          320        30.51        31.08        30.90
       262144          160        54.01        55.43        54.65
       524288           80       180.81       190.64       185.16
      1048576           40       348.05       380.64       362.69
      2097152           20       737.13       757.50       748.43
      4194304           10      1635.01      1665.15      1650.02

#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 32
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         0.98         1.09         1.06
            8         1000         0.97         1.09         1.04
           16         1000         1.07         1.14         1.10
           32         1000         1.02         1.11         1.06
           64         1000         1.16         1.99         1.24
          128         1000         1.23         2.04         1.30
          256         1000         1.34         2.15         1.42
          512         1000         1.64         2.48         1.73
         1024         1000         2.28         3.18         2.42
         2048         1000         2.99         3.98         3.19
         4096         1000         4.44         5.42         4.62
         8192         1000         6.89         7.22         6.99
        16384         1000         9.04         9.46         9.16
        32768         1000        13.55        14.04        13.70
        65536          640        23.46        23.90        23.71
       131072          320        41.84        42.83        42.26
       262144          160        78.43        80.19        79.36
       524288           80       168.92       175.42       171.07
      1048576           40       435.61       460.27       445.81
      2097152           20       957.24       980.67       968.67
      4194304           10      1948.79      1976.50      1961.08

#----------------------------------------------------------------
# Benchmarking Reduce
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         0.24         0.34         0.29
            8         1000         0.22         0.34         0.28
           16         1000         0.22         0.34         0.28
           32         1000         0.22         0.34         0.28
           64         1000         0.23         0.38         0.30
          128         1000         0.23         0.37         0.30
          256         1000         0.24         0.39         0.31
          512         1000         0.60         0.61         0.61
         1024         1000         0.70         0.72         0.71
         2048         1000         0.22         0.72         0.47
         4096         1000         1.06         1.11         1.09
         8192         1000         0.53         1.70         1.11
        16384         1000         2.06         2.57         2.31
        32768         1000         2.86         6.22         4.54
        65536          640         5.45        10.24         7.84
       131072          320        10.44        17.66        14.05
       262144          160        18.58        21.73        20.15
       524288           80        88.56        89.11        88.84
      1048576           40       138.67       139.23       138.95
      2097152           20       282.75       283.44       283.10
      4194304           10       554.57       555.11       554.84

#----------------------------------------------------------------
# Benchmarking Reduce
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         0.17         0.40         0.24
            8         1000         0.17         0.40         0.24
           16         1000         0.17         0.45         0.25
           32         1000         0.17         0.41         0.25
           64         1000         0.17         0.51         0.27
          128         1000         0.17         0.55         0.27
          256         1000         0.17         0.57         0.28
          512         1000         0.18         0.63         0.30
         1024         1000         0.19         0.80         0.34
         2048         1000         0.19         0.97         0.40
         4096         1000         0.20         1.40         0.56
         8192         1000         0.47         2.19         0.95
        16384         1000         2.95         4.32         3.43
        32768         1000         5.13         6.74         5.66
        65536          640         7.86         9.58         8.35
       131072          320        13.38        17.12        14.64
       262144          160        25.54        32.28        28.00
       524288           80        89.12        91.19        90.16
      1048576           40       141.76       145.69       144.12
      2097152           20       281.57       283.76       282.63
      4194304           10       561.11       563.49       562.28

#----------------------------------------------------------------
# Benchmarking Reduce
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         0.16         0.49         0.21
            8         1000         0.16         0.49         0.22
           16         1000         0.18         0.51         0.23
           32         1000         0.18         0.56         0.24
           64         1000         0.17         0.65         0.24
          128         1000         0.17         0.99         0.32
          256         1000         0.17         0.93         0.33
          512         1000         0.17         1.03         0.35
         1024         1000         0.18         1.26         0.41
         2048         1000         0.19         1.57         0.47
         4096         1000         0.20         2.21         0.64
         8192         1000         0.43         3.41         1.14
        16384         1000         3.10         5.65         4.17
        32768         1000         5.56         8.76         6.93
        65536          640        10.51        13.09        11.10
       131072          320        16.69        20.99        17.69
       262144          160        30.77        37.98        32.36
       524288           80        89.83        95.71        92.46
      1048576           40       141.97       146.95       144.40
      2097152           20       284.31       289.47       286.85
      4194304           10       566.41       571.62       569.02

#----------------------------------------------------------------
# Benchmarking Reduce
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         0.16         0.64         0.22
            8         1000         0.16         0.61         0.23
           16         1000         0.17         0.65         0.24
           32         1000         0.16         0.60         0.23
           64         1000         0.16         0.73         0.26
          128         1000         0.16         0.89         0.26
          256         1000         0.16         0.94         0.28
          512         1000         0.17         0.99         0.30
         1024         1000         0.18         1.21         0.35
         2048         1000         0.18         1.61         0.44
         4096         1000         0.19         2.29         0.58
         8192         1000         0.45         4.09         1.16
        16384         1000         3.09         6.80         4.49
        32768         1000         5.60        10.57         7.50
        65536          640        11.23        15.12        12.44
       131072          320        17.59        23.68        19.53
       262144          160        32.10        42.59        35.15
       524288           80        93.60       104.49        98.90
      1048576           40       151.37       161.74       156.50
      2097152           20       306.49       317.11       311.90
      4194304           10       594.46       605.90       600.09

#----------------------------------------------------------------
# Benchmarking Reduce
# #processes = 32
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         0.16         0.82         0.24
            8         1000         0.17         0.86         0.24
           16         1000         0.16         0.84         0.25
           32         1000         0.16         0.85         0.25
           64         1000         0.16         1.06         0.28
          128         1000         0.16         1.16         0.28
          256         1000         0.17         1.33         0.32
          512         1000         0.17         1.43         0.34
         1024         1000         0.18         1.71         0.39
         2048         1000         0.18         2.17         0.48
         4096         1000         0.19         3.22         0.66
         8192         1000         0.52         5.28         1.36
        16384         1000         3.41         8.60         5.15
        32768         1000         6.30        13.12         8.57
        65536          640        13.00        19.26        15.24
       131072          320        21.88        36.15        29.09
       262144          160        40.92        57.95        47.96
       524288           80        96.14       116.17       106.10
      1048576           40       152.72       173.56       163.09
      2097152           20       335.49       417.77       389.17
      4194304           10       640.54       785.69       764.82

#----------------------------------------------------------------
# Benchmarking Reduce_local
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.03         0.03
            4         1000         0.03         0.03         0.03
            8         1000         0.03         0.03         0.03
           16         1000         0.04         0.04         0.04
           32         1000         0.03         0.03         0.03
           64         1000         0.03         0.03         0.03
          128         1000         0.04         0.04         0.04
          256         1000         0.05         0.05         0.05
          512         1000         0.06         0.06         0.06
         1024         1000         0.08         0.09         0.08
         2048         1000         0.13         0.13         0.13
         4096         1000         0.23         0.28         0.25
         8192         1000         0.43         0.43         0.43
        16384         1000         0.82         0.86         0.84
        32768         1000         1.60         1.64         1.62
        65536          640         3.16         3.17         3.16
       131072          320         6.29         6.29         6.29
       262144          160        12.53        12.55        12.54
       524288           80        25.18        25.98        25.58
      1048576           40        71.96        83.19        77.57
      2097152           20       139.40       147.37       143.38
      4194304           10       279.14       295.39       287.27

#----------------------------------------------------------------
# Benchmarking Reduce_local
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.03         0.02
            4         1000         0.03         0.03         0.03
            8         1000         0.03         0.03         0.03
           16         1000         0.03         0.03         0.03
           32         1000         0.03         0.03         0.03
           64         1000         0.03         0.03         0.03
          128         1000         0.04         0.04         0.04
          256         1000         0.04         0.05         0.05
          512         1000         0.05         0.06         0.05
         1024         1000         0.07         0.08         0.08
         2048         1000         0.09         0.13         0.12
         4096         1000         0.13         0.23         0.20
         8192         1000         0.22         0.43         0.37
        16384         1000         0.39         0.81         0.71
        32768         1000         1.23         1.61         1.51
        65536          640         2.53         3.17         3.01
       131072          320         5.07         6.29         5.98
       262144          160         9.99        12.53        11.89
       524288           80        21.64        25.23        24.27
      1048576           40        85.92        98.13        89.55
      2097152           20       173.78       207.70       186.30
      4194304           10       350.77       411.78       374.07

#----------------------------------------------------------------
# Benchmarking Reduce_local
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.03         0.02
            4         1000         0.03         0.03         0.03
            8         1000         0.03         0.06         0.05
           16         1000         0.03         0.03         0.03
           32         1000         0.03         0.06         0.05
           64         1000         0.03         0.06         0.05
          128         1000         0.03         0.06         0.05
          256         1000         0.04         0.07         0.05
          512         1000         0.04         0.06         0.05
         1024         1000         0.07         0.13         0.09
         2048         1000         0.09         0.20         0.14
         4096         1000         0.13         0.32         0.22
         8192         1000         0.22         0.43         0.38
        16384         1000         0.39         0.82         0.72
        32768         1000         1.23         1.61         1.52
        65536          640         2.54         3.17         3.01
       131072          320         5.06         6.30         5.99
       262144          160         9.98        12.53        11.89
       524288           80        20.76        26.67        24.49
      1048576           40        71.62       100.61        82.22
      2097152           20       150.15       212.33       171.60
      4194304           10       506.37       545.67       524.16

#----------------------------------------------------------------
# Benchmarking Reduce_local
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.03         0.02
            4         1000         0.03         0.03         0.03
            8         1000         0.03         0.03         0.03
           16         1000         0.03         0.05         0.04
           32         1000         0.03         0.03         0.03
           64         1000         0.03         0.05         0.04
          128         1000         0.03         0.05         0.04
          256         1000         0.04         0.06         0.05
          512         1000         0.04         0.07         0.06
         1024         1000         0.05         0.57         0.11
         2048         1000         0.07         0.56         0.15
         4096         1000         0.11         0.57         0.23
         8192         1000         0.30         0.71         0.42
        16384         1000         0.36         0.87         0.73
        32768         1000         1.20         1.75         1.54
        65536          640         2.52         3.17         3.04
       131072          320         5.00         6.29         6.00
       262144          160         9.97        12.53        11.92
       524288           80        20.07        26.51        24.75
      1048576           40        91.61       153.82       114.00
      2097152           20       239.58       311.77       258.70
      4194304           10       884.50       987.19       968.85

#----------------------------------------------------------------
# Benchmarking Reduce_local
# #processes = 32
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.03         0.02
            4         1000         0.03         0.03         0.03
            8         1000         0.03         0.03         0.03
           16         1000         0.03         0.04         0.04
           32         1000         0.03         0.04         0.04
           64         1000         0.03         0.14         0.04
          128         1000         0.03         0.04         0.04
          256         1000         0.04         0.05         0.04
          512         1000         0.04         0.08         0.06
         1024         1000         0.07         0.15         0.09
         2048         1000         0.13         0.21         0.14
         4096         1000         0.13         0.29         0.21
         8192         1000         0.22         0.43         0.38
        16384         1000         0.47         0.89         0.81
        32768         1000         1.23         1.61         1.52
        65536          640         2.53         3.17         3.02
       131072          320         5.01         6.29         5.98
       262144          160        10.12        12.84        12.10
       524288           80        23.67        30.88        28.55
      1048576           40       165.70       199.44       179.32
      2097152           20       696.91       786.31       775.64
      4194304           10      1980.42      2214.86      2188.11

#----------------------------------------------------------------
# Benchmarking Reduce_scatter
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.13         0.14         0.14
            4         1000         0.58         0.62         0.60
            8         1000         0.63         0.64         0.64
           16         1000         0.60         0.61         0.60
           32         1000         0.60         0.61         0.60
           64         1000         0.61         0.64         0.63
          128         1000         0.64         0.67         0.66
          256         1000         0.71         0.72         0.72
          512         1000         0.80         0.81         0.80
         1024         1000         0.83         0.84         0.84
         2048         1000         1.02         1.02         1.02
         4096         1000         1.35         1.39         1.37
         8192         1000         2.44         2.49         2.47
        16384         1000         4.36         4.45         4.41
        32768         1000         6.84         6.84         6.84
        65536          640        21.94        22.04        21.99
       131072          320        11.39        11.40        11.39
       262144          160        24.87        27.47        26.17
       524288           80        83.55        85.19        84.37
      1048576           40       212.58       216.54       214.56
      2097152           20       511.57       514.86       513.22
      4194304           10      1017.20      1024.44      1020.82

#----------------------------------------------------------------
# Benchmarking Reduce_scatter
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.08         0.19         0.11
            4         1000         1.03         1.19         1.09
            8         1000         1.03         1.15         1.07
           16         1000         1.02         1.14         1.07
           32         1000         1.09         1.20         1.13
           64         1000         1.07         1.26         1.13
          128         1000         1.13         1.31         1.20
          256         1000         1.29         1.51         1.36
          512         1000         1.37         1.68         1.50
         1024         1000         1.60         1.89         1.73
         2048         1000         2.12         2.37         2.23
         4096         1000         3.65         3.90         3.75
         8192         1000         6.34         6.69         6.50
        16384         1000        10.14        10.49        10.29
        32768         1000        16.98        17.25        17.10
        65536          640        15.32        15.95        15.64
       131072          320        28.50        29.13        28.78
       262144          160        77.26        80.63        78.79
       524288           80       216.39       220.78       219.25
      1048576           40       692.70       737.45       716.32
      2097152           20      1437.65      1468.87      1453.06
      4194304           10      3250.00      3296.23      3263.93

#----------------------------------------------------------------
# Benchmarking Reduce_scatter
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.10         0.19         0.12
            4         1000         1.48         1.66         1.53
            8         1000         1.38         1.61         1.46
           16         1000         1.50         1.70         1.56
           32         1000         1.54         1.75         1.61
           64         1000         1.55         1.73         1.61
          128         1000         1.74         1.97         1.82
          256         1000         1.93         2.25         2.05
          512         1000         2.23         2.55         2.35
         1024         1000         2.84         3.16         2.96
         2048         1000         4.43         4.75         4.54
         4096         1000         7.45         7.90         7.62
         8192         1000        12.02        12.38        12.17
        16384         1000        20.11        20.52        20.26
        32768         1000        38.04        38.54        38.22
        65536          640        36.07        36.95        36.39
       131072          320        76.35        78.81        77.37
       262144          160       181.53       184.57       183.24
       524288           80       416.62       420.90       418.74
      1048576           40      1459.43      1524.45      1487.57
      2097152           20      3984.79      4020.86      4005.94
      4194304           10      8472.96      8900.45      8660.80

#----------------------------------------------------------------
# Benchmarking Reduce_scatter
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.12         0.16         0.12
            4         1000         1.82         2.05         1.88
            8         1000         1.89         2.08         1.94
           16         1000         1.97         2.13         2.01
           32         1000         1.98         2.15         2.02
           64         1000         2.12         2.35         2.17
          128         1000         2.38         2.63         2.45
          256         1000         2.84         3.13         2.93
          512         1000         3.63         3.78         3.67
         1024         1000         5.21         5.45         5.29
         2048         1000         8.35         8.69         8.48
         4096         1000        23.41        29.70        26.89
         8192         1000        42.32        53.70        49.78
        16384         1000        73.47        92.06        86.55
        32768         1000       172.75       192.13       183.15
        65536          640        83.57        84.43        83.90
       131072          320       224.12       230.70       227.57
       262144          160       461.34       466.37       464.01
       524288           80      1293.75      1298.46      1296.23
      1048576           40      3972.42      4093.73      4035.22
      2097152           20     11863.18     12015.55     11944.24
      4194304           10     28525.79     28906.36     28720.00

#----------------------------------------------------------------
# Benchmarking Reduce_scatter
# #processes = 32
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.12         0.20         0.13
            4         1000         2.35         2.67         2.44
            8         1000         2.42         2.75         2.51
           16         1000         2.46         2.80         2.56
           32         1000         2.62         2.99         2.75
           64         1000         2.93         3.36         3.07
          128         1000         3.39         3.80         3.52
          256         1000         4.37         4.79         4.51
          512         1000         6.19         6.60         6.32
         1024         1000         9.67        10.24         9.89
         2048         1000        16.36        17.25        16.72
         4096         1000        29.03        30.04        29.43
         8192         1000        56.06        57.67        56.86
        16384         1000       130.95       170.34       158.29
        32768         1000       296.72       335.48       314.50
        65536          640       344.49       346.64       345.40
       131072          320       785.82       789.45       787.66
       262144          160      1805.77      1815.16      1812.03
       524288           80      4744.98      4771.08      4761.34
      1048576           40     16590.94     16699.52     16636.40
      2097152           20     52100.01     52548.39     52339.97
      4194304           10    120354.49    121255.37    120966.97

#----------------------------------------------------------------
# Benchmarking Reduce_scatter_block
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.06         0.07         0.06
            4         1000         0.55         0.56         0.56
            8         1000         0.53         0.59         0.56
           16         1000         0.52         0.52         0.52
           32         1000         0.53         0.56         0.54
           64         1000         0.60         0.62         0.61
          128         1000         0.61         0.70         0.66
          256         1000         0.65         0.66         0.66
          512         1000         0.75         0.75         0.75
         1024         1000         0.81         0.81         0.81
         2048         1000         0.96         0.96         0.96
         4096         1000         1.32         1.33         1.32
         8192         1000         2.43         2.44         2.43
        16384         1000         4.32         4.42         4.37
        32768         1000         6.82         6.82         6.82
        65536          640        12.23        12.23        12.23
       131072          320        24.74        24.85        24.79
       262144          160        24.92        27.47        26.19
       524288           80        84.06        86.00        85.03
      1048576           40       211.90       215.99       213.94
      2097152           20       505.92       508.96       507.44
      4194304           10      1027.65      1034.08      1030.86

#----------------------------------------------------------------
# Benchmarking Reduce_scatter_block
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.06         0.08         0.06
            4         1000         1.01         1.12         1.05
            8         1000         0.98         1.09         1.02
           16         1000         0.96         1.11         1.02
           32         1000         1.06         1.20         1.11
           64         1000         1.05         1.15         1.08
          128         1000         1.13         1.26         1.17
          256         1000         1.21         1.41         1.28
          512         1000         1.43         1.56         1.47
         1024         1000         1.64         1.79         1.69
         2048         1000         2.13         2.31         2.18
         4096         1000         3.69         3.83         3.73
         8192         1000         6.35         6.59         6.45
        16384         1000        10.17        10.44        10.28
        32768         1000        16.98        17.18        17.06
        65536          640        33.31        33.72        33.46
       131072          320        29.13        29.32        29.22
       262144          160        74.68        79.75        77.62
       524288           80       216.01       220.60       218.85
      1048576           40       690.26       736.84       715.43
      2097152           20      1441.95      1467.85      1455.34
      4194304           10      3260.95      3299.14      3273.47

#----------------------------------------------------------------
# Benchmarking Reduce_scatter_block
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.06         0.08         0.06
            4         1000         1.35         1.48         1.38
            8         1000         1.37         1.55         1.43
           16         1000         1.45         1.63         1.51
           32         1000         1.51         1.65         1.56
           64         1000         1.53         1.67         1.56
          128         1000         1.67         1.84         1.72
          256         1000         1.87         2.14         1.96
          512         1000         2.20         2.43         2.27
         1024         1000         2.84         3.05         2.90
         2048         1000         4.37         4.65         4.48
         4096         1000         7.41         7.83         7.58
         8192         1000        11.97        12.30        12.11
        16384         1000        20.05        20.41        20.17
        32768         1000
37.90 38.35 38.07 65536 640 36.14 37.04 36.52 131072 320 75.76 77.78 76.78 262144 160 181.65 184.47 183.44 524288 80 418.13 422.71 420.33 1048576 40 1458.17 1527.03 1488.28 2097152 20 3981.13 4083.66 4038.44 4194304 10 8489.69 8571.89 8548.63 #---------------------------------------------------------------- # Benchmarking Reduce_scatter_block # #processes = 16 # ( 16 additional processes waiting in MPI_Barrier) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.06 0.67 0.10 4 1000 1.73 2.41 2.06 8 1000 1.81 2.44 2.12 16 1000 1.92 2.57 2.21 32 1000 1.90 2.55 2.22 64 1000 2.08 2.72 2.39 128 1000 2.27 2.97 2.62 256 1000 2.66 3.37 3.02 512 1000 3.37 4.06 3.73 1024 1000 5.06 5.75 5.41 2048 1000 8.25 9.11 8.69 4096 1000 13.18 14.31 13.75 8192 1000 22.26 23.33 22.83 16384 1000 42.14 43.57 42.94 32768 1000 43.72 44.42 44.01 65536 640 83.69 84.95 84.05 131072 320 223.59 229.64 226.66 262144 160 446.55 451.21 449.47 524288 80 1297.96 1303.36 1300.82 1048576 40 3986.19 4111.70 4052.13 2097152 20 11870.70 12029.66 11947.17 4194304 10 28413.14 28833.17 28638.82 #---------------------------------------------------------------- # Benchmarking Reduce_scatter_block # #processes = 32 #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.06 0.19 0.07 4 1000 2.28 2.61 2.39 8 1000 2.34 2.69 2.44 16 1000 2.42 2.75 2.53 32 1000 2.55 2.91 2.67 64 1000 2.83 3.23 2.95 128 1000 3.30 3.69 3.42 256 1000 4.27 4.66 4.39 512 1000 6.11 6.49 6.23 1024 1000 9.54 10.13 9.76 2048 1000 16.30 17.19 16.65 4096 1000 29.20 30.27 29.63 8192 1000 56.18 57.76 56.99 16384 1000 53.96 54.37 54.13 32768 1000 112.16 113.67 112.87 65536 640 344.68 346.93 345.56 131072 320 785.16 788.69 786.80 262144 160 1823.81 1829.46 1827.28 524288 80 4763.69 4787.56 4778.88 1048576 40 16594.68 16694.78 16635.01 2097152 20 51662.88 52149.08 51936.07 4194304 10 120448.75 
121202.48 120943.86 #---------------------------------------------------------------- # Benchmarking Allgather # #processes = 2 # ( 30 additional processes waiting in MPI_Barrier) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.08 0.08 0.08 1 1000 0.45 0.47 0.46 2 1000 0.45 0.46 0.46 4 1000 0.45 0.47 0.46 8 1000 0.45 0.47 0.46 16 1000 0.47 0.47 0.47 32 1000 0.45 0.47 0.46 64 1000 0.47 0.51 0.49 128 1000 0.50 0.54 0.52 256 1000 0.53 0.54 0.53 512 1000 0.65 0.65 0.65 1024 1000 0.72 0.72 0.72 2048 1000 0.84 0.89 0.87 4096 1000 1.08 1.12 1.10 8192 1000 2.02 2.06 2.04 16384 1000 3.28 3.31 3.30 32768 1000 4.09 4.12 4.11 65536 640 7.38 7.38 7.38 131072 320 13.66 13.67 13.67 262144 160 27.35 27.37 27.36 524288 80 73.01 73.03 73.02 1048576 40 145.25 145.40 145.33 2097152 20 267.76 267.90 267.83 4194304 10 534.15 534.35 534.25 #---------------------------------------------------------------- # Benchmarking Allgather # #processes = 4 # ( 28 additional processes waiting in MPI_Barrier) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 0.07 0.12 0.08 1 1000 0.86 0.96 0.89 2 1000 0.82 0.91 0.85 4 1000 0.86 0.96 0.90 8 1000 0.84 0.95 0.89 16 1000 0.84 0.97 0.90 32 1000 0.90 1.02 0.96 64 1000 0.93 0.96 0.94 128 1000 0.95 1.04 0.99 256 1000 1.16 1.23 1.19 512 1000 1.30 1.35 1.32 1024 1000 1.47 1.51 1.49 2048 1000 1.86 1.91 1.88 4096 1000 2.85 3.00 2.90 8192 1000 4.39 4.49 4.42 16384 1000 6.63 6.92 6.80 32768 1000 10.28 10.49 10.36 65536 640 18.97 19.31 19.12 131072 320 36.62 37.04 36.80 262144 160 86.10 87.75 86.92 524288 80 194.37 199.12 197.33 1048576 40 408.02 438.26 425.75 2097152 20 827.23 848.48 839.29 4194304 10 1630.22 1660.48 1648.02 #---------------------------------------------------------------- # Benchmarking Allgather # #processes = 8 # ( 24 additional processes waiting in MPI_Barrier) 
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.07 0.20 0.10
1 1000 1.23 1.46 1.34
2 1000 1.31 1.54 1.40
4 1000 1.30 1.49 1.38
8 1000 1.30 1.49 1.38
16 1000 1.26 1.50 1.37
32 1000 1.29 1.55 1.41
64 1000 1.31 1.57 1.42
128 1000 1.54 1.79 1.66
256 1000 1.61 1.91 1.74
512 1000 2.05 2.23 2.12
1024 1000 2.38 2.71 2.52
2048 1000 3.59 3.89 3.74
4096 1000 5.03 5.33 5.16
8192 1000 7.83 8.25 8.05
16384 1000 12.60 13.09 12.88
32768 1000 22.39 23.06 22.64
65536 640 46.84 47.99 47.50
131072 320 91.62 93.27 92.67
262144 160 190.56 199.17 195.12
524288 80 379.83 399.22 391.46
1048576 40 735.73 789.78 766.15
2097152 20 1613.76 1736.22 1664.77
4194304 10 3588.18 3782.66 3693.36

#----------------------------------------------------------------
# Benchmarking Allgather
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.06 0.62 0.10
1 1000 1.68 2.35 1.95
2 1000 1.65 2.32 1.92
4 1000 1.64 2.32 1.91
8 1000 1.60 2.18 1.90
16 1000 1.69 2.23 1.95
32 1000 1.72 3.51 2.31
64 1000 1.92 2.55 2.19
128 1000 2.21 2.83 2.45
256 1000 2.56 3.10 2.79
512 1000 2.82 3.56 3.18
1024 1000 4.25 4.89 4.58
2048 1000 6.04 6.62 6.36
4096 1000 9.04 10.14 9.57
8192 1000 24.84 25.54 25.06
16384 1000 28.62 30.72 29.59
32768 1000 56.99 58.65 57.80
65536 640 112.60 120.77 117.58
131072 320 242.06 267.60 258.45
262144 160 712.70 718.83 716.37
524288 80 1946.60 1952.39 1949.19
1048576 40 4196.38 4229.09 4212.51
2097152 20 6657.25 6683.28 6671.79
4194304 10 13931.64 14055.97 13971.51

#----------------------------------------------------------------
# Benchmarking Allgather
# #processes = 32
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.07 0.21 0.09
1 1000 2.09 2.37 2.19
2 1000 2.12 2.36 2.21
4 1000 2.17 2.41 2.26
8 1000 2.21 2.44 2.30
16 1000 2.30 2.53 2.39
32 1000 2.43 2.67 2.50
64 1000 2.68 2.92 2.78
128 1000 3.18 3.45 3.29
256 1000 4.98 5.71 5.14
512 1000 5.25 5.57 5.43
1024 1000 11.43 12.27 11.64
2048 1000 17.89 18.96 18.13
4096 1000 30.31 31.57 30.63
8192 1000 52.66 54.58 53.23
16384 1000 101.66 114.39 105.09
32768 1000 162.71 165.40 164.23
65536 640 1083.99 1087.90 1086.48
131072 320 4421.70 4438.82 4428.04
262144 160 7929.09 7947.85 7936.85
524288 80 4287.01 4334.45 4306.23
1048576 40 10920.70 11521.18 11260.39
2097152 20 26770.29 26912.43 26835.77
4194304 10 53849.46 54028.63 53955.60

#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.09 0.09 0.09
1 1000 0.51 0.51 0.51
2 1000 0.51 0.56 0.53
4 1000 0.51 0.52 0.52
8 1000 0.51 0.55 0.53
16 1000 0.52 0.52 0.52
32 1000 0.51 0.55 0.53
64 1000 0.53 0.55 0.54
128 1000 0.54 0.57 0.56
256 1000 0.59 0.61 0.60
512 1000 0.69 0.69 0.69
1024 1000 0.78 0.78 0.78
2048 1000 0.86 0.87 0.87
4096 1000 1.14 1.15 1.15
8192 1000 2.02 2.02 2.02
16384 1000 3.55 3.55 3.55
32768 1000 5.40 5.42 5.41
65536 640 9.78 9.78 9.78
131072 320 18.68 18.74 18.71
262144 160 27.42 27.44 27.43
524288 80 72.90 72.92 72.91
1048576 40 145.15 145.30 145.23
2097152 20 267.29 267.60 267.44
4194304 10 534.24 534.57 534.40

#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.65 0.85 0.73
1 1000 0.91 1.12 0.98
2 1000 0.90 1.08 0.95
4 1000 0.90 1.04 0.94
8 1000 0.90 1.04 0.94
16 1000 0.90 1.04 0.94
32 1000 0.97 1.07 1.00
64 1000 0.93 1.08 0.98
128 1000 1.07 1.17 1.10
256 1000 1.19 1.35 1.24
512 1000 1.33 1.51 1.38
1024 1000 1.52 1.69 1.57
2048 1000 1.94 2.13 1.99
4096 1000 3.07 3.27 3.14
8192 1000 5.24 5.48 5.33
16384 1000 7.79 8.05 7.89
32768 1000 13.36 13.56 13.43
65536 640 25.01 25.58 25.23
131072 320 39.50 39.91 39.73
262144 160 86.13 87.85 87.02
524288 80 193.86 198.83 196.99
1048576 40 407.86 437.85 425.47
2097152 20 828.11 842.72 836.74
4194304 10 1631.94 1659.60 1647.15

#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.09 0.12 0.10
1 1000 1.41 1.66 1.49
2 1000 1.34 1.56 1.42
4 1000 1.34 1.59 1.44
8 1000 1.33 1.52 1.41
16 1000 1.38 1.59 1.46
32 1000 1.42 1.61 1.49
64 1000 1.41 1.62 1.50
128 1000 1.65 1.84 1.71
256 1000 1.88 2.11 1.97
512 1000 2.11 2.35 2.21
1024 1000 2.59 2.83 2.69
2048 1000 3.75 3.97 3.84
4096 1000 6.15 6.51 6.34
8192 1000 9.30 9.73 9.47
16384 1000 15.52 16.20 15.77
32768 1000 25.75 26.49 26.13
65536 640 46.80 47.92 47.44
131072 320 91.89 93.59 93.04
262144 160 190.37 199.29 195.19
524288 80 383.23 401.02 392.96
1048576 40 1018.60 1028.60 1023.08
2097152 20 1985.52 2004.96 1994.77
4194304 10 4079.13 4125.73 4102.12

#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.10 0.71 0.14
1 1000 1.80 2.42 2.11
2 1000 1.81 2.42 2.12
4 1000 1.82 2.45 2.14
8 1000 1.88 2.54 2.21
16 1000 1.93 2.52 2.22
32 1000 1.95 2.55 2.25
64 1000 2.04 2.69 2.40
128 1000 2.34 3.00 2.68
256 1000 2.72 3.47 3.09
512 1000 3.30 4.09 3.71
1024 1000 4.49 5.28 4.91
2048 1000 6.91 7.80 7.40
4096 1000 10.35 11.50 11.00
8192 1000 24.90 25.59 25.15
16384 1000 31.57 33.05 32.18
32768 1000 56.97 58.81 57.86
65536 640 113.14 121.25 118.09
131072 320 241.99 268.10 258.95
262144 160 709.12 722.16 717.75
524288 80 1924.83 1931.99 1928.26
1048576 40 4208.26 4225.89 4219.45
2097152 20 6661.18 6682.97 6673.04
4194304 10 13991.36 14041.73 14020.66

#----------------------------------------------------------------
# Benchmarking Allgatherv
# #processes = 32
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.13 0.19 0.14
1 1000 2.48 2.90 2.62
2 1000 2.48 2.87 2.61
4 1000 2.51 2.92 2.64
8 1000 2.56 2.96 2.70
16 1000 2.61 3.03 2.74
32 1000 2.72 3.18 2.87
64 1000 2.96 3.38 3.10
128 1000 3.44 3.87 3.59
256 1000 4.31 4.78 4.48
512 1000 5.68 6.28 5.91
1024 1000 8.17 8.76 8.40
2048 1000 13.06 13.97 13.53
4096 1000 28.06 28.88 28.58
8192 1000 56.44 57.33 56.79
16384 1000 105.73 106.85 106.09
32768 1000 173.08 176.19 174.84
65536 640 1099.54 1102.60 1101.22
131072 320 4423.62 4439.33 4431.92
262144 160 7927.18 7947.41 7936.42
524288 80 4270.70 4328.23 4301.72
1048576 40 10813.84 11215.55 10970.87
2097152 20 26794.21 26897.68 26851.45
4194304 10 53864.21 54017.10 53951.16

#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.07 0.07 0.07
1 1000 0.23 0.34 0.28
2 1000 0.25 0.44 0.35
4 1000 0.25 0.45 0.35
8 1000 0.27 0.36 0.31
16 1000 0.27 0.44 0.36
32 1000 0.27 0.36 0.32
64 1000 0.27 0.37 0.32
128 1000 0.27 0.46 0.37
256 1000 0.29 0.45 0.37
512 1000 0.35 0.62 0.48
1024 1000 0.44 0.82 0.63
2048 1000 0.47 0.95 0.71
4096 1000 0.58 1.31 0.95
8192 1000 1.03 1.15 1.09
16384 1000 1.30 1.37 1.33
32768 1000 2.15 2.19 2.17
65536 640 3.62 3.68 3.65
131072 320 6.77 6.82 6.80
262144 160 13.21 13.32 13.26
524288 80 65.93 66.18 66.06
1048576 40 144.19 144.44 144.31
2097152 20 267.24 267.39 267.32
4194304 10 532.71 532.90 532.81

#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.06 0.09 0.06
1 1000 0.21 0.74 0.43
2 1000 0.21 0.73 0.43
4 1000 0.21 0.74 0.43
8 1000 0.21 0.73 0.43
16 1000 0.21 0.74 0.43
32 1000 0.21 0.75 0.43
64 1000 0.21 0.75 0.44
128 1000 0.22 0.83 0.48
256 1000 0.23 1.16 0.61
512 1000 0.42 1.44 0.79
1024 1000 0.43 1.83 0.97
2048 1000 0.47 2.14 1.12
4096 1000 0.57 3.00 1.54
8192 1000 0.92 1.72 1.37
16384 1000 1.38 2.59 2.07
32768 1000 2.23 4.32 3.46
65536 640 3.76 7.26 5.84
131072 320 8.17 15.22 12.32
262144 160 40.93 64.94 51.93
524288 80 88.99 144.67 117.45
1048576 40 187.95 289.50 235.00
2097152 20 335.54 514.49 421.51
4194304 10 754.77 1024.88 840.75

#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.06 0.09 0.06
1 1000 0.21 1.14 0.49
2 1000 0.21 1.16 0.51
4 1000 0.21 1.16 0.51
8 1000 0.21 1.21 0.52
16 1000 0.21 1.67 0.64
32 1000 0.21 1.20 0.52
64 1000 0.22 1.22 0.54
128 1000 0.23 1.55 0.59
256 1000 0.24 1.79 0.72
512 1000 0.42 2.26 0.91
1024 1000 0.61 2.72 0.90
2048 1000 0.66 3.67 1.07
4096 1000 0.93 5.55 1.56
8192 1000 1.42 3.62 2.41
16384 1000 1.96 5.32 3.56
32768 1000 3.17 8.87 5.92
65536 640 6.54 16.58 10.86
131072 320 22.43 65.32 43.12
262144 160 43.84 145.67 97.62
524288 80 78.12 290.98 195.43
1048576 40 219.28 580.26 389.64
2097152 20 316.17 1008.93 683.43
4194304 10 907.06 2126.23 1460.29

#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.07 0.34 0.09
1 1000 0.21 1.38 0.49
2 1000 0.21 1.41 0.50
4 1000 0.21 1.45 0.50
8 1000 0.22 1.40 0.50
16 1000 0.21 1.42 0.50
32 1000 0.21 1.43 0.51
64 1000 0.22 1.67 0.53
128 1000 0.22 1.89 0.60
256 1000 0.23 2.29 0.70
512 1000 0.37 2.85 0.90
1024 1000 0.89 5.39 1.22
2048 1000 1.10 7.35 1.56
4096 1000 1.60 11.29 2.27
8192 1000 2.06 7.68 4.68
16384 1000 2.91 11.00 6.66
32768 1000 7.21 19.16 11.39
65536 640 22.70 68.92 39.35
131072 320 51.82 149.02 87.26
262144 160 94.33 294.97 173.64
524288 80 213.96 585.60 345.64
1048576 40 445.89 1191.07 701.58
2097152 20 810.20 2082.21 1245.20
4194304 10 2389.00 5127.53 3105.09

#----------------------------------------------------------------
# Benchmarking Gather
# #processes = 32
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.07 0.17 0.09
1 1000 0.21 1.95 0.56
2 1000 0.21 1.94 0.56
4 1000 0.21 1.95 0.56
8 1000 0.21 1.96 0.56
16 1000 0.21 1.97 0.57
32 1000 0.21 2.22 0.59
64 1000 0.22 2.45 0.61
128 1000 0.22 2.79 0.68
256 1000 0.23 3.38 0.78
512 1000 0.37 4.54 1.00
1024 1000 0.37 8.00 1.50
2048 1000 0.41 11.99 1.96
4096 1000 0.56 18.41 3.02
8192 1000 5.06 16.22 9.11
16384 1000 8.29 25.93 14.16
32768 1000 19.49 75.04 39.49
65536 640 39.98 156.28 84.42
131072 320 87.50 308.32 166.68
262144 160 162.17 593.78 324.53
524288 80 386.05 1215.12 662.90
1048576 40 1174.45 3471.55 1869.36
2097152 20 2069.26 5463.64 2970.32
4194304 10 4579.33 12496.27 6931.71

#----------------------------------------------------------------
# Benchmarking Gatherv
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.29 0.33 0.31
1 1000 0.52 0.71 0.61
2 1000 0.50 0.62 0.56
4 1000 0.54 0.59 0.57
8 1000 0.50 0.62 0.56
16 1000 0.50 0.63 0.56
32 1000 0.52 0.65 0.58
64 1000 0.54 0.61 0.58
128 1000 0.52 0.66 0.59
256 1000 0.52 0.66 0.59
512 1000 0.65 0.94 0.80
1024 1000 0.72 1.05 0.89
2048 1000 0.73 1.14 0.94
4096 1000 0.86 1.48 1.17
8192 1000 0.95 0.97 0.96
16384 1000 1.41 1.46 1.43
32768 1000 2.29 2.31 2.30
65536 640 3.76 3.76 3.76
131072 320 6.72 6.73 6.73
262144 160 13.59 13.67 13.63
524288 80 66.86 66.97 66.91
1048576 40 144.53 144.78 144.65
2097152 20 267.58 271.38 269.48
4194304 10 539.23 539.44 539.33

#----------------------------------------------------------------
# Benchmarking Gatherv
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.32 0.54 0.40
1 1000 0.52 1.15 0.74
2 1000 0.52 1.10 0.72
4 1000 0.53 1.10 0.72
8 1000 0.53 1.15 0.74
16 1000 0.53 1.12 0.73
32 1000 0.53 1.10 0.73
64 1000 0.53 1.11 0.73
128 1000 0.54 1.16 0.75
256 1000 0.55 1.15 0.76
512 1000 0.63 1.55 0.95
1024 1000 0.71 1.75 1.06
2048 1000 0.71 2.11 1.17
4096 1000 0.88 2.96 1.48
8192 1000 1.20 2.01 1.66
16384 1000 1.67 2.83 2.32
32768 1000 2.73 4.61 3.76
65536 640 4.40 7.56 6.15
131072 320 8.41 15.71 12.79
262144 160 35.69 69.85 56.74
524288 80 86.70 145.83 118.46
1048576 40 157.87 290.40 238.47
2097152 20 280.94 515.84 422.88
4194304 10 542.66 1039.39 852.55

#----------------------------------------------------------------
# Benchmarking Gatherv
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.45 0.70 0.56
1 1000 0.57 1.80 0.85
2 1000 0.57 1.85 0.87
4 1000 0.57 1.79 0.86
8 1000 0.57 1.87 0.87
16 1000 0.57 1.79 0.86
32 1000 0.57 1.87 0.87
64 1000 0.61 1.82 0.88
128 1000 0.62 1.82 0.89
256 1000 0.62 1.85 0.90
512 1000 0.81 2.77 1.19
1024 1000 0.92 3.21 1.34
2048 1000 0.96 4.10 1.48
4096 1000 1.15 5.98 1.92
8192 1000 1.55 4.08 2.82
16384 1000 2.38 5.74 3.93
32768 1000 3.66 9.37 6.36
65536 640 5.93 18.28 12.22
131072 320 22.82 70.06 47.02
262144 160 53.35 147.36 99.02
524288 80 74.79 292.33 196.46
1048576 40 153.21 581.36 390.49
2097152 20 292.34 1020.39 696.59
4194304 10 599.21 2128.42 1459.51

#----------------------------------------------------------------
# Benchmarking Gatherv
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.83 1.15 0.98
1 1000 1.04 3.79 1.46
2 1000 1.07 3.80 1.47
4 1000 1.08 3.80 1.47
8 1000 1.08 3.79 1.47
16 1000 1.03 3.79 1.46
32 1000 1.03 3.79 1.46
64 1000 1.03 3.80 1.47
128 1000 1.04 3.76 1.47
256 1000 1.05 3.88 1.49
512 1000 1.36 5.47 1.90
1024 1000 1.49 6.31 2.06
2048 1000 2.85 8.84 5.07
4096 1000 1.92 11.88 2.98
8192 1000 2.70 8.63 5.49
16384 1000 3.55 11.87 7.43
32768 1000 4.96 20.21 12.27
65536 640 18.94 75.00 44.28
131072 320 29.79 153.39 90.18
262144 160 48.83 298.84 176.61
524288 80 78.04 586.87 346.73
1048576 40 241.67 1198.09 706.06
2097152 20 336.39 2105.78 1269.53
4194304 10 1249.29 5117.01 3105.20

#----------------------------------------------------------------
# Benchmarking Gatherv
# #processes = 32
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.62 1.16 0.82
1 1000 1.03 5.05 1.83
2 1000 1.03 5.17 1.85
4 1000 1.06 5.23 1.87
8 1000 1.06 5.25 1.88
16 1000 1.05 5.31 1.87
32 1000 1.05 5.71 1.90
64 1000 1.07 6.02 1.96
128 1000 1.07 6.55 2.03
256 1000 1.11 7.50 2.16
512 1000 1.18 8.95 2.37
1024 1000 3.64 11.62 6.92
2048 1000 4.21 15.21 8.84
4096 1000 5.17 23.09 13.31
8192 1000 3.55 18.85 10.84
16384 1000 4.87 28.83 16.08
32768 1000 16.54 80.72 43.97
65536 640 22.94 162.32 88.80
131072 320 38.63 308.00 168.69
262144 160 43.33 598.90 327.73
524288 80 168.38 1290.82 704.41
1048576 40 893.62 3487.48 1876.31
2097152 20 1424.80 5448.93 2977.56
4194304 10 3689.26 12509.52 6919.97

#----------------------------------------------------------------
# Benchmarking Scatter
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.07 0.07 0.07
1 1000 0.29 0.34 0.32
2 1000 0.29 0.34 0.32
4 1000 0.28 0.35 0.31
8 1000 0.30 0.34 0.32
16 1000 0.28 0.35 0.32
32 1000 0.29 0.35 0.32
64 1000 0.30 0.42 0.36
128 1000 0.26 0.40 0.33
256 1000 0.26 0.40 0.33
512 1000 0.31 0.54 0.42
1024 1000 0.35 0.63 0.49
2048 1000 0.31 0.71 0.51
4096 1000 0.39 1.01 0.70
8192 1000 0.62 0.72 0.67
16384 1000 0.72 0.90 0.81
32768 1000 2.18 2.25 2.22
65536 640 3.66 3.72 3.69
131072 320 6.63 6.69 6.66
262144 160 12.55 12.62 12.58
524288 80 29.55 29.70 29.63
1048576 40 135.99 136.32 136.15
2097152 20 274.24 274.57 274.40
4194304 10 503.14 503.64 503.39

#----------------------------------------------------------------
# Benchmarking Scatter
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.32 0.45 0.39
1 1000 0.33 0.45 0.38
2 1000 0.30 0.45 0.38
4 1000 0.30 0.45 0.37
8 1000 0.33 0.49 0.42
16 1000 0.30 0.45 0.38
32 1000 0.30 0.45 0.38
64 1000 0.31 0.46 0.39
128 1000 0.33 0.65 0.46
256 1000 0.33 0.55 0.46
512 1000 0.52 1.03 0.82
1024 1000 0.72 1.42 1.06
2048 1000 0.86 1.91 1.43
4096 1000 1.10 2.18 1.59
8192 1000 0.57 1.05 0.72
16384 1000 0.67 1.20 0.90
32768 1000 1.99 2.49 2.21
65536 640 3.45 3.96 3.68
131072 320 6.43 6.93 6.65
262144 160 12.34 12.88 12.58
524288 80 27.37 29.11 28.25
1048576 40 133.23 138.31 136.13
2097152 20 251.64 264.62 259.95
4194304 10 506.79 533.33 523.76

#----------------------------------------------------------------
# Benchmarking Scatter
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.07 0.20 0.10
1 1000 0.50 0.74 0.61
2 1000 0.46 0.73 0.59
4 1000 0.46 0.73 0.59
8 1000 0.46 0.76 0.60
16 1000 0.46 0.74 0.59
32 1000 0.46 0.75 0.60
64 1000 0.49 0.77 0.62
128 1000 0.60 0.97 0.76
256 1000 0.80 1.37 1.07
512 1000 0.99 1.71 1.37
1024 1000 1.17 2.27 1.81
2048 1000 0.82 2.60 1.84
4096 1000 1.19 3.95 2.73
8192 1000 0.54 1.89 0.88
16384 1000 0.72 2.15 1.14
32768 1000 2.06 3.11 2.43
65536 640 3.53 4.70 3.90
131072 320 6.50 7.54 6.87
262144 160 12.46 13.53 12.85
524288 80 28.09 31.92 29.83
1048576 40 132.78 137.41 136.02
2097152 20 257.95 281.92 270.06
4194304 10 547.31 591.26 570.19

#----------------------------------------------------------------
# Benchmarking Scatter
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.06 0.11 0.06
1 1000 0.56 1.03 0.76
2 1000 0.52 1.02 0.75
4 1000 0.52 1.02 0.75
8 1000 0.51 1.03 0.75
16 1000 0.55 1.04 0.77
32 1000 0.54 1.06 0.78
64 1000 0.67 1.18 0.89
128 1000 0.82 1.62 1.20
256 1000 1.03 2.02 1.57
512 1000 1.32 2.65 2.13
1024 1000 1.48 2.63 2.18
2048 1000 0.82 5.10 3.12
4096 1000 1.22 7.54 4.69
8192 1000 0.57 3.41 1.38
16384 1000 0.89 3.93 1.80
32768 1000 2.04 4.84 2.95
65536 640 3.53 5.79 4.41
131072 320 6.56 8.77 7.42
262144 160 13.60 15.87 14.46
524288 80 27.06 31.05 28.77
1048576 40 133.23 179.62 151.05
2097152 20 331.60 376.01 346.60
4194304 10 961.10 997.82 981.86

#----------------------------------------------------------------
# Benchmarking Scatter
# #processes = 32
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.31 3.58 1.64
1 1000 0.61 1.18 0.85
2 1000 0.61 1.15 0.85
4 1000 0.61 1.17 0.85
8 1000 0.62 1.18 0.87
16 1000 0.63 1.19 0.88
32 1000 0.82 1.45 1.10
64 1000 1.01 1.78 1.36
128 1000 1.25 2.25 1.74
256 1000 1.57 2.82 2.29
512 1000 1.87 3.01 2.53
1024 1000 2.04 4.23 3.16
2048 1000 2.91 7.52 4.99
4096 1000 4.33 12.72 8.01
8192 1000 0.66 7.08 2.45
16384 1000 0.94 7.48 2.84
32768 1000 2.05 8.36 3.94
65536 640 3.52 9.63 5.42
131072 320 6.51 11.88 8.42
262144 160 12.47 17.46 14.32
524288 80 28.48 37.26 31.67
1048576 40 196.65 249.58 211.47
2097152 20 614.12 649.13 636.19
4194304 10 1870.06 1922.69 1899.20

#----------------------------------------------------------------
# Benchmarking Scatterv
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.29 0.38 0.33
1 1000 0.43 0.57 0.50
2 1000 0.42 0.61 0.52
4 1000 0.47 0.57 0.52
8 1000 0.43 0.54 0.49
16 1000 0.43 0.65 0.54
32 1000 0.44 0.55 0.50
64 1000 0.43 0.70 0.56
128 1000 0.42 0.64 0.53
256 1000 0.46 0.67 0.56
512 1000 0.45 0.75 0.60
1024 1000 0.47 0.79 0.63
2048 1000 0.48 0.94 0.71
4096 1000 0.58 1.18 0.88
8192 1000 0.88 1.02 0.95
16384 1000 0.92 1.19 1.06
32768 1000 1.57 2.43 2.00
65536 640 2.39 3.96 3.18
131072 320 3.86 6.91 5.38
262144 160 6.81 12.86 9.83
524288 80 15.67 30.76 23.22
1048576 40 66.87 136.52 101.69
2097152 20 120.84 263.36 192.10
4194304 10 217.12 503.95 360.54

#----------------------------------------------------------------
# Benchmarking Scatterv
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.32 0.58 0.43
1 1000 0.57 0.90 0.73
2 1000 0.53 0.91 0.72
4 1000 0.54 0.81 0.70
8 1000 0.54 0.81 0.71
16 1000 0.60 0.82 0.72
32 1000 0.57 0.85 0.74
64 1000 0.63 0.82 0.74
128 1000 0.64 0.85 0.76
256 1000 0.65 0.87 0.78
512 1000 0.75 1.18 1.00
1024 1000 0.86 1.34 1.11
2048 1000 0.99 1.69 1.34
4096 1000 1.34 2.39 1.83
8192 1000 0.83 1.44 1.10
16384 1000 1.02 1.71 1.27
32768 1000 1.66 2.92 2.06
65536 640 2.35 4.38 2.97
131072 320 3.83 7.34 4.81
262144 160 6.79 13.27 8.50
524288 80 14.06 29.54 18.20
1048576 40 64.18 136.87 84.27
2097152 20 111.11 265.22 154.14
4194304 10 222.49 534.75 310.47

#----------------------------------------------------------------
# Benchmarking Scatterv
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.42 0.67 0.53
1 1000 0.67 1.33 0.93
2 1000 0.68 1.34 0.94
4 1000 0.64 1.33 0.93
8 1000 0.68 1.35 0.94
16 1000 0.68 1.31 0.92
32 1000 0.81 1.62 1.21
64 1000 0.84 1.66 1.24
128 1000 0.65 1.35 0.96
256 1000 0.64 1.46 1.02
512 1000 1.37 2.30 1.89
1024 1000 0.97 2.49 1.82
2048 1000 1.15 2.95 2.16
4096 1000 1.54 4.33 3.13
8192 1000 0.91 2.25 1.25
16384 1000 1.18 2.67 1.62
32768 1000 2.42 3.51 2.81
65536 640 3.90 4.97 4.27
131072 320 6.87 7.96 7.25
262144 160 12.87 13.99 13.28
524288 80 27.12 31.36 29.35
1048576 40 132.67 141.05 137.33
2097152 20 255.30 280.38 267.45
4194304 10 552.58 594.84 574.28

#----------------------------------------------------------------
# Benchmarking Scatterv
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.80 1.14 0.93
1 1000 1.19 2.85 1.78
2 1000 1.12 2.84 1.76
4 1000 1.12 2.83 1.75
8 1000 1.16 2.84 1.76
16 1000 1.70 2.12 1.88
32 1000 1.75 2.19 1.95
64 1000 1.75 2.20 1.95
128 1000 1.15 2.79 1.78
256 1000 1.18 3.09 1.93
512 1000 2.63 3.41 3.07
1024 1000 1.28 4.99 3.15
2048 1000 1.44 5.67 3.75
4096 1000 1.89 8.18 5.40
8192 1000 1.18 4.30 2.03
16384 1000 1.68 4.91 2.56
32768 1000 2.86 5.84 3.75
65536 640 4.31 6.77 5.15
131072 320 7.38 9.70 8.21
262144 160 13.29 15.70 14.19
524288 80 27.57 33.82 29.30
1048576 40 136.07 183.69 154.02
2097152 20 332.72 376.17 347.14
4194304 10 958.93 999.35 982.35

#----------------------------------------------------------------
# Benchmarking Scatterv
# #processes = 32
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.63 1.08 0.80
1 1000 1.59 2.61 2.14
2 1000 1.60 2.64 2.15
4 1000 1.60 2.66 2.16
8 1000 1.61 2.68 2.18
16 1000 1.63 2.64 2.18
32 1000 1.76 2.81 2.32
64 1000 1.85 2.93 2.43
128 1000 0.93 4.42 2.35
256 1000 2.57 4.15 3.37
512 1000 2.58 4.34 3.50
1024 1000 3.08 5.38 4.24
2048 1000 1.40 11.11 6.30
4096 1000 1.90 15.16 8.96
8192 1000 1.32 7.89 3.16
16384 1000 1.62 8.22 3.52
32768 1000 2.73 9.15 4.61
65536 640 4.20 10.19 6.06
131072 320 7.18 12.58 9.06
262144 160 13.15 18.30 15.00
524288 80 29.74 36.98 32.77
1048576 40 198.12 251.28 213.00
2097152 20 618.50 653.68 639.95
4194304 10 1878.78 1929.33 1904.19

#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.11 0.11 0.11
1 1000 0.81 0.82 0.81
2 1000 0.81 0.82 0.81
4 1000 0.85 0.85 0.85
8 1000 0.81 0.81 0.81
16 1000 0.80 0.81 0.81
32 1000 0.85 0.85 0.85
64 1000 0.83 0.86 0.84
128 1000 0.84 0.85 0.85
256 1000 0.86 0.91 0.88
512 1000 1.00 1.00 1.00
1024 1000 1.06 1.06 1.06
2048 1000 1.21 1.21 1.21
4096 1000 1.45 1.45 1.45
8192 1000 1.44 1.45 1.45
16384 1000 1.91 1.92 1.92
32768 1000 2.84 2.88 2.86
65536 640 4.32 4.36 4.34
131072 320 7.31 7.36 7.34
262144 160 16.27 16.29 16.28
524288 80 70.22 70.26 70.24
1048576 40 145.21 145.58 145.40
2097152 20 266.17 268.51 267.34
4194304 10 526.85 536.05 531.45

#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.07 0.09 0.08
1 1000 1.38 1.56 1.47
2 1000 1.39 1.57 1.48
4 1000 1.42 1.56 1.50
8 1000 1.43 1.55 1.49
16 1000 1.41 1.53 1.49
32 1000 1.37 1.49 1.46
64 1000 1.40 1.55 1.49
128 1000 1.41 1.54 1.50
256 1000 1.48 1.64 1.56
512 1000 1.96 2.19 2.04
1024 1000 2.07 2.23 2.15
2048 1000 2.54 2.90 2.73
4096 1000 3.37 3.52 3.42
8192 1000 2.97 3.02 3.00
16384 1000 3.74 3.81 3.78
32768 1000 5.49 5.65 5.57
65536 640 8.49 8.61 8.54
131072 320 16.64 17.43 16.96
262144 160 86.84 109.37 100.04
524288 80 181.58 253.65 220.28
1048576 40 361.90 466.22 423.02
2097152 20 642.86 684.97 665.03
4194304 10 1577.05 1655.69 1626.07

#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.08 0.10 0.08
1 1000 2.43 2.67 2.51
2 1000 2.44 2.64 2.50
4 1000 2.44 2.65 2.51
8 1000 2.44 2.66 2.51
16 1000 2.64 2.79 2.75
32 1000 2.65 2.87 2.78
64 1000 2.72 2.86 2.79
128 1000 2.74 2.91 2.83
256 1000 2.88 3.07 2.95
512 1000 3.83 4.11 3.93
1024 1000 4.22 4.52 4.41
2048 1000 5.20 5.51 5.36
4096 1000 7.12 7.48 7.29
8192 1000 5.68 6.13 5.85
16384 1000 7.21 7.61 7.36
32768 1000 10.92 11.53 11.04
65536 640 18.90 21.56 20.58
131072 320 75.66 95.20 84.86
262144 160 151.37 191.79 170.96
524288 80 445.85 466.85 455.68
1048576 40 1327.82 1380.18 1361.06
2097152 20 2066.06 2121.91 2081.35
4194304 10 4364.31 4506.79 4425.43

#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.08 0.14 0.09
1 1000 3.52 3.76 3.58
2 1000 3.53 4.07 3.65
4 1000 3.55 3.78 3.61
8 1000 3.58 3.81 3.63
16 1000 3.64 3.85 3.69
32 1000 3.76 4.03 3.83
64 1000 5.31 5.79 5.49
128 1000 5.30 5.73 5.49
256 1000 5.63 6.01 5.76
512 1000 7.47 7.94 7.74
1024 1000 8.74 9.10 8.86
2048 1000 10.82 11.15 10.95
4096 1000 14.80 15.22 15.00
8192 1000 10.89 11.82 11.15
16384 1000 14.06 15.26 14.42
32768 1000 23.30 26.24 24.60
65536 640 89.72 125.98 108.32
131072 320 259.98 275.16 267.29
262144 160 979.88 1001.42 989.23
524288 80 2338.89 2381.23 2356.82
1048576 40 5012.85 5081.45 5048.88
2097152 20 7019.98 7116.96 7064.49
4194304 10 14137.83 14407.94 14286.12

#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 32
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.09 0.19 0.10
1 1000 5.30 5.76 5.44
2 1000 5.31 5.79 5.47
4 1000 5.33 5.79 5.50
8 1000 5.41 5.85 5.58
16 1000 5.59 6.05 5.78
32 1000 6.06 6.61 6.30
64 1000 6.50 7.05 6.75
128 1000 7.37 7.96 7.63
256 1000 9.29 9.90 9.54
512 1000 14.89 15.50 15.09
1024 1000 18.20 18.92 18.44
2048 1000 22.11 22.90 22.35
4096 1000 30.37 31.34 30.63
8192 1000 23.00 24.79 23.44
16384 1000 44.12 45.94 44.70
32768 1000 199.20 205.00 201.82
65536 640 665.07 668.48 666.54
131072 320 1605.10 1614.19 1609.54
262144 160 3357.89 3425.84 3384.14
524288 80 6909.90 7002.08 6956.78
1048576 40 14003.20 14271.21 14123.60
2097152 20 26861.92 27040.47 26947.49
4194304 10 53797.07 54157.18 53972.40

#----------------------------------------------------------------
# Benchmarking Alltoallv
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.69 0.78 0.73
1 1000 1.34 1.49 1.42
2 1000 1.33 1.49 1.41
4 1000 1.33 1.48 1.40
8 1000 1.33 1.49 1.41
16 1000 1.35 1.48 1.42
32 1000 1.33 1.49 1.41
64 1000 1.39 1.51 1.45
128 1000 1.43 1.55 1.49
256 1000 1.47 1.58 1.53
512 1000 1.62 1.65 1.64
1024 1000 1.70 1.71 1.70
2048 1000 1.82 1.83 1.83
4096 1000 2.08 2.10 2.09
8192 1000 2.07 2.23 2.15
16384 1000 2.53 2.59 2.56
32768 1000 3.46 3.60 3.53
65536 640 4.90 5.07 4.99
131072 320 7.93 8.05 7.99
262144 160 17.28 17.32 17.30
524288 80 72.74 72.76 72.75
1048576 40 146.22 146.35 146.29
2097152 20 269.39 269.53 269.46
4194304 10 540.18 540.33 540.25

#----------------------------------------------------------------
# Benchmarking Alltoallv
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 1.28 1.44 1.33
1 1000 2.65 2.80 2.73
2 1000 2.66 2.81 2.73
4 1000 2.64 2.80 2.72
8 1000 2.65 2.79 2.74
16 1000 2.67 2.83 2.76
32 1000 2.64 2.79 2.73
64 1000 2.79 2.98 2.88
128 1000 2.71 2.91 2.81
256 1000 2.77 3.00 2.88
512 1000 3.25 3.42 3.31
1024 1000 3.42 3.64 3.50
2048 1000 3.88 4.11 3.96
4096 1000 4.69 4.95 4.79
8192 1000 4.10 4.34 4.21
16384 1000 4.90 5.09 4.97
32768 1000 6.66 6.94 6.78
65536 640 9.62 9.92 9.75
131072 320 17.55 19.11 18.43
262144 160 89.84 113.24 103.59
524288 80 181.94 239.47 217.35
1048576 40 361.93 466.20 423.42
2097152 20 663.05 694.15 684.22
4194304 10 1581.99 1679.61 1634.95

#----------------------------------------------------------------
# Benchmarking Alltoallv
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 1.91 2.13 1.99
1 1000 4.63 4.99 4.84
2 1000 4.62 4.94 4.80
4 1000 4.63 4.95 4.82
8 1000 4.62 4.91 4.78
16 1000 4.65 4.95 4.83
32 1000 4.61 4.91 4.80
64 1000 4.67 4.96 4.85
128 1000 4.68 4.99 4.87
256 1000 4.79 5.07 4.96
512 1000 5.83 6.07 5.91
1024 1000 6.34 6.62 6.44
2048 1000 7.28 7.67 7.42
4096 1000 9.24 9.79 9.42
8192 1000 7.62 8.15 7.87
16384 1000 9.20 9.68 9.47
32768 1000 12.85 13.67 13.17
65536 640 21.06 24.17 23.08
131072 320 78.35 98.63 87.48
262144 160 155.82 195.88 174.36
524288 80 453.46
472.07 462.79 1048576 40 1338.01 1370.33 1361.04 2097152 20 2049.47 2134.37 2086.90 4194304 10 4495.69 4663.08 4572.86 #---------------------------------------------------------------- # Benchmarking Alltoallv # #processes = 16 # ( 16 additional processes waiting in MPI_Barrier) #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 2.40 2.84 2.59 1 1000 7.64 8.39 8.05 2 1000 7.63 8.36 8.04 4 1000 7.62 8.38 8.04 8 1000 7.62 8.39 8.03 16 1000 7.63 8.35 8.05 32 1000 7.70 8.49 8.10 64 1000 7.66 8.44 8.13 128 1000 7.75 8.50 8.18 256 1000 8.01 8.87 8.42 512 1000 10.50 10.76 10.60 1024 1000 11.40 11.83 11.55 2048 1000 13.62 14.12 13.79 4096 1000 17.60 18.25 17.82 8192 1000 13.36 14.53 13.95 16384 1000 16.62 18.38 17.30 32768 1000 27.02 29.72 28.30 65536 640 93.46 130.78 112.70 131072 320 266.72 279.39 271.78 262144 160 978.57 1004.32 993.46 524288 80 2320.86 2376.53 2361.03 1048576 40 4995.32 5094.94 5054.47 2097152 20 7036.53 7158.08 7109.96 4194304 10 14150.87 14457.80 14337.19 #---------------------------------------------------------------- # Benchmarking Alltoallv # #processes = 32 #---------------------------------------------------------------- #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] 0 1000 3.21 3.61 3.35 1 1000 14.34 15.41 14.76 2 1000 14.18 15.13 14.66 4 1000 14.17 15.05 14.64 8 1000 14.28 15.05 14.65 16 1000 14.23 14.96 14.58 32 1000 14.30 15.07 14.68 64 1000 14.69 15.56 15.10 128 1000 15.10 15.95 15.50 256 1000 15.39 16.40 15.90 512 1000 18.79 19.51 19.08 1024 1000 22.25 23.15 22.54 2048 1000 26.73 27.71 27.02 4096 1000 35.33 36.61 35.75 8192 1000 28.24 30.63 29.19 16384 1000 47.99 49.88 48.67 32768 1000 202.51 208.14 204.98 65536 640 672.63 679.08 675.63 131072 320 1605.37 1612.19 1607.48 262144 160 3372.22 3435.50 3399.55 524288 80 6926.72 7009.40 6966.64 1048576 40 14000.49 14202.22 14097.42 2097152 20 26872.24 27053.03 26959.94 4194304 10 53754.45 54187.94 54003.01 

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
 #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
      0         1000        0.03        0.03        0.03
      1         1000        0.25        0.39        0.32
      2         1000        0.28        0.40        0.34
      4         1000        0.27        0.39        0.33
      8         1000        0.22        0.33        0.28
     16         1000        0.20        0.31        0.25
     32         1000        0.24        0.37        0.30
     64         1000        0.23        0.38        0.31
    128         1000        0.24        0.39        0.31
    256         1000        0.26        0.44        0.35
    512         1000        0.55        0.63        0.59
   1024         1000        0.66        0.70        0.68
   2048         1000        0.86        0.89        0.88
   4096         1000        1.12        1.18        1.15
   8192         1000        1.63        1.70        1.67
  16384         1000        2.90        2.99        2.95
  32768         1000        1.07        1.10        1.08
  65536          640        2.05        2.14        2.09
 131072          320        3.55        3.77        3.66
 262144          160        6.85        7.23        7.04
 524288           80       15.43       16.27       15.85
1048576           40       64.83       65.19       65.01
2097152           20      107.96      108.39      108.18
4194304           10      267.25      267.76      267.51

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
 #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
      0         1000        0.02        0.02        0.02
      1         1000        0.35        0.63        0.48
      2         1000        0.39        0.66        0.51
      4         1000        0.35        0.63        0.47
      8         1000        0.38        0.47        0.41
     16         1000        0.33        0.42        0.36
     32         1000        0.33        0.42        0.37
     64         1000        0.30        0.56        0.40
    128         1000        0.18        0.44        0.26
    256         1000        0.29        0.60        0.38
    512         1000        0.35        0.64        0.44
   1024         1000        0.44        0.73        0.52
   2048         1000        0.61        0.87        0.68
   4096         1000        0.90        1.19        0.98
   8192         1000        1.43        1.71        1.51
  16384         1000        2.65        2.91        2.73
  32768         1000        0.78        1.11        0.87
  65536          640        1.62        1.90        1.70
 131072          320        3.14        3.51        3.24
 262144          160        6.15        6.41        6.22
 524288           80       12.43       14.19       13.26
1048576           40       47.56       53.69       50.86
2097152           20      115.81      127.15      122.63
4194304           10      237.13      262.37      252.87

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
 #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
      0         1000        0.02        0.03        0.03
      1         1000        0.45        0.71        0.58
      2         1000        0.46        0.72        0.58
      4         1000        0.45        0.71        0.58
      8         1000        0.39        0.67        0.51
     16         1000        0.39        0.69        0.52
     32         1000        0.32        0.53        0.40
     64         1000        0.16        0.50        0.22
    128         1000        0.18        0.52        0.25
    256         1000        0.22        0.61        0.34
    512         1000        0.30        0.70        0.43
   1024         1000        0.37        0.77        0.50
   2048         1000        0.55        0.93        0.66
   4096         1000        0.88        1.25        0.99
   8192         1000        1.36        1.76        1.48
  16384         1000        2.53        2.92        2.65
  32768         1000        0.76        1.21        0.84
  65536          640        1.59        2.00        1.67
 131072          320        3.12        3.50        3.19
 262144          160        6.14        6.78        6.26
 524288           80       12.09       14.54       13.48
1048576           40       48.55       57.12       51.26
2097152           20      114.36      129.35      119.90
4194304           10      240.25      266.66      251.39

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
 #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
      0         1000        0.02        0.03        0.03
      1         1000        0.76        1.08        0.90
      2         1000        0.82        1.12        0.95
      4         1000        0.80        1.09        0.94
      8         1000        0.73        1.05        0.86
     16         1000        0.36        0.68        0.53
     32         1000        0.36        0.67        0.51
     64         1000        0.72        1.03        0.87
    128         1000        0.28        0.96        0.53
    256         1000        0.38        1.05        0.63
    512         1000        0.43        1.14        0.70
   1024         1000        0.53        1.26        0.78
   2048         1000        0.69        1.38        0.95
   4096         1000        1.06        1.82        1.31
   8192         1000        1.61        2.29        1.86
  16384         1000        2.87        3.61        3.14
  32768         1000        0.88        1.69        1.15
  65536          640        1.70        2.46        1.96
 131072          320        3.21        3.92        3.45
 262144          160        6.17        6.81        6.34
 524288           80       12.28       15.06       13.73
1048576           40       48.91       62.43       53.80
2097152           20      111.25      144.41      125.69
4194304           10      399.40      431.57      412.36

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 32
#----------------------------------------------------------------
 #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
      0         1000        0.02        0.03        0.03
      1         1000        0.69        1.13        0.84
      2         1000        0.63        1.11        0.81
      4         1000        0.69        1.11        0.83
      8         1000        0.56        1.07        0.77
     16         1000        0.26        0.74        0.46
     32         1000        0.28        0.74        0.48
     64         1000        0.21        0.84        0.77
    128         1000        0.16        0.72        0.24
    256         1000        0.19        0.85        0.32
    512         1000        0.27        0.94        0.44
   1024         1000        0.31        1.03        0.51
   2048         1000        0.53        1.26        0.72
   4096         1000        0.87        1.55        1.05
   8192         1000        1.45        2.13        1.62
  16384         1000        2.83        3.56        3.02
  32768         1000        0.76        1.50        0.86
  65536          640        1.59        2.29        1.69
 131072          320        3.14        3.81        3.24
 262144          160        6.10        6.82        6.20
 524288           80       12.09       15.28       13.32
1048576           40       68.26       87.57       74.08
2097152           20      208.15      229.47      216.98
4194304           10      723.35      781.85      757.97

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 2
# ( 30 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#repetitions t_min[usec] t_max[usec] t_avg[usec]
        1000        0.44        0.44        0.44

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 4
# ( 28 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#repetitions t_min[usec] t_max[usec] t_avg[usec]
        1000        0.37        0.37        0.37

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 8
# ( 24 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#repetitions t_min[usec] t_max[usec] t_avg[usec]
        1000        0.62        0.62        0.62

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 16
# ( 16 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#repetitions t_min[usec] t_max[usec] t_avg[usec]
        1000        0.73        0.73        0.73

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 32
#---------------------------------------------------
#repetitions t_min[usec] t_max[usec] t_avg[usec]
        1000        0.95        0.95        0.95


# All processes entering MPI_Finalize

------------------------------------------------------------
Sender: LSF System
Subject: Job 19222656: in cluster Done

Job was submitted from host by user in cluster at Wed Mar 15 09:37:51 2023
Job was executed on host(s) <32*sc1nc068is12>, in queue , as user in cluster at Wed Mar 15 09:37:51 2023
was used as the home directory.
was used as the working directory.
Started at Wed Mar 15 09:37:51 2023
Terminated at Wed Mar 15 09:39:45 2023
Results reported at Wed Mar 15 09:39:45 2023

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#!/bin/bash
#BSUB -J fhibench
#BSUB -n 32
#BSUB -q short
#BSUB -R rusage[mem=4G]
#BSUB -R span[hosts=1]
#BSUB -R affinity[core(1):cpubind=core:membind=localonly:distribute=pack]
#BSUB -o fhibench.o%J
#BSUB -e fhibench.e%J

export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export MKL_DYNAMIC=FALSE
# export UCX_TLS=sm,rc_verbs,rc_mlx5_2,dc_verbs,dc_mlx5_2,ud_verbs,ud_mlx5_2,self
export LD_PRELOAD=$I_MPI_ROOT/lib/libmpi_shm_heap_proxy.so
export I_MPI_HYDRA_BOOTSTRAP=lsf
export I_MPI_HYDRA_RMK=lsf
export I_MPI_HYDRA_TOPOLIB=hwloc
# export I_MPI_HYDRA_IFACE=ib0
# export FI_SOCKETS_IFACE=ib0
# export FI_PROVIDER_PATH=/projects/site/gred/smpg/software/oneAPI/2023/mpi/2021.8.0/libfabric/lib/prov:/usr/lib64/
# export FI_PROVIDER=mlx
# export I_MPI_PLATFORM=clx-ap
# export I_MPI_EXTRA_FILESYSTEM=1
# export I_MPI_EXTRA_FILESYSTEM_FORCE=gpfs
# export I_MPI_FABRICS=shm:ofi
# export I_MPI_SHM=clx-ap
export I_MPI_SHM_HEAP=1
# export I_MPI_OFI_PROVIDER=mlx
export I_MPI_PIN_CELL=core
export I_MPI_DEBUG=6
# export FI_LOG_LEVEL=debug

mpirun -np 32 IMB-MPI1 2>&1 | tee IMB.out
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :                 3564.00 sec.
    Max Memory :               19853 MB
    Average Memory :           11981.12 MB
    Total Requested Memory :   131072.00 MB
    Delta Memory :             111219.00 MB
    Max Swap :                 -
    Max Processes :            41
    Max Threads :              75
    Run time :                 114 sec.
    Turnaround time :          114 sec.

The output (if any) is above this job summary.

PS:

Read file for stderr output of this job.