Tue Apr 13 13:15:40 UTC 2021
[0] MPI startup(): Intel(R) MPI Library, Version 2021.2 Build 20210302 (id: f4f7c92cd)
[0] MPI startup(): Copyright (C) 2003-2021 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.11.0-impi
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): Load tuning file: "/opt/local/mpi/2021.2.0/etc/tuning_icx_shm-ofi_mlx.dat"
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): I_MPI_ROOT=/opt/local/mpi/2021.2.0
[0] MPI startup(): I_MPI_HYDRA_PMI_CONNECT=alltoall
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_PIN=1
[0] MPI startup(): I_MPI_PIN_PROCESSOR_LIST=0-75
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_EXTRA_FILESYSTEM=on
[0] MPI startup(): I_MPI_EXTRA_FILESYSTEM_FORCE=gpfs
[0] MPI startup(): I_MPI_FABRICS=shm:ofi
[0] MPI startup(): I_MPI_DEBUG=5
#----------------------------------------------------------------
# Intel(R) MPI Benchmarks 2021.2, MPI-1 part
#----------------------------------------------------------------
# Date : Tue Apr 13 13:17:53 2021
# Machine : x86_64
# System : Linux
# Release : 4.18.0-240.el8.x86_64
# Version : #1 SMP Fri Sep 25 19:48:47 UTC 2020
# MPI Version : 3.1
# MPI Thread Environment:
# Calling sequence was:
# IMB-MPI1 Bcast Allreduce -npmin 67944
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# Bcast
# Allreduce
#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 67944
# ( 63384 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.02         0.30         0.03
            1         1000         1.47        11.64         8.70
            2         1000         1.37        12.62         8.88
            4         1000         1.39        11.21         8.33
            8         1000         1.36        12.46         8.90
           16         1000         0.92        11.76         8.11
           32         1000         0.97        11.57         8.03
           64         1000         1.68        18.40        10.06
          128         1000         1.11        16.95        10.66
          256         1000         1.15        20.65        13.18
          512         1000         1.17        21.68        13.59
         1024         1000         7.15        19.87        15.31
         2048         1000         3.20        52.43        38.89
         4096         1000         2.47        30.16        23.26
         8192         1000         3.20        33.18        27.81
        16384         1000         8.19        45.55        31.54
        32768         1000        12.46        59.65        48.58
        65536          640        31.08        94.48        75.49
       131072          320        51.22       152.07       119.02
       262144          160        64.09       212.79       155.07
       524288           80       125.72       409.47       294.34
      1048576           40      2375.30      2434.73      2398.90
      2097152           20      2972.18      3162.70      3044.61
      4194304           10      5249.14      5573.12      5397.86
#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 131328
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.28         0.03
            1         1000         1.38        12.30         9.49
            2         1000         1.38        11.55         8.75
            4         1000         1.37        11.65         8.91
            8         1000         1.48        12.16         9.01
           16         1000         0.91       183.22         8.26
           32         1000         0.96        11.84         8.52
           64         1000         1.45        14.36        10.37
          128         1000         1.43        18.38        11.75
          256         1000         1.48        20.87        13.32
          512         1000         1.59       177.87        14.05
         1024         1000         7.14        44.12        15.53
         2048         1000         4.27       153.54        38.43
         4096         1000         3.34       220.89        23.75
         8192         1000         4.41       107.85        29.20
        16384         1000         8.21       222.81        34.36
        32768         1000        12.56       231.17        53.04
        65536          640        31.15       103.46        80.29
       131072          320        50.92       184.37       130.49
       262144          160        64.32       278.93       178.00
       524288           80       126.33       532.69       337.93
      1048576           40      3875.30      4014.17      3933.26
      2097152           20      4431.64      4725.84      4542.04
      4194304           10      6780.43     26173.21      9928.21
#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 67944
# ( 63384 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.27         0.03
            4         1000        37.38        41.31        39.08
            8         1000        34.65        37.80        36.14
           16         1000        24.03        27.43        25.29
           32         1000        21.73        25.83        23.53
           64         1000        21.24        26.65        22.82
          128         1000        22.68        27.95        24.29
          256         1000       239.38       250.64       241.20
          512         1000        32.37        36.84        33.37
         1024         1000        30.14        35.95        31.98
         2048         1000        46.17        51.07        48.01
         4096         1000        54.16        59.39        55.76
         8192         1000        63.11        71.08        65.54
        16384         1000        85.32        98.13        90.04
        32768         1000       304.70       319.96       312.32
        65536          640       159.84       181.29       173.09
       131072          320       260.25       287.17       278.03
       262144          160       400.73       439.16       427.32
       524288           80       662.66       723.59       706.27
      1048576           40      1244.49      1350.82      1321.00
      2097152           20      2686.96      2937.80      2838.01
      4194304           10      5512.79      6120.22      5789.78
#----------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 131328
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.27         0.03
            4         1000        31.84        35.64        34.23
            8         1000        41.57        45.30        43.62
           16         1000        62.64       295.87       170.28
           32         1000       148.61       406.85       277.81
           64         1000      1045.19      1368.68      1234.35
          128         1000        27.46        32.09        29.54
          256         1000       258.02       438.44       267.45
          512         1000        43.44        48.31        44.63
         1024         1000        35.01        40.58        37.53
         2048         1000        45.32        50.36        46.73
         4096         1000        59.62        64.98        61.58
         8192         1000        79.23        86.79        82.28
        16384         1000       110.34       122.29       116.57
        32768         1000       177.31       189.65       184.17
        65536          640       202.90       223.11       215.04
       131072          320       265.20       291.08       281.39
       262144          160       436.43       474.70       460.19
       524288           80       716.42       780.55       761.06
      1048576           40      1289.11      1403.62      1363.71
      2097152           20      2727.26      2974.33      2878.05
      4194304           10      5368.04      5801.71      5633.09

# All processes entering MPI_Finalize

real 3m39.513s
user 0m0.402s
sys 0m0.938s
Tue Apr 13 13:19:19 UTC 2021
Total elapsed time = 219 sec