$ cat /proc/cpuinfo | head -24
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
stepping        : 4
cpu MHz         : 2394.372
cache size      : 28160 KB
physical id     : 0
siblings        : 40
core id         : 0
cpu cores       : 20
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm
bogomips        : 4788.74
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
$ mpirun -n 3 /vols/IntelMPI/intel/oneapi/mpi/2021.1.1/bin/IMB-MPI1 Bcast
#------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2021.1, MPI-1 part
#------------------------------------------------------------
# Date                  : Sun Apr 11 13:35:42 2021
# Machine               : x86_64
# System                : Linux
# Release               : 2.6.32-431.11.2.el6.x86_64
# Version               : #1 SMP Mon Mar 3 13:32:45 EST 2014
# MPI Version           : 3.1
# MPI Thread Environment:
# Calling sequence was:
# /vols/IntelMPI/intel/oneapi/mpi/2021.1.1/bin/IMB-MPI1 Bcast
# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#
# List of Benchmarks to run:
# Bcast
#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 2
# ( 1 additional process waiting in MPI_Barrier)
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.04         0.03
            1         1000         0.18         0.95         0.56
            2         1000         0.18         0.94         0.56
            4         1000         0.18         0.95         0.56
            8         1000         0.18         0.95         0.56
           16         1000         0.18         0.95         0.56
           32         1000         0.18         0.96         0.57
           64         1000         0.18         0.96         0.57
          128         1000         0.18         0.96         0.57
          256         1000         0.19         1.00         0.59
          512         1000         0.35         1.15         0.75
         1024         1000         0.61         1.31         0.96
         2048         1000         0.56         1.33         0.94
         4096         1000         2.82         3.02         2.92
         8192         1000         3.76         3.86         3.81
        16384         1000         5.04         5.11         5.08
        32768         1000         5.16         6.95         6.06
        65536          640        10.21        11.99        11.10
       131072          320        20.48        22.95        21.72
       262144          160        40.99        43.45        42.22
       524288           80        82.27        84.70        83.49
      1048576           40       189.68       193.68       191.68
      2097152           20       334.30       427.86       381.08
      4194304           10       665.11       852.09       758.60
#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 3
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.03         0.04         0.03
            1         1000         0.43         1.08         0.85
            2         1000         0.42         1.09         0.85
            4         1000         0.42         1.11         0.86
            8         1000         0.49         0.80         0.63
           16         1000         0.47         0.81         0.62
           32         1000         0.58         1.14         0.94
           64         1000         0.61         1.17         0.97
          128         1000         0.61         1.17         0.96
          256         1000         0.54         1.18         0.96
          512         1000         0.53         1.17         0.83
         1024         1000         1.49         2.16         1.72
         2048         1000         1.58         2.29         1.85
         4096         1000         2.29         3.42         2.80
         8192         1000         3.41         4.93         3.92
        16384         1000         5.78         6.70         6.29
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 210557 RUNNING AT arthur
=   KILLED BY SIGNAL: 4 (Illegal instruction)
===================================================================================
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 210558 RUNNING AT arthur
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 2 PID 210559 RUNNING AT arthur
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================
$ gdb /vols/IntelMPI/intel/oneapi/mpi/2021.1.1/bin/IMB-MPI1 core.210557
GNU gdb (GDB) 8.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /vols/IntelMPI/intel/oneapi/mpi/2021.1.1/bin/IMB-MPI1...(no debugging symbols found)...done.
[New LWP 210557]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/vols/IntelMPI/intel/oneapi/mpi/2021.1'.
Program terminated with signal SIGILL, Illegal instruction.
#0  memcpy_min4096x3_max4096x4_unaligned_load_regular_store_avx512 (destination=, source=, size=)
    at ../../src/mpid/ch4/shm/posix/eager/include/i_mpi_memcpy_avx512.c:3929
3929    ../../src/mpid/ch4/shm/posix/eager/include/i_mpi_memcpy_avx512.c: No such file or directory.
(gdb) bt
#0  memcpy_min4096x3_max4096x4_unaligned_load_regular_store_avx512 (destination=, source=, size=)
    at ../../src/mpid/ch4/shm/posix/eager/include/i_mpi_memcpy_avx512.c:3929
#1  I_MPI_memcpy_multipage_avx512 ()
    at ../../src/mpid/ch4/shm/posix/eager/include/i_mpi_memcpy_avx512.c:5716
#2  0x00007f3c47ac02e8 in bcast_huge_send_memcpy (d=, s=, n=, accumulated_size=, total_size=)
    at ../../src/mpid/ch4/shm/posix/eager/include/intel_transport_bcast.h:173
#3  impi_bcast_intra_huge (transport=0x7f3b80e0d000, transfer_buffer=0x7f200043d380, transfer_size=0, local_root=4096, node_comm=0x7f200043d380)
    at ../../src/mpid/ch4/shm/posix/eager/include/intel_transport_bcast.h:557
#4  0x00007f3c47abd8fb in impi_bcast_intra_heap (mpir_comm=0x7f3b80e0d000, root=4445056, user_buffer=0x0, user_count=4096, user_datatype=4445056, cnt=0x7f3b80e0d000, errflag=0x7fffe040e9c8)
    at ../../src/mpid/ch4/shm/posix/eager/include/intel_transport_bcast.h:798
#5  0x00007f3c47307f6d in MPIDI_POSIX_mpi_bcast (buffer=0x7f3b80e0d000, count=139775420126080, datatype=0, root=4096, comm_ptr=0x7f200043d380, errflag=0x7f3b80e0d000, ch4_algo_parameters_container_in=0x8000)
    at ../../src/mpid/ch4/shm/src/../src/../posix/intel/posix_coll.h:124
#6  0x00007f3c472e782b in MPIDI_SHM_mpi_bcast (buffer=, count=, datatype=, root=, comm=, errflag=, algo_parameters_container=0x4000)
    at ../../src/mpid/ch4/shm/src/../src/shm_coll.h:39
#7  MPIDI_Bcast_intra_composition_delta (buffer=, count=, datatype=, root=, comm_ptr=, errflag=, ch4_algo_parameters_container=)
    at ../../src/mpid/ch4/src/intel/ch4_coll_impl.h:440
#8  MPID_Bcast_invoke (buffer=, count=32768, datatype=, root=0, comm=, errflag=0x7fffe040e9c8, ch4_algo_parameters_container=)
    at ../../src/mpid/ch4/src/intel/ch4_coll_select_utils.c:1745
#9  MPIDI_coll_invoke (coll_sig=0x7f3b80e0d000, container=0x7f200043d380, req=0x0)
    at ../../src/mpid/ch4/src/intel/ch4_coll_select_utils.c:3356
#10 0x00007f3c472ccbee in MPIDI_coll_select (coll_sig=0x7f3b80e0d000, req=0x7f200043d380)
    at ../../src/mpid/ch4/src/intel/ch4_coll_globals_default.c:129
#11 0x00007f3c4739502d in MPID_Bcast (buffer=, count=, datatype=, root=, comm=, errflag=)
    at ../../src/mpid/ch4/src/intel/ch4_coll.h:51
#12 MPIR_Bcast (buffer=0x7f3b80e0d000, count=4445056, datatype=0, root=4096, comm_ptr=0x7f200043d380, errflag=0x7f3b80e0d000)
    at ../../src/mpi/coll/intel/coll_impl.c:374
#13 0x00007f3c472b16e9 in PMPI_Bcast (buffer=0x7f3b80e0d000, count=4445056, datatype=0, root=4096, comm=4445056)
    at ../../src/mpi/coll/bcast/bcast.c:416
#14 0x000000000044f099 in IMB_bcast ()
#15 0x000000000042fe83 in Bmark_descr::IMB_init_buffers_iter(comm_info*, iter_schedule*, Bench*, cmode*, int, int) ()
#16 0x0000000000447405 in OriginalBenchmark, &IMB_bcast>::run(scope_item const&) ()
#17 0x0000000000405de9 in main ()
(gdb)
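
A quick check that may help narrow this down: frame #0 of the backtrace is an AVX-512 memcpy routine, yet the flags line in the /proc/cpuinfo output lists avx and avx2 but no avx512 entries. The sketch below tests whether the running kernel advertises the avx512f flag. The function name has_avx512f is hypothetical, and the link between the missing flag and the SIGILL is an assumption that the log itself does not confirm.

```shell
#!/bin/sh
# Hypothetical helper (not part of Intel MPI or any tool shown in the log):
# succeeds if the first "flags" line of a cpuinfo-style file lists avx512f.
# Reads /proc/cpuinfo unless another file is given as the first argument.
has_avx512f() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -m1 stops after the first "flags" line (one is printed per logical CPU);
    # -w matches avx512f only as a whole word, -q sets just the exit status.
    grep -m1 '^flags' "$cpuinfo" | grep -qw 'avx512f'
}

if has_avx512f; then
    echo "avx512f is listed in the CPU flags"
else
    echo "avx512f is NOT listed in the CPU flags"
fi
```

Run against the flags line shown in the cpuinfo output, this would report that avx512f is not listed, since that line contains no avx512 entries.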