Intel® MPI Library

libfabric 1.13 and Intel True Scale Fabric

Jonas_D_
Beginner

We have an issue with Intel MPI and Intel True Scale Fabric. We have a Linux CentOS 7.9 cluster with an Intel True Scale Fabric Edge Managed Switch (QDR InfiniBand), so our InfiniBand devices are qib0:

$ ibstat
CA 'qib0'
CA type: InfiniPath_QLE7340
Number of ports: 1
Firmware version:
Hardware version: 2
Node GUID: 0x00117500006f7990
System image GUID: 0x00117500006f7990


We installed Intel oneAPI Toolkits 2021 with libfabric 1.13, but we are not getting the expected scalability. MPI inter-node communication is very slow.


When we run fi_info --list we get:

$ fi_info --list
psm2:
version: 113.0
psm3:
version: 1101.0
ofi_rxm:
version: 113.0
verbs:
version: 113.0
tcp:
version: 113.0
sockets:
version: 113.0
shm:
version: 113.0
ofi_hook_noop:
version: 113.0
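
The listing above contains psm2 and psm3 but no entry named psm. A scripted check for a given provider name could look like the sketch below. It assumes the listing format shown above, with each provider name on its own line followed by a colon; `has_provider` is a hypothetical helper, not part of libfabric.

```shell
#!/bin/sh
# has_provider NAME
# Reads an `fi_info --list` style listing on stdin and succeeds (exit 0)
# only if some line consists of exactly "NAME:" (surrounding blanks allowed).
# Hypothetical helper for illustration; not a libfabric tool.
has_provider() {
    grep -qx "[[:space:]]*$1:[[:space:]]*"
}
```

It would typically be used as `fi_info --list | has_provider psm2 && echo "psm2 is listed"`. Matching the whole line keeps `psm` from matching `psm2:` or `psm3:`.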


I know that the PSM interface is recommended when running MPI programs on a cluster with Intel True Scale HCAs, but the MPI startup program is taking tcp;ofi provider by default, and the running MPI program is very slow!

[0] MPI startup(): libfabric version: 1.13.0-impi
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[0] MPI startup(): detected tcp;ofi_rxm provider, set device name to "tcp-ofi-rxm"
[0] MPI startup(): max_ch4_vcis: 1, max_reg_eps 64, enable_sep 0, enable_shared_ctxs 0, do_av_insert 1
[0] MPI startup(): addrnamelen: 16
[0] MPI startup(): File "/NFS/opt/intel/oneapi/mpi/2021.4.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm_40.dat" not found
[0] MPI startup(): Load tuning file: "/NFS/opt/intel/oneapi/mpi/2021.4.0/etc/tuning_skx_shm-ofi_tcp-ofi-rxm.dat"
:
:
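
To confirm which provider a run actually selected, the provider line can be pulled out of the I_MPI_DEBUG output. This is a minimal sketch that assumes the log format shown above; `selected_provider` is a hypothetical helper name.

```shell
#!/bin/sh
# selected_provider
# Reads Intel MPI debug output on stdin and prints the value of the first
# "MPI startup(): libfabric provider:" line (for example "tcp;ofi_rxm").
# Prints nothing if no such line is present.
selected_provider() {
    sed -n 's/.*MPI startup(): libfabric provider: //p' | head -n 1
}
```

Piping the combined stdout and stderr of an `mpirun` launched with `I_MPI_DEBUG=10` into `selected_provider` prints only the provider string, which makes it easy to compare runs.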


Any suggestions to improve MPI inter-node communication?

Any help would be appreciated; I don't have much experience with libfabric.


Thanks.

SantoshY_Intel
Moderator

Hi,

 

Thank you for posting in the Intel forums.

 

>>"the MPI startup program is taking tcp;ofi provider by default"

You can choose a specific OFI provider at runtime. To select a provider from the libfabric library, set the FI_PROVIDER environment variable to the name of the OFI provider to load:

export FI_PROVIDER=<name>

where <name> is the OFI provider to load (for example, psm3 or verbs).

 

We recommend launching the MPI application with the following commands:

export FI_PROVIDER=<name>
export I_MPI_DEBUG=10
mpirun -n <total-number-of-processes> -ppn <processes-per-node> -f nodefile ./executable
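
When several providers need to be compared, the same launch can be repeated once per provider. The sketch below is one way to do that; `try_providers` is a hypothetical helper, and the launch command is passed in as a string so nothing about the job itself is assumed.

```shell
#!/bin/sh
# try_providers LAUNCH_CMD PROVIDER...
# Runs LAUNCH_CMD once per provider with FI_PROVIDER set to that provider
# and I_MPI_DEBUG=10, printing a header before each run so the logs can be
# told apart. A failing run is reported on stderr and the loop continues.
try_providers() {
    cmd=$1
    shift
    for p in "$@"; do
        echo "== FI_PROVIDER=$p =="
        FI_PROVIDER=$p I_MPI_DEBUG=10 sh -c "$cmd" ||
            echo "run with FI_PROVIDER=$p failed" >&2
    done
}
```

For example, `try_providers 'mpirun -n 9 -ppn 3 -f nodefile ./executable' verbs tcp` would run the job twice; the process counts here are placeholders. The provider each run really used still has to be read from the "libfabric provider" line in the debug output.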

 

Thanks & Regards,

Santosh

Jonas_D_
Beginner

Thanks for the tip. We have already tried all the available fabrics and still get low performance.

SantoshY_Intel
Moderator

Hi,


We haven't heard back from you. Could you please provide an update on your issue?


Thanks & Regards,

Santosh


Jonas_D_
Beginner

I still have a performance issue. For the InfiniBand switch that I have, I need to use the PSM1 fabric, but the new Intel compilers don't appear to be compatible with this fabric (despite the fact that the switch is also from Intel!).

Intel's customer support didn't offer an acceptable solution. All the suggestions resulted in very poor performance. I think it is a backward-compatibility issue on Intel's side, and I don't think I can solve it from my side.

SantoshY_Intel
Moderator

Hi,


Could you please provide us with the following details to investigate your issue?

  1. Sample reproducer code and the steps you use to launch your application on the cluster.
  2. Which job scheduler are you using?
  3. The complete debug log from your end. Example: I_MPI_DEBUG=30 mpirun -n 9 -ppn 3 -f nodefile ./executable
  4. Are there any performance numbers? How are you measuring performance?


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,


Could you please provide us with the above-requested details to investigate your issue?


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,


Thanks for providing all the requested details internally.


We are working on your issue & we will get back to you soon.


Thanks & regards,

Santosh


SantoshY_Intel
Moderator

Hi,

 

Could you please try running the code below on your cluster (single node and multinode)? Also, please share the complete debug and output logs from both the single-node and multinode runs of the sample code. Use I_MPI_DEBUG=10 to get the MPI debug information.

 

Sample code:

 

#include <mpi.h>
#include <chrono>
#include <iostream>

using namespace std;

int main(int argc, char *argv[])
{
  int rank, size, namelen;
  char name[MPI_MAX_PROCESSOR_NAME];
  int sum = 0;
  int res;

  // C API throughout: the MPI:: C++ bindings were removed in MPI-3.0.
  MPI_Init(&argc, &argv);
  auto start = chrono::steady_clock::now();
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(name, &namelen);

  // Every rank reports where it is running.
  cout << "Hello world: rank " << rank << " of " << size
       << " running on " << name << "\n";

  auto end = chrono::steady_clock::now();
  // Per-rank time for the queries and the print above, in microseconds.
  res = static_cast<int>(
      chrono::duration_cast<chrono::microseconds>(end - start).count());
  cout << "Rank [" << rank << "] elapsed time in microseconds: " << res << " µs" << endl;

  // Sum the per-rank times on rank 0.
  MPI_Reduce(&res, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0) {
    cout << " total time taken : " << sum << endl;
    // Average over the actual number of ranks instead of a hard-coded 8.
    cout << " Avg time taken : " << (float)sum / size << endl;
  }
  MPI_Finalize();
  return 0;
}

 

 

Thanks & Regards,

Santosh

 

SantoshY_Intel
Moderator

Hi,

 

Thanks for providing the complete debug logs.

Do you have exclusive access to the nodes? If so, could you please try running your application on exclusive nodes? Performance becomes unpredictable on nodes that are shared among multiple users running computations.

 

Thanks & Regards,

Santosh

 

Jonas_D_
Beginner

Hello,

We have exclusive access. All the tests above were performed with exclusive access.

Thanks

Jonas

SantoshY_Intel
Moderator

Hi,

 

Could you please provide us with the below details which would help us in further investigation of your issue?

  1. Is hyperthreading enabled in your systems?
  2. Could you please provide the results of "lscpu" command?
  3. Are all the nodes configured in the same way?
  4. You provided multiple runs of the sample on 2 nodes. Could you similarly provide multiple runs of the sample on a single node?
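
For the hyperthreading question, the "Thread(s) per core" field of `lscpu` is usually enough: a value of 1 means one hardware thread per core. The sketch below extracts that field; `threads_per_core` is a hypothetical helper, and it assumes the standard `lscpu` text layout, with or without a "node:" prefix on each line.

```shell
#!/bin/sh
# threads_per_core
# Reads lscpu output on stdin and prints the value of the first
# "Thread(s) per core" line. Taking the last colon-separated field makes it
# work both for plain lscpu output and for lines prefixed with a node name.
threads_per_core() {
    awk -F: '/Thread\(s\) per core/ {
        gsub(/[[:space:]]/, "", $NF)
        print $NF
        exit
    }'
}
```

A typical call would be `lscpu | threads_per_core`; a result greater than 1 indicates that hyperthreading is enabled.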

 

Thanks & Regards,

Santosh

 

Jonas_D_
Beginner

Is hyperthreading enabled in your systems? No

Are all the nodes configured in the same way? Yes

Could you please provide the results of "lscpu" command? This is the output for every node:

n6: Architecture:          x86_64
n6: CPU op-mode(s):        32-bit, 64-bit
n6: Byte Order:            Little Endian
n6: CPU(s):                20
n6: On-line CPU(s) list:   0-19
n6: Thread(s) per core:    1
n6: Core(s) per socket:    10
n6: Socket(s):             2
n6: NUMA node(s):          2
n6: Vendor ID:             GenuineIntel
n6: CPU family:            6
n6: Model:                 62
n6: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n6: Stepping:              4
n6: CPU MHz:               3296.966
n6: CPU max MHz:           3300.0000
n6: CPU min MHz:           1200.0000
n6: BogoMIPS:              5000.07
n6: Virtualization:        VT-x
n6: L1d cache:             32K
n6: L1i cache:             32K
n6: L2 cache:              256K
n6: L3 cache:              25600K
n6: NUMA node0 CPU(s):     0-9
n6: NUMA node1 CPU(s):     10-19
n6: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n5: Architecture:          x86_64
n5: CPU op-mode(s):        32-bit, 64-bit
n5: Byte Order:            Little Endian
n5: CPU(s):                20
n5: On-line CPU(s) list:   0-19
n5: Thread(s) per core:    1
n5: Core(s) per socket:    10
n5: Socket(s):             2
n5: NUMA node(s):          2
n5: Vendor ID:             GenuineIntel
n5: CPU family:            6
n5: Model:                 62
n5: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n5: Stepping:              4
n5: CPU MHz:               1199.951
n5: CPU max MHz:           3300.0000
n5: CPU min MHz:           1200.0000
n5: BogoMIPS:              4999.80
n5: Virtualization:        VT-x
n5: L1d cache:             32K
n5: L1i cache:             32K
n5: L2 cache:              256K
n5: L3 cache:              25600K
n5: NUMA node0 CPU(s):     0-9
n5: NUMA node1 CPU(s):     10-19
n5: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n7: Architecture:          x86_64
n7: CPU op-mode(s):        32-bit, 64-bit
n7: Byte Order:            Little Endian
n7: CPU(s):                20
n7: On-line CPU(s) list:   0-19
n7: Thread(s) per core:    1
n7: Core(s) per socket:    10
n7: Socket(s):             2
n7: NUMA node(s):          2
n7: Vendor ID:             GenuineIntel
n7: CPU family:            6
n7: Model:                 62
n7: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n7: Stepping:              4
n7: CPU MHz:               1443.634
n7: CPU max MHz:           3300.0000
n7: CPU min MHz:           1200.0000
n7: BogoMIPS:              5000.12
n7: Virtualization:        VT-x
n7: L1d cache:             32K
n7: L1i cache:             32K
n7: L2 cache:              256K
n7: L3 cache:              25600K
n7: NUMA node0 CPU(s):     0-9
n7: NUMA node1 CPU(s):     10-19
n7: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n1: Architecture:          x86_64
n1: CPU op-mode(s):        32-bit, 64-bit
n1: Byte Order:            Little Endian
n1: CPU(s):                20
n1: On-line CPU(s) list:   0-19
n1: Thread(s) per core:    1
n1: Core(s) per socket:    10
n1: Socket(s):             2
n1: NUMA node(s):          2
n1: Vendor ID:             GenuineIntel
n1: CPU family:            6
n1: Model:                 62
n1: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n1: Stepping:              4
n1: CPU MHz:               1385.650
n1: CPU max MHz:           3300.0000
n1: CPU min MHz:           1200.0000
n1: BogoMIPS:              4999.82
n1: Virtualization:        VT-x
n1: L1d cache:             32K
n1: L1i cache:             32K
n1: L2 cache:              256K
n1: L3 cache:              25600K
n1: NUMA node0 CPU(s):     0-9
n1: NUMA node1 CPU(s):     10-19
n1: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n9: Architecture:          x86_64
n9: CPU op-mode(s):        32-bit, 64-bit
n9: Byte Order:            Little Endian
n9: CPU(s):                20
n9: On-line CPU(s) list:   0-19
n9: Thread(s) per core:    1
n9: Core(s) per socket:    10
n9: Socket(s):             2
n9: NUMA node(s):          2
n9: Vendor ID:             GenuineIntel
n9: CPU family:            6
n9: Model:                 62
n9: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n9: Stepping:              4
n9: CPU MHz:               1199.951
n9: CPU max MHz:           3300.0000
n9: CPU min MHz:           1200.0000
n9: BogoMIPS:              4999.83
n9: Virtualization:        VT-x
n9: L1d cache:             32K
n9: L1i cache:             32K
n9: L2 cache:              256K
n9: L3 cache:              25600K
n9: NUMA node0 CPU(s):     0-9
n9: NUMA node1 CPU(s):     10-19
n9: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n2: Architecture:          x86_64
n2: CPU op-mode(s):        32-bit, 64-bit
n2: Byte Order:            Little Endian
n2: CPU(s):                20
n2: On-line CPU(s) list:   0-19
n2: Thread(s) per core:    1
n2: Core(s) per socket:    10
n2: Socket(s):             2
n2: NUMA node(s):          2
n2: Vendor ID:             GenuineIntel
n2: CPU family:            6
n2: Model:                 62
n2: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n2: Stepping:              4
n2: CPU MHz:               1200.256
n2: CPU max MHz:           3300.0000
n2: CPU min MHz:           1200.0000
n2: BogoMIPS:              5000.24
n2: Virtualization:        VT-x
n2: L1d cache:             32K
n2: L1i cache:             32K
n2: L2 cache:              256K
n2: L3 cache:              25600K
n2: NUMA node0 CPU(s):     0-9
n2: NUMA node1 CPU(s):     10-19
n2: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n8: Architecture:          x86_64
n8: CPU op-mode(s):        32-bit, 64-bit
n8: Byte Order:            Little Endian
n8: CPU(s):                20
n8: On-line CPU(s) list:   0-19
n8: Thread(s) per core:    1
n8: Core(s) per socket:    10
n8: Socket(s):             2
n8: NUMA node(s):          2
n8: Vendor ID:             GenuineIntel
n8: CPU family:            6
n8: Model:                 62
n8: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n8: Stepping:              4
n8: CPU MHz:               1199.951
n8: CPU max MHz:           3300.0000
n8: CPU min MHz:           1200.0000
n8: BogoMIPS:              5000.12
n8: Virtualization:        VT-x
n8: L1d cache:             32K
n8: L1i cache:             32K
n8: L2 cache:              256K
n8: L3 cache:              25600K
n8: NUMA node0 CPU(s):     0-9
n8: NUMA node1 CPU(s):     10-19
n8: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n3: Architecture:          x86_64
n3: CPU op-mode(s):        32-bit, 64-bit
n3: Byte Order:            Little Endian
n3: CPU(s):                20
n3: On-line CPU(s) list:   0-19
n3: Thread(s) per core:    1
n3: Core(s) per socket:    10
n3: Socket(s):             2
n3: NUMA node(s):          2
n3: Vendor ID:             GenuineIntel
n3: CPU family:            6
n3: Model:                 62
n3: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n3: Stepping:              4
n3: CPU MHz:               1222.076
n3: CPU max MHz:           3300.0000
n3: CPU min MHz:           1200.0000
n3: BogoMIPS:              5000.09
n3: Virtualization:        VT-x
n3: L1d cache:             32K
n3: L1i cache:             32K
n3: L2 cache:              256K
n3: L3 cache:              25600K
n3: NUMA node0 CPU(s):     0-9
n3: NUMA node1 CPU(s):     10-19
n3: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n10: Architecture:          x86_64
n10: CPU op-mode(s):        32-bit, 64-bit
n10: Byte Order:            Little Endian
n10: CPU(s):                20
n10: On-line CPU(s) list:   0-19
n10: Thread(s) per core:    1
n10: Core(s) per socket:    10
n10: Socket(s):             2
n10: NUMA node(s):          2
n10: Vendor ID:             GenuineIntel
n10: CPU family:            6
n10: Model:                 62
n10: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n10: Stepping:              4
n10: CPU MHz:               1199.951
n10: CPU max MHz:           3300.0000
n10: CPU min MHz:           1200.0000
n10: BogoMIPS:              5000.01
n10: Virtualization:        VT-x
n10: L1d cache:             32K
n10: L1i cache:             32K
n10: L2 cache:              256K
n10: L3 cache:              25600K
n10: NUMA node0 CPU(s):     0-9
n10: NUMA node1 CPU(s):     10-19
n10: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n23: Architecture:          x86_64
n23: CPU op-mode(s):        32-bit, 64-bit
n23: Byte Order:            Little Endian
n23: CPU(s):                20
n23: On-line CPU(s) list:   0-19
n23: Thread(s) per core:    1
n23: Core(s) per socket:    10
n23: Socket(s):             2
n23: NUMA node(s):          2
n23: Vendor ID:             GenuineIntel
n23: CPU family:            6
n23: Model:                 62
n23: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n23: Stepping:              4
n23: CPU MHz:               1199.951
n23: CPU max MHz:           3300.0000
n23: CPU min MHz:           1200.0000
n23: BogoMIPS:              4999.96
n23: Virtualization:        VT-x
n23: L1d cache:             32K
n23: L1i cache:             32K
n23: L2 cache:              256K
n23: L3 cache:              25600K
n23: NUMA node0 CPU(s):     0-9
n23: NUMA node1 CPU(s):     10-19
n23: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n21: Architecture:          x86_64
n21: CPU op-mode(s):        32-bit, 64-bit
n21: Byte Order:            Little Endian
n21: CPU(s):                20
n21: On-line CPU(s) list:   0-19
n21: Thread(s) per core:    1
n21: Core(s) per socket:    10
n21: Socket(s):             2
n21: NUMA node(s):          2
n21: Vendor ID:             GenuineIntel
n21: CPU family:            6
n21: Model:                 62
n21: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n21: Stepping:              4
n21: CPU MHz:               2129.516
n21: CPU max MHz:           3300.0000
n21: CPU min MHz:           1200.0000
n21: BogoMIPS:              4999.88
n21: Virtualization:        VT-x
n21: L1d cache:             32K
n21: L1i cache:             32K
n21: L2 cache:              256K
n21: L3 cache:              25600K
n21: NUMA node0 CPU(s):     0-9
n21: NUMA node1 CPU(s):     10-19
n21: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n12: Architecture:          x86_64
n12: CPU op-mode(s):        32-bit, 64-bit
n12: Byte Order:            Little Endian
n12: CPU(s):                20
n12: On-line CPU(s) list:   0-19
n12: Thread(s) per core:    1
n12: Core(s) per socket:    10
n12: Socket(s):             2
n12: NUMA node(s):          2
n12: Vendor ID:             GenuineIntel
n12: CPU family:            6
n12: Model:                 62
n12: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n12: Stepping:              4
n12: CPU MHz:               1390.228
n12: CPU max MHz:           3300.0000
n12: CPU min MHz:           1200.0000
n12: BogoMIPS:              4999.87
n12: Virtualization:        VT-x
n12: L1d cache:             32K
n12: L1i cache:             32K
n12: L2 cache:              256K
n12: L3 cache:              25600K
n12: NUMA node0 CPU(s):     0-9
n12: NUMA node1 CPU(s):     10-19
n12: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n4: Architecture:          x86_64
n4: CPU op-mode(s):        32-bit, 64-bit
n4: Byte Order:            Little Endian
n4: CPU(s):                20
n4: On-line CPU(s) list:   0-19
n4: Thread(s) per core:    1
n4: Core(s) per socket:    10
n4: Socket(s):             2
n4: NUMA node(s):          2
n4: Vendor ID:             GenuineIntel
n4: CPU family:            6
n4: Model:                 62
n4: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n4: Stepping:              4
n4: CPU MHz:               1346.130
n4: CPU max MHz:           3300.0000
n4: CPU min MHz:           1200.0000
n4: BogoMIPS:              5000.01
n4: Virtualization:        VT-x
n4: L1d cache:             32K
n4: L1i cache:             32K
n4: L2 cache:              256K
n4: L3 cache:              25600K
n4: NUMA node0 CPU(s):     0-9
n4: NUMA node1 CPU(s):     10-19
n4: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n19: Architecture:          x86_64
n19: CPU op-mode(s):        32-bit, 64-bit
n19: Byte Order:            Little Endian
n19: CPU(s):                20
n19: On-line CPU(s) list:   0-19
n19: Thread(s) per core:    1
n19: Core(s) per socket:    10
n19: Socket(s):             2
n19: NUMA node(s):          2
n19: Vendor ID:             GenuineIntel
n19: CPU family:            6
n19: Model:                 62
n19: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n19: Stepping:              4
n19: CPU MHz:               1199.951
n19: CPU max MHz:           3300.0000
n19: CPU min MHz:           1200.0000
n19: BogoMIPS:              5000.29
n19: Virtualization:        VT-x
n19: L1d cache:             32K
n19: L1i cache:             32K
n19: L2 cache:              256K
n19: L3 cache:              25600K
n19: NUMA node0 CPU(s):     0-9
n19: NUMA node1 CPU(s):     10-19
n19: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n14: Architecture:          x86_64
n14: CPU op-mode(s):        32-bit, 64-bit
n14: Byte Order:            Little Endian
n14: CPU(s):                20
n14: On-line CPU(s) list:   0-19
n14: Thread(s) per core:    1
n14: Core(s) per socket:    10
n14: Socket(s):             2
n14: NUMA node(s):          2
n14: Vendor ID:             GenuineIntel
n14: CPU family:            6
n14: Model:                 62
n14: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n14: Stepping:              4
n14: CPU MHz:               1199.951
n14: CPU max MHz:           3300.0000
n14: CPU min MHz:           1200.0000
n14: BogoMIPS:              5000.28
n14: Virtualization:        VT-x
n14: L1d cache:             32K
n14: L1i cache:             32K
n14: L2 cache:              256K
n14: L3 cache:              25600K
n14: NUMA node0 CPU(s):     0-9
n14: NUMA node1 CPU(s):     10-19
n14: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n26: Architecture:          x86_64
n26: CPU op-mode(s):        32-bit, 64-bit
n26: Byte Order:            Little Endian
n26: CPU(s):                20
n26: On-line CPU(s) list:   0-19
n26: Thread(s) per core:    1
n26: Core(s) per socket:    10
n26: Socket(s):             2
n26: NUMA node(s):          2
n26: Vendor ID:             GenuineIntel
n26: CPU family:            6
n26: Model:                 62
n26: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n26: Stepping:              4
n26: CPU MHz:               1199.951
n26: CPU max MHz:           3300.0000
n26: CPU min MHz:           1200.0000
n26: BogoMIPS:              5000.26
n26: Virtualization:        VT-x
n26: L1d cache:             32K
n26: L1i cache:             32K
n26: L2 cache:              256K
n26: L3 cache:              25600K
n26: NUMA node0 CPU(s):     0-9
n26: NUMA node1 CPU(s):     10-19
n26: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n24: Architecture:          x86_64
n24: CPU op-mode(s):        32-bit, 64-bit
n24: Byte Order:            Little Endian
n24: CPU(s):                20
n24: On-line CPU(s) list:   0-19
n24: Thread(s) per core:    1
n24: Core(s) per socket:    10
n24: Socket(s):             2
n24: NUMA node(s):          2
n24: Vendor ID:             GenuineIntel
n24: CPU family:            6
n24: Model:                 62
n24: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n24: Stepping:              4
n24: CPU MHz:               1199.951
n24: CPU max MHz:           3300.0000
n24: CPU min MHz:           1200.0000
n24: BogoMIPS:              5000.11
n24: Virtualization:        VT-x
n24: L1d cache:             32K
n24: L1i cache:             32K
n24: L2 cache:              256K
n24: L3 cache:              25600K
n24: NUMA node0 CPU(s):     0-9
n24: NUMA node1 CPU(s):     10-19
n24: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n20: Architecture:          x86_64
n20: CPU op-mode(s):        32-bit, 64-bit
n20: Byte Order:            Little Endian
n20: CPU(s):                20
n20: On-line CPU(s) list:   0-19
n20: Thread(s) per core:    1
n20: Core(s) per socket:    10
n20: Socket(s):             2
n20: NUMA node(s):          2
n20: Vendor ID:             GenuineIntel
n20: CPU family:            6
n20: Model:                 62
n20: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n20: Stepping:              4
n20: CPU MHz:               1200.866
n20: CPU max MHz:           3300.0000
n20: CPU min MHz:           1200.0000
n20: BogoMIPS:              5000.23
n20: Virtualization:        VT-x
n20: L1d cache:             32K
n20: L1i cache:             32K
n20: L2 cache:              256K
n20: L3 cache:              25600K
n20: NUMA node0 CPU(s):     0-9
n20: NUMA node1 CPU(s):     10-19
n20: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n22: Architecture:          x86_64
n22: CPU op-mode(s):        32-bit, 64-bit
n22: Byte Order:            Little Endian
n22: CPU(s):                20
n22: On-line CPU(s) list:   0-19
n22: Thread(s) per core:    1
n22: Core(s) per socket:    10
n22: Socket(s):             2
n22: NUMA node(s):          2
n22: Vendor ID:             GenuineIntel
n22: CPU family:            6
n22: Model:                 62
n22: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n22: Stepping:              4
n22: CPU MHz:               1199.951
n22: CPU max MHz:           3300.0000
n22: CPU min MHz:           1200.0000
n22: BogoMIPS:              4999.99
n22: Virtualization:        VT-x
n22: L1d cache:             32K
n22: L1i cache:             32K
n22: L2 cache:              256K
n22: L3 cache:              25600K
n22: NUMA node0 CPU(s):     0-9
n22: NUMA node1 CPU(s):     10-19
n22: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n16: Architecture:          x86_64
n16: CPU op-mode(s):        32-bit, 64-bit
n16: Byte Order:            Little Endian
n16: CPU(s):                20
n16: On-line CPU(s) list:   0-19
n16: Thread(s) per core:    1
n16: Core(s) per socket:    10
n16: Socket(s):             2
n16: NUMA node(s):          2
n16: Vendor ID:             GenuineIntel
n16: CPU family:            6
n16: Model:                 62
n16: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n16: Stepping:              4
n16: CPU MHz:               1201.324
n16: CPU max MHz:           3300.0000
n16: CPU min MHz:           1200.0000
n16: BogoMIPS:              5000.26
n16: Virtualization:        VT-x
n16: L1d cache:             32K
n16: L1i cache:             32K
n16: L2 cache:              256K
n16: L3 cache:              25600K
n16: NUMA node0 CPU(s):     0-9
n16: NUMA node1 CPU(s):     10-19
n16: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n15: Architecture:          x86_64
n15: CPU op-mode(s):        32-bit, 64-bit
n15: Byte Order:            Little Endian
n15: CPU(s):                20
n15: On-line CPU(s) list:   0-19
n15: Thread(s) per core:    1
n15: Core(s) per socket:    10
n15: Socket(s):             2
n15: NUMA node(s):          2
n15: Vendor ID:             GenuineIntel
n15: CPU family:            6
n15: Model:                 62
n15: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n15: Stepping:              4
n15: CPU MHz:               1199.951
n15: CPU max MHz:           3300.0000
n15: CPU min MHz:           1200.0000
n15: BogoMIPS:              5000.11
n15: Virtualization:        VT-x
n15: L1d cache:             32K
n15: L1i cache:             32K
n15: L2 cache:              256K
n15: L3 cache:              25600K
n15: NUMA node0 CPU(s):     0-9
n15: NUMA node1 CPU(s):     10-19
n15: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n11: Architecture:          x86_64
n11: CPU op-mode(s):        32-bit, 64-bit
n11: Byte Order:            Little Endian
n11: CPU(s):                20
n11: On-line CPU(s) list:   0-19
n11: Thread(s) per core:    1
n11: Core(s) per socket:    10
n11: Socket(s):             2
n11: NUMA node(s):          2
n11: Vendor ID:             GenuineIntel
n11: CPU family:            6
n11: Model:                 62
n11: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n11: Stepping:              4
n11: CPU MHz:               1199.951
n11: CPU max MHz:           3300.0000
n11: CPU min MHz:           1200.0000
n11: BogoMIPS:              4999.99
n11: Virtualization:        VT-x
n11: L1d cache:             32K
n11: L1i cache:             32K
n11: L2 cache:              256K
n11: L3 cache:              25600K
n11: NUMA node0 CPU(s):     0-9
n11: NUMA node1 CPU(s):     10-19
n11: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n17: Architecture:          x86_64
n17: CPU op-mode(s):        32-bit, 64-bit
n17: Byte Order:            Little Endian
n17: CPU(s):                20
n17: On-line CPU(s) list:   0-19
n17: Thread(s) per core:    1
n17: Core(s) per socket:    10
n17: Socket(s):             2
n17: NUMA node(s):          2
n17: Vendor ID:             GenuineIntel
n17: CPU family:            6
n17: Model:                 62
n17: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n17: Stepping:              4
n17: CPU MHz:               1301.879
n17: CPU max MHz:           3300.0000
n17: CPU min MHz:           1200.0000
n17: BogoMIPS:              4999.96
n17: Virtualization:        VT-x
n17: L1d cache:             32K
n17: L1i cache:             32K
n17: L2 cache:              256K
n17: L3 cache:              25600K
n17: NUMA node0 CPU(s):     0-9
n17: NUMA node1 CPU(s):     10-19
n17: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n28: Architecture:          x86_64
n28: CPU op-mode(s):        32-bit, 64-bit
n28: Byte Order:            Little Endian
n28: CPU(s):                20
n28: On-line CPU(s) list:   0-19
n28: Thread(s) per core:    1
n28: Core(s) per socket:    10
n28: Socket(s):             2
n28: NUMA node(s):          2
n28: Vendor ID:             GenuineIntel
n28: CPU family:            6
n28: Model:                 62
n28: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n28: Stepping:              4
n28: CPU MHz:               1199.951
n28: CPU max MHz:           3300.0000
n28: CPU min MHz:           1200.0000
n28: BogoMIPS:              5000.30
n28: Virtualization:        VT-x
n28: L1d cache:             32K
n28: L1i cache:             32K
n28: L2 cache:              256K
n28: L3 cache:              25600K
n28: NUMA node0 CPU(s):     0-9
n28: NUMA node1 CPU(s):     10-19
n28: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n27: Architecture:          x86_64
n27: CPU op-mode(s):        32-bit, 64-bit
n27: Byte Order:            Little Endian
n27: CPU(s):                20
n27: On-line CPU(s) list:   0-19
n27: Thread(s) per core:    1
n27: Core(s) per socket:    10
n27: Socket(s):             2
n27: NUMA node(s):          2
n27: Vendor ID:             GenuineIntel
n27: CPU family:            6
n27: Model:                 62
n27: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n27: Stepping:              4
n27: CPU MHz:               1199.951
n27: CPU max MHz:           3300.0000
n27: CPU min MHz:           1200.0000
n27: BogoMIPS:              4999.88
n27: Virtualization:        VT-x
n27: L1d cache:             32K
n27: L1i cache:             32K
n27: L2 cache:              256K
n27: L3 cache:              25600K
n27: NUMA node0 CPU(s):     0-9
n27: NUMA node1 CPU(s):     10-19
n27: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n18: Architecture:          x86_64
n18: CPU op-mode(s):        32-bit, 64-bit
n18: Byte Order:            Little Endian
n18: CPU(s):                20
n18: On-line CPU(s) list:   0-19
n18: Thread(s) per core:    1
n18: Core(s) per socket:    10
n18: Socket(s):             2
n18: NUMA node(s):          2
n18: Vendor ID:             GenuineIntel
n18: CPU family:            6
n18: Model:                 62
n18: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n18: Stepping:              4
n18: CPU MHz:               1199.951
n18: CPU max MHz:           3300.0000
n18: CPU min MHz:           1200.0000
n18: BogoMIPS:              5000.02
n18: Virtualization:        VT-x
n18: L1d cache:             32K
n18: L1i cache:             32K
n18: L2 cache:              256K
n18: L3 cache:              25600K
n18: NUMA node0 CPU(s):     0-9
n18: NUMA node1 CPU(s):     10-19
n18: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n13: Architecture:          x86_64
n13: CPU op-mode(s):        32-bit, 64-bit
n13: Byte Order:            Little Endian
n13: CPU(s):                20
n13: On-line CPU(s) list:   0-19
n13: Thread(s) per core:    1
n13: Core(s) per socket:    10
n13: Socket(s):             2
n13: NUMA node(s):          2
n13: Vendor ID:             GenuineIntel
n13: CPU family:            6
n13: Model:                 62
n13: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n13: Stepping:              4
n13: CPU MHz:               1199.951
n13: CPU max MHz:           3300.0000
n13: CPU min MHz:           1200.0000
n13: BogoMIPS:              4999.80
n13: Virtualization:        VT-x
n13: L1d cache:             32K
n13: L1i cache:             32K
n13: L2 cache:              256K
n13: L3 cache:              25600K
n13: NUMA node0 CPU(s):     0-9
n13: NUMA node1 CPU(s):     10-19
n13: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n25: Architecture:          x86_64
n25: CPU op-mode(s):        32-bit, 64-bit
n25: Byte Order:            Little Endian
n25: CPU(s):                20
n25: On-line CPU(s) list:   0-19
n25: Thread(s) per core:    1
n25: Core(s) per socket:    10
n25: Socket(s):             2
n25: NUMA node(s):          2
n25: Vendor ID:             GenuineIntel
n25: CPU family:            6
n25: Model:                 62
n25: Model name:            Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
n25: Stepping:              4
n25: CPU MHz:               1199.951
n25: CPU max MHz:           3300.0000
n25: CPU min MHz:           1200.0000
n25: BogoMIPS:              4999.67
n25: Virtualization:        VT-x
n25: L1d cache:             32K
n25: L1i cache:             32K
n25: L2 cache:              256K
n25: L3 cache:              25600K
n25: NUMA node0 CPU(s):     0-9
n25: NUMA node1 CPU(s):     10-19
n25: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
n30: Architecture:          x86_64
n30: CPU op-mode(s):        32-bit, 64-bit
n30: Byte Order:            Little Endian
n30: CPU(s):                32
n30: On-line CPU(s) list:   0-31
n30: Thread(s) per core:    2
n30: Core(s) per socket:    8
n30: Socket(s):             2
n30: NUMA node(s):          4
n30: Vendor ID:             AuthenticAMD
n30: CPU family:            21
n30: Model:                 2
n30: Model name:            AMD Opteron(tm) Processor 6378
n30: Stepping:              0
n30: CPU MHz:               1400.000
n30: CPU max MHz:           2400.0000
n30: CPU min MHz:           1400.0000
n30: BogoMIPS:              4799.78
n30: Virtualization:        AMD-V
n30: L1d cache:             16K
n30: L1i cache:             64K
n30: L2 cache:              2048K
n30: L3 cache:              6144K
n30: NUMA node0 CPU(s):     0-7
n30: NUMA node1 CPU(s):     8-15
n30: NUMA node2 CPU(s):     16-23
n30: NUMA node3 CPU(s):     24-31
n30: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate retpoline_amd ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
n31: Architecture:          x86_64
n31: CPU op-mode(s):        32-bit, 64-bit
n31: Byte Order:            Little Endian
n31: CPU(s):                32
n31: On-line CPU(s) list:   0-31
n31: Thread(s) per core:    2
n31: Core(s) per socket:    8
n31: Socket(s):             2
n31: NUMA node(s):          4
n31: Vendor ID:             AuthenticAMD
n31: CPU family:            21
n31: Model:                 2
n31: Model name:            AMD Opteron(tm) Processor 6378
n31: Stepping:              0
n31: CPU MHz:               1400.000
n31: CPU max MHz:           2400.0000
n31: CPU min MHz:           1400.0000
n31: BogoMIPS:              4799.74
n31: Virtualization:        AMD-V
n31: L1d cache:             16K
n31: L1i cache:             64K
n31: L2 cache:              2048K
n31: L3 cache:              6144K
n31: NUMA node0 CPU(s):     0-7
n31: NUMA node1 CPU(s):     8-15
n31: NUMA node2 CPU(s):     16-23
n31: NUMA node3 CPU(s):     24-31
n31: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate retpoline_amd ssbd ibpb vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
n29: Architecture:          x86_64
n29: CPU op-mode(s):        32-bit, 64-bit
n29: Byte Order:            Little Endian
n29: CPU(s):                32
n29: On-line CPU(s) list:   0-31
n29: Thread(s) per core:    2
n29: Core(s) per socket:    8
n29: Socket(s):             2
n29: NUMA node(s):          4
n29: Vendor ID:             AuthenticAMD
n29: CPU family:            21
n29: Model:                 2
n29: Model name:            AMD Opteron(tm) Processor 6378
n29: Stepping:              0
n29: CPU MHz:               1400.000
n29: CPU max MHz:           2400.0000
n29: CPU min MHz:           1400.0000
n29: BogoMIPS:              4799.87
n29: Virtualization:        AMD-V
n29: L1d cache:             16K
n29: L1i cache:             64K
n29: L2 cache:              2048K
n29: L3 cache:              6144K
n29: NUMA node0 CPU(s):     0-7
n29: NUMA node1 CPU(s):     8-15
n29: NUMA node2 CPU(s):     16-23
n29: NUMA node3 CPU(s):     24-31
n29: Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate retpoline_amd ssbd ibpb vmmcall

Jonas_D_
Beginner

Here are the results for multiple runs on a single node (attached).

SantoshY_Intel
Moderator

Hi,


Thanks for providing all the requested details. However, the sample code I provided is too small to show how performance scales. The results you sent show approximately the same performance for the single-node and multi-node runs.


So, could you please run the IMB-MPI1 benchmark on your cluster and share the performance numbers for both single-node and multi-node runs?

The lscpu output shows that some of your nodes (for example n29, n30, and n31) use the AMD Opteron(tm) Processor 6378.

Could you please try running the benchmark on nodes with Intel processors only and provide us with the performance numbers?
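
For reference, here is a minimal sketch of what the single-node and multi-node runs could look like. It assumes the oneAPI environment has already been sourced so that mpirun and IMB-MPI1 are on the PATH, and it uses n24 and n20 as example hosts because the lscpu output lists both as Intel Xeon E5-2670 v2 nodes. The run wrapper and the DRY_RUN variable are hypothetical helpers added for illustration: by default the commands are only printed, and setting DRY_RUN=0 executes them.

```shell
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Ask Intel MPI to print startup details, including the libfabric provider in use.
export I_MPI_DEBUG=5

# Single-node run: two ranks placed on n24 (example host, Intel node).
run mpirun -n 2 -ppn 2 -hosts n24 IMB-MPI1 PingPong

# Multi-node run: one rank on each of n24 and n20 (example hosts, Intel nodes).
run mpirun -n 2 -ppn 1 -hosts n24,n20 IMB-MPI1 PingPong
```

Comparing the PingPong latency and bandwidth from the two runs, together with the provider line in the startup log, should indicate whether inter-node traffic is the bottleneck.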


Also, could you provide a sample reproducer so that we can reproduce the issue on our end?


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,


We haven't heard back from you. Could you please provide us with the above-requested details?


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,


I have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks & Regards,

Santosh


