Intel® oneAPI Math Kernel Library

Why different LINPACK results on E5 v3 and E5 v2?

gridpc_g_
Beginner

When I use Intel MKL, Intel MPI, and the Intel compiler to run LINPACK on an E5-2660 v3, I get an Rmax/Rpeak of 11%, but on the E5 v2 it is 93%. Why?

9 Replies
gridpc_g_
Beginner
HPL.dat is the same on the v3 and v2 CPUs. Please give me some advice. Thanks.
Steve_H_Intel1
Employee

Transferred to Intel® Math Kernel Library Forum.

Ying_H_Intel
Employee

Hi gridpc_g,

From http://ark.intel.com/compare/75272,81706, the two CPUs' features look like this:

 

Intel® Xeon® Processor E5-2660 v2 (25M Cache, 2.20 GHz) vs. Intel® Xeon® Processor E5-2660 v3 (25M Cache, 2.60 GHz)

                             E5-2660 v2      E5-2660 v3
Code Name                    Ivy Bridge EP   Haswell
Status                       Launched        Launched
Launch Date                  Q3'13           Q3'14
Processor Number             E5-2660V2       E5-2660V3
Instruction Set Extensions   AVX             AVX 2.0
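
The Instruction Set Extensions row is the key difference for LINPACK: with AVX, an Ivy Bridge core does 8 double-precision FLOPs per cycle, while a Haswell core with AVX2/FMA does 16. A back-of-the-envelope Rpeak comparison at nominal clocks (both parts have 10 cores):

    E5-2660 v2: 10 cores x 2.20 GHz x  8 FLOP/cycle = 176 Gflop/s per socket
    E5-2660 v3: 10 cores x 2.60 GHz x 16 FLOP/cycle = 416 Gflop/s per socket

So an HPL binary that exercises only AVX code paths on the v3 machine is limited to roughly 176 of 416 Gflop/s, i.e. about 42% efficiency at best, before any other losses. (Haswell also clocks down under AVX-heavy load, which shaves a few more points off efficiency computed against the nominal-clock Rpeak.)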

 

Could you please provide the details of your HPL.dat and tell us which binary you are running?

Best Regards,

Ying 

VipinKumar_E_Intel

Hi,

  Are you using the Intel Optimized LINPACK binary from https://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download?

Or are you building HPL yourself? If so, what compiler/link options are you using? Also, please provide platform details for both systems.

--Vipin

 

Murat_G_Intel
Employee

Yes, providing the details above will help. The recommended HPL binary is the offload version: mp_linpack/bin_intel/intel64/xhpl_offload_intel64. The NB value can be set to 192 for v3 systems. You can first run this binary with 1 MPI rank per node. Later, you may try running it with 1 MPI rank per socket to get the best performance, using the scripts provided: mp_linpack/bin_intel/intel64/runme_offload_intel64.
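
For reference, NB is the block-size line in HPL.dat. A minimal sketch of the relevant lines (the N and P/Q values below are illustrative placeholders, not tuned recommendations; HPL reads the leading value on each line and ignores the trailing text):

    1            # of problems sizes (N)
    100000       Ns            <- problem size (placeholder)
    1            # of NBs
    192          NBs           <- block size; 192 suggested above for v3 (Haswell)
    1            # of process grids (P x Q)
    4            Ps            <- P x Q must equal the number of MPI ranks
    4            Qs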

Thank you.

Reza_M_1
Beginner

Hello Murat,

I have more or less the same problem. I have two sets of compute nodes (16 nodes with 2690 v2 processors and 64 nodes with 2690 v3 processors) on which I tried to run the LINPACK benchmark. First I tried Intel Parallel Studio on the 16 v2 nodes and got about 92% efficiency, which is great.

In the second step I ran the same LINPACK across 16 of the v3 nodes, but I only got 74% efficiency. The configuration is the same; I even reinstalled Parallel Studio on the v3 nodes, but I still have the same problem.

By the way, running LINPACK on a single node gives 87%, but across 16 or 64 nodes it drops to 74%.

 

Please kindly help me solve this problem.

 

Best Regards,

Reza Mirani

mirani@hpcmicrosystems.net 

VipinKumar_E_Intel

Hi gridpc and Reza,

   As Efe recommended, could you try the offload binary that I mentioned above, even if there is no Xeon Phi in your systems? We will be removing this source of confusion soon by having just one single binary out there.
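
A minimal launch sketch, assuming Intel MPI, a 16-node job, and the default package layout (the node count is a placeholder; adjust -n to your cluster):

    cd mp_linpack/bin_intel/intel64
    # one MPI rank per node across 16 nodes (-ppn = ranks per host with Intel MPI)
    mpirun -ppn 1 -n 16 ./xhpl_offload_intel64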

Vipin

Reza_M_1
Beginner

Hi Vipin,

I tried make arch=intel64 version=offload, but there was an error saying it couldn't find the offload library. I then used the prebuilt offload binary in the bin_intel/intel64 folder, but the results were very poor, and I couldn't understand the structure of the results. Are they per node, or combined?

On the other hand, CPU utilization was not equal during this test; some cores were at 200% while others were idle.

Reza 

Kazushige_G_Intel

  HPL performance is limited by the slowest node, so please check whether all nodes reach 87% equally. Performance also tends to be low when the problem size is too small, because of communication overhead; please try increasing the problem size.
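
As a rule of thumb, N is chosen so the matrix fills roughly 80% of total memory: N ≈ sqrt(0.8 × total_memory_in_bytes / 8), since each double-precision element takes 8 bytes. For example, 16 nodes with 64 GB each (an assumed configuration) give N ≈ sqrt(0.8 × 16 × 64e9 / 8) ≈ 320,000. To check node uniformity, here is a sketch assuming passwordless ssh, a nodes.txt host list, and the binary path used above (all placeholders), with P = Q = 1 set in HPL.dat:

    for h in $(cat nodes.txt); do
      echo "== $h =="
      # single-rank HPL on each node in turn; the tail shows the Gflops summary
      ssh "$h" "cd mp_linpack/bin_intel/intel64 && mpirun -n 1 ./xhpl_offload_intel64" | tail -n 20
    done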

Thanks,

 Kazushige Goto

 
