Hi gridpc g,
From http://ark.intel.com/compare/75272,81706, the two CPUs compare as follows:

| | Intel® Xeon® Processor E5-2660 v2 (25M Cache, 2.20 GHz) | Intel® Xeon® Processor E5-2660 v3 (25M Cache, 2.60 GHz) |
| --- | --- | --- |
| Microarchitecture | Ivy Bridge EP | Haswell |
| Instruction Set Extensions | AVX | AVX2 |
Could you please provide the details of your HPL.dat and tell us which binary you are running?
Are you using the Intel-optimized Linpack binary from https://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download?
Or are you building HPL yourself? If so, what are the compiler/link options? Also, please provide platform details for both systems.
Yes, providing the details above will help. The recommended HPL binary is the offload version: mp_linpack/bin_intel/intel64/xhpl_offload_intel64. The NB value can be set to 192 for v3 systems. You can first run this binary with 1 MPI rank per node; later you may try 1 MPI rank per socket to get the best performance, using the provided script: mp_linpack/bin_intel/intel64/runme_offload_intel64.
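For reference, here is a minimal sketch of the HPL.dat lines that usually matter most. The N and P x Q values are illustrative placeholders only (not recommendations for any particular cluster); the NB line reflects the value suggested above:

```
1          # of problem sizes (N)
100000     Ns       (illustrative; size N to fill most of total memory)
1          # of NBs
192        NBs      (block size suggested above for v3/Haswell systems)
1          # of process grids (P x Q)
4          Ps       (P x Q must equal the number of MPI ranks)
4          Qs
```

The remaining HPL.dat lines can be left at their shipped defaults for a first run.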
I have more or less the same problem. I have two sets of compute nodes (16 nodes with E5-2690 v2 processors and 64 nodes with E5-2690 v3 processors) on which I am trying to run the Linpack benchmark. First I ran Intel Parallel Studio's Linpack on the 16 v2 nodes and got about 92% efficiency, which is great.
In the second step I ran the same Linpack on 16 of the v3 nodes, but I only got 74% efficiency. The configuration is the same; I even reinstalled Parallel Studio on the v3 nodes, but the problem persists.
By the way, running Linpack on a single node gives 87% efficiency, but on 16 or 64 nodes it drops to 74%.
Please kindly support me in solving this problem.
Hi gridpc and Reza,
As Efe recommended, could you try the offload binary I mentioned above, even though there is no Xeon Phi in your systems? We will be avoiding this confusion soon by having just one single binary out there.
I tried make arch=intel64 version=offload, but there was an error saying it could not find the offload library. I then used the pre-built offload binary in the bin_intel64 folder, but the results were very poor and I could not understand the result format. Is the reported number per node, or the combination of all nodes?
On the other hand, CPU utilization was not balanced in this test: some cores were at 200% while others were idle.
HPL performance is limited by the slowest node, so please check whether every node individually reaches 87%. Performance also tends to be low if the problem size is too small, because communication overhead then dominates; please try increasing the problem size.
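As a rough rule of thumb (a common HPL sizing heuristic, not an official formula), N is chosen so the N x N double-precision matrix fills around 80% of total cluster memory, then rounded down to a multiple of NB. A quick sketch of that estimate (the 80% fraction and the example node count/memory are assumptions for illustration):

```python
import math

def hpl_problem_size(nodes, mem_gib_per_node, nb=192, mem_fraction=0.80):
    """Estimate the HPL problem size N so that the N x N matrix of
    8-byte doubles uses about `mem_fraction` of total cluster memory,
    rounded down to a multiple of the block size NB."""
    total_bytes = nodes * mem_gib_per_node * 1024**3
    n = int(math.sqrt(mem_fraction * total_bytes / 8))  # 8 bytes per double
    return (n // nb) * nb  # keep N a multiple of NB

# Example: 16 nodes with 64 GiB of RAM each
print(hpl_problem_size(16, 64))  # -> 331584
```

If this estimated N is much larger than what you were using, that alone could explain a drop in efficiency when scaling from 1 to 16 or 64 nodes.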