Intel® oneAPI Math Kernel Library

Why different LINPACK results on E5 v3 and E5 v2?

gridpc_g_
Beginner

When I use Intel MKL, Intel MPI, and the Intel compiler to run LINPACK on an E5-2660 v3, I get an Rmax/Rpeak of 11%, but on the E5 v2 it is 93%. Why?

9 Replies
gridpc_g_
Beginner
HPL.dat is the same on the v3 and v2 CPUs. Please give me some advice. Thanks.
Steve_H_Intel1
Employee

Transferred to Intel® Math Kernel Library Forum.

Ying_H_Intel
Employee

Hi gridpc_g,

From http://ark.intel.com/compare/75272,81706, the two CPUs' features look like this:

 

Intel® Xeon® Processor E5-2660 v2 (25M Cache, 2.20 GHz) vs. Intel® Xeon® Processor E5-2660 v3 (25M Cache, 2.60 GHz)

                             E5-2660 v2      E5-2660 v3
Code Name                    Ivy Bridge EP   Haswell
Status                       Launched        Launched
Launch Date                  Q3'13           Q3'14
Processor Number             E5-2660V2       E5-2660V3
Instruction Set Extensions   AVX             AVX 2.0
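
The Instruction Set Extensions row is the key difference for LINPACK: with AVX, an Ivy Bridge core does 8 double-precision FLOPs per cycle, while a Haswell core with AVX2/FMA does 16. A back-of-the-envelope Rpeak comparison at nominal clocks (both parts have 10 cores):

    E5-2660 v2: 10 cores x 2.20 GHz x  8 FLOP/cycle = 176 Gflop/s per socket
    E5-2660 v3: 10 cores x 2.60 GHz x 16 FLOP/cycle = 416 Gflop/s per socket

So an HPL binary that exercises only AVX code paths on the v3 machine is limited to roughly 176 of 416 Gflop/s, i.e. about 42% efficiency at best, before any other losses. (Haswell also clocks down under AVX-heavy load, which shaves a few more points off efficiency computed against the nominal-clock Rpeak.)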

 

Could you please provide the details of your HPL.dat and tell us which binary you are running?

Best Regards,

Ying 

VipinKumar_E_Intel

Hi,

  Are you using the Intel Optimized LINPACK binary from https://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download?

Or are you building HPL yourself? If so, what compiler/link options are you using? Also, please provide platform details for both systems.

--Vipin

 

Murat_G_Intel
Employee

Yes, providing the details above will help. The recommended HPL binary is the offload version: mp_linpack/bin_intel/intel64/xhpl_offload_intel64. The NB value can be set to 192 for v3 systems. You can first run this binary with 1 MPI rank per node. Later, you may try running it with 1 MPI rank per socket to get the best performance, using the scripts provided: mp_linpack/bin_intel/intel64/runme_offload_intel64.
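
For reference, NB is the block-size line in HPL.dat. A minimal sketch of the relevant lines (the N and P/Q values below are illustrative placeholders, not tuned recommendations; HPL reads the leading value on each line and ignores the trailing text):

    1            # of problems sizes (N)
    100000       Ns            <- problem size (placeholder)
    1            # of NBs
    192          NBs           <- block size; 192 suggested above for v3 (Haswell)
    1            # of process grids (P x Q)
    4            Ps            <- P x Q must equal the number of MPI ranks
    4            Qs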

Thank you.

Reza_M_1
Beginner

Hello Murat,

I have more or less the same problem. I have two sets of compute nodes (16 nodes with 2690 v2 processors and 64 nodes with 2690 v3 processors) on which I tried to run the LINPACK benchmark. First I tried Intel Parallel Studio on the 16 v2 nodes and got about 92% efficiency, which is great.

In the second step I ran the same LINPACK across 16 of the v3 nodes, but I only got 74% efficiency. The configuration is the same; I even reinstalled Parallel Studio on the v3 nodes, but I still have the same problem.

By the way, running LINPACK on a single node gives 87%, but across 16 or 64 nodes it drops to 74%.

 

Please kindly help me solve this problem.

 

Best Regards,

Reza Mirani

mirani@hpcmicrosystems.net 

VipinKumar_E_Intel

Hi gridpc and Reza,

   As Efe recommended, could you try the offload binary that I mentioned above, even if there is no Xeon Phi in your systems? We will be removing this source of confusion soon by having just one single binary out there.
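
A minimal launch sketch, assuming Intel MPI, a 16-node job, and the default package layout (the node count is a placeholder; adjust -n to your cluster):

    cd mp_linpack/bin_intel/intel64
    # one MPI rank per node across 16 nodes (-ppn = ranks per host with Intel MPI)
    mpirun -ppn 1 -n 16 ./xhpl_offload_intel64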

Vipin

Reza_M_1
Beginner

Hi Vipin,

I tried make arch=intel64 version=offload, but there was an error saying it couldn't find the offload library. I then used the prebuilt offload binary in the bin_intel/intel64 folder, but the results were very poor, and I couldn't understand the structure of the results. Are they per node, or combined?

On the other hand, CPU utilization was not equal during this test; some cores were at 200% while others were idle.

Reza 

Kazushige_G_Intel

  HPL performance is limited by the slowest node, so please check whether all nodes reach 87% equally. Performance also tends to be low when the problem size is too small, because of communication overhead; please try increasing the problem size.
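
As a rule of thumb, N is chosen so the matrix fills roughly 80% of total memory: N ≈ sqrt(0.8 × total_memory_in_bytes / 8), since each double-precision element takes 8 bytes. For example, 16 nodes with 64 GB each (an assumed configuration) give N ≈ sqrt(0.8 × 16 × 64e9 / 8) ≈ 320,000. To check node uniformity, here is a sketch assuming passwordless ssh, a nodes.txt host list, and the binary path used above (all placeholders), with P = Q = 1 set in HPL.dat:

    for h in $(cat nodes.txt); do
      echo "== $h =="
      # single-rank HPL on each node in turn; the tail shows the Gflops summary
      ssh "$h" "cd mp_linpack/bin_intel/intel64 && mpirun -n 1 ./xhpl_offload_intel64" | tail -n 20
    done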

Thanks,

 Kazushige Goto

 
