MKL HPCG Benchmark -- Missing MKL support and Incorrect Usage

Ian_K_1 · 12-18-2015

I downloaded the MKL Benchmark set (for Linux) today, from:

https://software.intel.com/en-us/articles/intel-mkl-benchmarks-suite

But the HPCG benchmark I wanted to run doesn't appear to actually support MKL -- the printed usage options list:

--mkl=yes/no: use MKL (default no)

but this doesn't appear to be implemented in the code, and trying to use it with the binaries gives:

$ ./xhpcg_avx --mkl=yes
./xhpcg_avx: unrecognized option '--mkl=yes'
$ ./xhpcg_avx --mkl yes
./xhpcg_avx: unrecognized option '--mkl'

Has anyone heard anything about whether this is going to be implemented? It seems odd for an option listed in a released download to still return an error.

Gennady_F_Intel · 12-21-2015

Regardless of whether this option is used, xhpcg_avx will produce the correct result. You can find the output file in the same directory; please have a look. We will update the list of supported command-line options for this benchmark.

Paul_I_ · 03-25-2016

I am getting wildly varying results when running HPCG on one of our servers:

/opt/intel/16.0/compilers_and_libraries_2016.1.150/linux/mkl/benchmarks/hpcg/bin/xhpcg_avx -mkl=yes  --yaml="`hostname`.`date '+%Y%m%d%H%M%S'`"

GFLOP/s Summary:
HPCG result is VALID with a GFLOP/s rating of: 15.0287
GFLOP/s Summary:
HPCG result is VALID with a GFLOP/s rating of: 13.123
GFLOP/s Summary:
HPCG result is VALID with a GFLOP/s rating of: 13.9009
GFLOP/s Summary:
HPCG result is VALID with a GFLOP/s rating of: 12.5201
GFLOP/s Summary:
HPCG result is VALID with a GFLOP/s rating of: 11.5567
GFLOP/s Summary:
HPCG result is VALID with a GFLOP/s rating of: 7.14771
pimmaraj@hpchost1:~/hpcg_mkl/hpcg_intel>

DDOT Timing Variations:
  Min DDOT MPI_Allreduce time: -7.44059e+10
  Max DDOT MPI_Allreduce time: -7.44059e+10
  Avg DDOT MPI_Allreduce time: -7.44059e+10
__________ Final Summary __________:
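To put a number on "wildly varying", the six ratings above can be summarized with a short awk filter. This is a minimal sketch: `hpcg_spread` is a hypothetical helper name, and it assumes only that each run prints a line ending in `rating of: <value>`, as in the output above. Other lines, such as the DDOT timings, are ignored.

```sh
#!/bin/sh
# Hypothetical helper: summarize HPCG ratings read on stdin.
# Each line of the form "... rating of: <GFLOP/s>" counts as one run;
# all other lines are ignored. Exits 1 if no ratings were found.
hpcg_spread() {
  awk -F'rating of: ' '
    NF == 2 {
      v = $2 + 0
      # The first rating seen initializes both min and max.
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      sum += v
      n++
    }
    END {
      if (n == 0) exit 1
      ratio = (min > 0) ? max / min : 0
      printf "runs=%d min=%.4f max=%.4f mean=%.4f max/min=%.2f\n", n, min, max, sum / n, ratio
    }'
}
```

For the six ratings in this post it prints `runs=6 min=7.1477 max=15.0287 mean=12.2129 max/min=2.10`, i.e. the best run is roughly twice as fast as the worst, which is a large spread for repeated runs of the same binary on the same machine.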

Gennady_F_Intel · 03-28-2016

Paul, are you sure HPCG was the only job running on that server at the time?
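One quick way to check that nothing else is competing for the machine before a run is to look at the 1-minute load average. A minimal sketch, assuming a Linux host where `/proc/loadavg` exists; `check_idle` and the threshold value are hypothetical choices, not part of HPCG or MKL.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the 1-minute load average is at or
# below a threshold, so a benchmark run is not sharing the machine.
# Usage: check_idle MAX_LOAD [LOADAVG_FILE]  (file defaults to Linux /proc/loadavg)
check_idle() {
  max_load=$1
  loadavg_file=${2:-/proc/loadavg}
  # The first field of /proc/loadavg is the 1-minute load average.
  awk -v max="$max_load" '
    {
      if ($1 + 0 > max + 0) {
        printf "busy: load %s > %s\n", $1, max
        exit 1
      }
      print "idle: load " $1
      exit 0
    }' "$loadavg_file"
}
```

Used as `check_idle 1 && ./xhpcg_avx`, the benchmark only starts when the machine looks quiet. A single CPU-bound background process contributes about 1.0 to the load average, so a threshold below 1 flags even one busy neighbor.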

McCalpinJohn · 03-29-2016

Good implementations of HPCG give performance results that are very strongly correlated with the results of the STREAM benchmark. Memory-bandwidth-limited codes are very sensitive to NUMA issues on multi-socket servers, so they should always be run with some sort of process binding, and (when possible) should also use memory binding.

I have not looked at this implementation of the HPCG benchmark, so I don't know which parallel libraries are involved, but investigation of the corresponding environment variables may be useful.
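As one possible starting point for the process and memory binding described above, the wrapper below confines a run to a single NUMA node with `numactl --cpunodebind` and `--membind`, and pins OpenMP threads with the standard OpenMP 4.0 variables `OMP_PLACES` and `OMP_PROC_BIND`. This is a sketch under assumptions: that the binary's OpenMP runtime honors those variables (Intel's runtime also reads its own `KMP_AFFINITY`), and that a single process is being run, as in the command in this thread. The name `run_on_node` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: run a command with its CPUs and memory confined to one
# NUMA node, and with OpenMP threads pinned to cores inside that node.
# Usage: run_on_node NODE COMMAND [ARGS...]
run_on_node() {
  node=$1
  shift
  # Standard OpenMP 4.0 affinity variables; assumes the binary's OpenMP
  # runtime honors them.
  set -- env OMP_PLACES=cores OMP_PROC_BIND=close "$@"
  # Probe first: numactl may be missing, or NUMA policy may be unsupported
  # (e.g. in some containers). Fall back to an unbound run with a warning.
  if numactl --cpunodebind="$node" --membind="$node" true >/dev/null 2>&1; then
    numactl --cpunodebind="$node" --membind="$node" "$@"
  else
    echo "warning: numactl binding to node $node unavailable; running unbound" >&2
    "$@"
  fi
}
```

For example, `run_on_node 0 ./xhpcg_avx` repeated a few times (and then again with node 1) keeps both threads and memory on one socket. If those per-socket ratings are stable while the unbound runs are not, thread and page placement across sockets is a likely suspect for the variation.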