Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

How to get good performance when building Netlib HPL from source code

Tuyen__Nguyen
Beginner

My KNL platform is a single node with an Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz (68 cores) and 96 GB of memory.

First, I checked the performance of the Intel Distribution for LINPACK Benchmark on 1 node, located in ./benchmarks/mp_linpack/, and got good performance of about 1700 Gflops for N = 40000, NB = 336, P = 1, Q = 1, run with "mpirun -np 1 ./xhpl".

Second, with HPL 2.3 and the same input values, the performance is really bad: only 723 Gflops. With N = 100000, I got about 942 Gflops, but that is still lower than the Intel LINPACK benchmark result.
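
As a rough sanity check on these numbers, the sketch below computes how much memory the N x N double-precision matrix needs for each N and expresses the HPL 2.3 results as a percentage of the 1700 Gflops reference. All figures come from the post; the only added constant is 8 bytes per double, and the function names are hypothetical helpers, not part of HPL or MKL.

```python
# Figures quoted in the post; BYTES_PER_DOUBLE is the only added constant.
BYTES_PER_DOUBLE = 8


def matrix_gib(n: int) -> float:
    """Memory needed for the N x N double-precision HPL matrix, in GiB."""
    return n * n * BYTES_PER_DOUBLE / 2**30


def relative_performance(measured_gflops: float, reference_gflops: float) -> float:
    """Measured result as a percentage of a reference result."""
    return 100.0 * measured_gflops / reference_gflops


if __name__ == "__main__":
    reference = 1700.0  # Intel Distribution for LINPACK Benchmark, N = 40000
    runs = [(40000, 723.0), (100000, 942.0)]  # HPL 2.3 results from the post
    for n, gflops in runs:
        print(f"N={n}: matrix ~{matrix_gib(n):.1f} GiB, "
              f"{relative_performance(gflops, reference):.1f}% of reference")
```

This only restates the reported numbers in comparable terms; it does not explain the gap. Note that the N = 100000 matrix occupies most of the node's 96 GB.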

Another thing: when I run micprun, it reports an error (see attached files).

Is the problem in my Make.Intel64 file?

What should I do to get a higher result with HPL 2.3?

Thanks a lot.

SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -fs
MKDIR        = mkdir -p
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
#ARCH         = Linux_Intel64
ARCH          = $(arch)
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
#TOPdir       = $(HOME)/hpl
TOPdir       = /home/tuyen1/HPL/hpl-2.3/install_hpl
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
# MPdir        = /opt/intel/mpi/4.1.0
# MPinc        = -I$(MPdir)/include64
# MPlib        = $(MPdir)/lib64/libmpi.a
MPdir          =/opt/intel/compilers_and_libraries_2018.5.274/linux/mpi
MPinc        = -I$(MPdir)/include64
MPlib        = $(MPdir)/lib64/libmpi.a
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        = /opt/intel/compilers_and_libraries_2018.5.274/linux/mkl
ifndef  LAinc
LAinc        = $(LAdir)/include
endif
ifndef  LAlib
LAlib        = -L$(LAdir)/lib/intel64 \
               -Wl,--start-group \
                $(LAdir)/lib/intel64/libmkl_intel_lp64.a \
                $(LAdir)/lib/intel64/libmkl_intel_thread.a \
                $(LAdir)/lib/intel64/libmkl_core.a \
                -Wl,--end-group -lpthread -ldl
 endif
 #
 # ----------------------------------------------------------------------
 # - F77 / C interface --------------------------------------------------
 # ----------------------------------------------------------------------
 # You can skip this section  if and only if  you are not planning to use
 # a  BLAS  library featuring a Fortran 77 interface.  Otherwise,  it  is
 # necessary  to  fill out the  F2CDEFS  variable  with  the  appropriate
 # options.  **One and only one**  option should be chosen in **each** of
 # the 3 following categories:
 #
 # 1) name space (How C calls a Fortran 77 routine)
 #
 # -DAdd_              : all lower case and a suffixed underscore  (Suns,
 #                       Intel, ...),                           [default]
 # -DNoChange          : all lower case (IBM RS6000),
 # -DUpCase            : all upper case (Cray),
 # -DAdd__             : the FORTRAN compiler in use is f2c.
 #
 # 2) C and Fortran 77 integer mapping
 #
 # -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,         [default]
 # -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
 # -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
 #
 # 3) Fortran 77 string handling
 #
 # -DStringSunStyle    : The string address is passed at the string loca-
 #                       tion on the stack, and the string length is then
 #                       passed as  an  F77_INTEGER  after  all  explicit
 #                       stack arguments,                       [default]
 # -DStringStructPtr   : The address  of  a  structure  is  passed  by  a
 #                       Fortran 77  string,  and the structure is of the
 #                       form: struct {char *cp; F77_INTEGER len;},
 # -DStringStructVal   : A structure is passed by value for each  Fortran
 #                       77 string,  and  the  structure is  of the form:
 #                       struct {char *cp; F77_INTEGER len;},
 # -DStringCrayStyle   : Special option for  Cray  machines,  which  uses
 #                       Cray  fcd  (fortran  character  descriptor)  for
 #                       interoperation.
 #
 F2CDEFS      = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) -I$(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_CALL_VSIPL       call the vsip  library;
# -DHPL_DETAILED_TIMING  enable detailed timers;
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the BLAS Fortran 77 interface,
#    *) not display detailed timing information.
#
#HPL_OPTS     = -DHPL_DETAILED_TIMING -DHPL_PROGRESS_REPORT
HPL_OPTS     = -DASYOUGO -DHYBRID
#
# ----------------------------------------------------------------------
#
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC       = mpiicc
CCNOOPT  = $(HPL_DEFS) -O0 -w -nocompchk
OMP_DEFS = -qopenmp
#CCFLAGS  = $(HPL_DEFS) -O3 -w -ansi-alias -i-static -z noexecstack -z relro -z now -nocompchk -Wall
CCFLAGS  = $(HPL_DEFS) -O3 -w -ansi-alias -i-static -z noexecstack -z relro -z now -nocompchk
#
#
# On some platforms,  it is necessary  to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER       = $(CC)
LINKFLAGS    = $(CCFLAGS) $(OMP_DEFS) -mt_mpi -qopenmp -nocompchk
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo

 

2 Replies
Jonghak_K_Intel
Employee

Nguyen,

I will investigate the issue and will get back to you. 

Thank you 

Tuyen__Nguyen
Beginner

Dear Jon,

Thanks for your reply.

I also tried another approach, using MCDRAM with HPL, but the performance did not improve much. For N = 100000, I got 1056 Gflops.
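
For context on problem sizing with MCDRAM, here is a minimal sketch that estimates the largest N (rounded down to a multiple of NB) whose matrix fits in a given memory budget. It assumes 16 GB of MCDRAM on the Xeon Phi 7250 and a usable fraction of 0.8; neither value appears in the post, and `largest_n` is a hypothetical helper rather than anything provided by HPL.

```python
import math

BYTES_PER_DOUBLE = 8


def largest_n(memory_gib: float, nb: int, fraction: float = 0.8) -> int:
    """Largest N, a multiple of nb, whose N x N double matrix fits in
    `fraction` of `memory_gib` GiB. The 0.8 default is an assumption."""
    budget_bytes = memory_gib * 2**30 * fraction
    max_elements = int(budget_bytes // BYTES_PER_DOUBLE)
    n_max = math.isqrt(max_elements)  # largest N with N*N <= max_elements
    return (n_max // nb) * nb         # round down to a multiple of NB


if __name__ == "__main__":
    nb = 336  # block size from the post
    # 16 GiB MCDRAM is an assumption; 96 GB total memory is from the post.
    for label, gib in [("MCDRAM (assumed 16 GiB)", 16), ("total memory", 96)]:
        print(f"{label}: N <= {largest_n(gib, nb)}")
```

Under these assumptions, a run with N = 100000 would need far more memory than MCDRAM alone could hold, so most of the matrix would sit in regular memory unless MCDRAM is configured as a cache.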

I hope to hear from you soon. 
