My KNL platform is based on Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz, 1 node, 68 cores, 96 GB memory.
First, I checked the performance of the Intel Distribution for LINPACK Benchmark on 1 node, located at ./benchmarks/mp_linpack/, and got good performance of about 1700 Gflops for N = 40000, NB = 336, P = 1, Q = 1, run with "mpirun -np 1 ./xhpl".
Second, with HPL 2.3 and the same input values, the performance is much worse: only 723 Gflops. With N = 100000 it reaches about 942 Gflops, but that is still lower than the Intel LINPACK benchmark result.
Also, when I run micprun, it reports an error (see attached files).
Is the problem in my Make.Intel64 file?
What should I do to get a higher result with HPL 2.3?
Thanks a lot.
SHELL = /bin/sh
#
CD = cd
CP = cp
LN_S = ln -fs
MKDIR = mkdir -p
RM = /bin/rm -f
TOUCH = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
#ARCH = Linux_Intel64
ARCH = $(arch)
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
#TOPdir = $(HOME)/hpl
TOPdir = /home/tuyen1/HPL/hpl-2.3/install_hpl
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
#
HPLlib = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
# MPdir = /opt/intel/mpi/4.1.0
# MPinc = -I$(MPdir)/include64
# MPlib = $(MPdir)/lib64/libmpi.a
MPdir =/opt/intel/compilers_and_libraries_2018.5.274/linux/mpi
MPinc = -I$(MPdir)/include64
MPlib = $(MPdir)/lib64/libmpi.a
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir = /opt/intel/compilers_and_libraries_2018.5.274/linux/mkl
ifndef LAinc
LAinc = $(LAdir)/include
endif
ifndef LAlib
LAlib = -L$(LAdir)/lib/intel64 \
-Wl,--start-group \
$(LAdir)/lib/intel64/libmkl_intel_lp64.a \
$(LAdir)/lib/intel64/libmkl_intel_thread.a \
$(LAdir)/lib/intel64/libmkl_core.a \
-Wl,--end-group -lpthread -ldl
endif
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. **One and only one** option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_ : all lower case and a suffixed underscore (Suns,
# Intel, ...), [default]
# -DNoChange : all lower case (IBM RS6000),
# -DUpCase : all upper case (Cray),
# -DAdd__ : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]
# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle : The string address is passed at the string loca-
# tion on the stack, and the string length is then
# passed as an F77_INTEGER after all explicit
# stack arguments, [default]
# -DStringStructPtr : The address of a structure is passed by a
# Fortran 77 string, and the structure is of the
# form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal : A structure is passed by value for each Fortran
# 77 string, and the structure is of the form:
# struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle : Special option for Cray machines, which uses
# Cray fcd (fortran character descriptor) for
# interoperation.
#
F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) -I$(LAinc) $(MPinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS call the cblas interface;
# -DHPL_CALL_VSIPL call the vsip library;
# -DHPL_DETAILED_TIMING enable detailed timers;
#
# By default HPL will:
# *) not copy L before broadcast,
# *) call the BLAS Fortran 77 interface,
# *) not display detailed timing information.
#
#HPL_OPTS = -DHPL_DETAILED_TIMING -DHPL_PROGRESS_REPORT
HPL_OPTS = -DASYOUGO -DHYBRID
#
# ----------------------------------------------------------------------
#
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC = mpiicc
CCNOOPT = $(HPL_DEFS) -O0 -w -nocompchk
OMP_DEFS = -qopenmp
#CCFLAGS = $(HPL_DEFS) -O3 -w -ansi-alias -i-static -z noexecstack -z relro -z now -nocompchk -Wall
CCFLAGS = $(HPL_DEFS) -O3 -w -ansi-alias -i-static -z noexecstack -z relro -z now -nocompchk
#
#
# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER = $(CC)
LINKFLAGS = $(CCFLAGS) $(OMP_DEFS) -mt_mpi -qopenmp -nocompchk
#
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
Nguyen,
I will investigate the issue and get back to you.
Thank you
Dear Jon,
Thanks for your reply.
I also tried another approach, using MCDRAM with HPL, but the performance did not improve much. For N = 100000, I got 1056 Gflops.
I hope to hear from you soon.
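To put the numbers reported in this thread side by side, here is a rough sketch that expresses each HPL 2.3 result as a percentage of the 1700 Gflops Intel LINPACK run and estimates the size of the N x N double-precision matrix for each N. The run labels and the helper names (matrix_gib, percent_of_reference) are made up for illustration; only the N values and Gflops figures come from the posts above.

```python
# Gflops figures quoted in this thread; labels are descriptive only.
INTEL_LINPACK_GFLOPS = 1700.0  # Intel Distribution for LINPACK, N = 40000

# (label, N, measured Gflops) for the HPL 2.3 runs mentioned above.
HPL23_RUNS = [
    ("HPL 2.3, N=40000", 40000, 723.0),
    ("HPL 2.3, N=100000", 100000, 942.0),
    ("HPL 2.3, N=100000, MCDRAM", 100000, 1056.0),
]


def matrix_gib(n: int) -> float:
    """Size in GiB of an n x n matrix of 8-byte doubles."""
    return 8 * n * n / 2**30


def percent_of_reference(gflops: float,
                         reference: float = INTEL_LINPACK_GFLOPS) -> float:
    """Measured Gflops as a percentage of the reference result."""
    return 100.0 * gflops / reference


if __name__ == "__main__":
    for label, n, gflops in HPL23_RUNS:
        print(f"{label}: {gflops:.0f} Gflops, "
              f"{percent_of_reference(gflops):.1f}% of the Intel run, "
              f"matrix ~{matrix_gib(n):.1f} GiB")
```

By this estimate the N = 100000 matrix needs roughly 74.5 GiB, which fits in the 96 GB of node memory. If the MCDRAM is the 16 GB that I believe the 7250 carries (an assumption, not something stated in this thread), that matrix cannot sit entirely in MCDRAM, whereas the N = 40000 matrix (about 11.9 GiB) could.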