Beginner

HPCC crashing at "Begin of StarDGEMM section"

HPCC is crashing at the beginning of the StarDGEMM section.

I compiled HPCC against MKL 10.3 Update 7 using the GNU compilers and MPICH2.

This is the error in /var/log/messages:

May 8 13:47:44 node1 kernel: hpcc[14482] general protection ip:4b6869 sp:7fff18307650 error:0 in hpcc[400000+41d000]

The HPCC executable I built against ATLAS works fine.

SELinux is set to permissive.

I am running this on a RHEL 6.2 node (x64) with:

gcc Version : 4.4.6 Release : 3.el6
mpich2 Version : 1.2.1 Release : 2.3.el6

I also tried on another server running RHEL 6.0 (x64) with:

gcc Version : 4.4.4 Release : 13.el6
mpich2 Version : 1.2.1 Release : 2.3.el6

but to no avail.

Any ideas?

TIA

Makefile:
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL = /bin/sh
#
CD = cd
CP = cp
LN_S = ln -s
MKDIR = mkdir
RM = /bin/rm -f
TOUCH = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH = x64
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir = ../../..
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
#
HPLlib = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
#MPdir = /usr/bin/mpi
#MPinc = -I$(MPdir)/include
#MPlib = /usr/lib64/mpich2/lib/libmpich.a
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir = /opt/intel/mkl/lib/intel64
LAinc = /opt/intel/mkl/include
LAlib = -Wl,--start-group $(LAdir)/libmkl_cdft_core.a $(LAdir)/libmkl_intel_ilp64.a $(LAdir)/libmkl_sequential.a $(LAdir)/libmkl_core.a $(LAdir)/libmkl_blacs_intelmpi_ilp64.a -Wl,--end-group -lpthread -lm
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. **One and only one** option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_ : all lower case and a suffixed underscore (Suns,
# Intel, ...), [default]
# -DNoChange : all lower case (IBM RS6000),
# -DUpCase : all upper case (Cray),
# -DAdd__ : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]
# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle : The string address is passed at the string loca-
# tion on the stack, and the string length is then
# passed as an F77_INTEGER after all explicit
# stack arguments, [default]
# -DStringStructPtr : The address of a structure is passed by a
# Fortran 77 string, and the structure is of the
# form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal : A structure is passed by value for each Fortran
# 77 string, and the structure is of the form:
# struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle : Special option for Cray machines, which uses
# Cray fcd (fortran character descriptor) for
# interoperation.
#
F2CDEFS =
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) -I$(LAinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS call the cblas interface;
# -DHPL_CALL_VSIPL call the vsip library;
# -DHPL_DETAILED_TIMING enable detailed timers;
#
# By default HPL will:
# *) not copy L before broadcast,
# *) call the BLAS Fortran 77 interface,
# *) not display detailed timing information.
#
HPL_OPTS = -DHPL_CALL_CBLAS
#
# ----------------------------------------------------------------------
#
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC = /usr/bin/mpicc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -DMKL_ILP64 -m64
#
# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER = /usr/bin/mpicc
LINKFLAGS = $(CCFLAGS)
#
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
#
# ----------------------------------------------------------------------




Beginner

I recompiled with debug symbols and ran it under GDB:

Program received signal SIGSEGV, Segmentation fault.
0x00000000004ac349 in mkl_blas_mc_dgemm_mscale ()

Any ideas?

TIA

Beginner

I recompiled HPCC on an Ubuntu 12.04 machine (which I know is not supported) that we are using for some tests, and I get the same issue, except that this time the StarDGEMM test does seem to start:

Begin of StarDGEMM section.
Scaled residual: 0.0327184

Output from syslog:

May 9 11:14:28 ubuntu12lts kernel: [1025181.731735] hpcc[13788]: segfault at 1002189ac0 ip 00000000004b0599 sp 00007fffcaed4790 error 4
May 9 11:14:28 ubuntu12lts kernel: [1025181.759789] hpcc[13789]: segfault at 10028b5aa0 ip 00000000004b0599 sp 00007fff1614e590 error 4 in hpcc[400000+421000]
May 9 11:14:28 ubuntu12lts kernel: [1025181.760159] hpcc[13790]: segfault at 1001d2e770 ip 00000000004b0599 sp 00007fff7eb44130 error 4 in hpcc[400000+421000]
May 9 11:14:28 ubuntu12lts kernel: [1025181.770489] in hpcc[400000+421000]

Any ideas?

TIA

New Contributor III

Hi,

For DGEMM in HPCC to work correctly, you need to link with MKL's LP64 (not ILP64) interface library.
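Applied to the Makefile from the first post, that change might look like the following minimal sketch. It assumes the same MKL 10.3 install under /opt/intel/mkl and keeps the original library list, swapping only the two _ilp64 libraries (the interface library and the Intel MPI BLACS library) for their _lp64 counterparts, and dropping -DMKL_ILP64 from CCFLAGS. The likely reason for the crash (an assumption, not confirmed in this thread) is that HPCC passes 32-bit int dimensions to DGEMM while the ILP64 interface expects 64-bit integers, so MKL can read invalid sizes and access memory outside the matrices.

```make
# Sketch only: assumes LAdir/LAinc are unchanged from the original Makefile.
# LP64 interface + LP64 BLACS instead of the ILP64 variants.
LAlib = -Wl,--start-group $(LAdir)/libmkl_cdft_core.a $(LAdir)/libmkl_intel_lp64.a \
        $(LAdir)/libmkl_sequential.a $(LAdir)/libmkl_core.a \
        $(LAdir)/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -lpthread -lm
#
# -DMKL_ILP64 removed: it selects 64-bit MKL_INT, which does not match the LP64 libraries.
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -m64
```

After changing the flags, rebuilding HPCC from clean is advisable so that no objects compiled with the old settings are reused.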
Please note also that to use vectors with more than 2^32 elements in MPI FFT with MKL, you will need to follow the steps described in this article: http://software.intel.com/en-us/articles/performance-tools-for-software-developers-use-of-intel-mkl-...

Best regards,
Vladimir

Beginner

Thanks, that did the trick.

I'd already looked at the link, but clearly did not follow the instructions correctly ;)

Thanks again.