Solved: libmpi.so.4: could not read symbols: Bad value

dye_J_ · ‎05-23-2014

Hi all,

I am new to HPC and am experimenting with MKL, CUDA, and NVIDIA's HPL to optimize the results.

Everything compiled smoothly until the build entered the testing/ptest/ directory, where it broke with:

make[2]: Entering directory `/root/hpl-2.0_FERMI_v15/testing/ptest/CUDA'
mpicc -DAdd__ -DF77_INTEGER=int -DStringSunStyle -DCUDA -I/root/hpl-2.0_FERMI_v15/include -I/root/hpl-2.0_FERMI_v15/include/CUDA -I/opt/intel/mkl/include -I/opt/intel/impi/4.1.3.049/intel64/include -I/usr/local/cuda/include -fomit-frame-pointer -O3 -funroll-loops -W -Wall -fopenmp -mt_mpi -m64 -lmpi_mt -o /root/hpl-2.0_FERMI_v15/bin/CUDA/xhpl HPL_pddriver.o         HPL_pdinfo.o           HPL_pdtest.o /root/hpl-2.0_FERMI_v15/lib/CUDA/libhpl.a  -L /root/hpl-2.0_FERMI_v15/src/cuda  -ldgemm -L/opt/cuda6/lib64 -lcuda -lcudart -lcublas -L/opt/intel/mkl/lib/intel64 -lpthread -lmpi_mt /opt/intel/impi/4.1.3.049/intel64/lib/libmpi_mt.so  -lmpi_mt
/usr/lib64/gcc/x86_64-slackware-linux/4.8.2/../../../../x86_64-slackware-linux/bin/ld: MPIR_Thread: TLS definition in /opt/intel/impi/4.1.3.049/intel64/lib/libmpi_mt.so section .tbss mismatches non-TLS definition in /opt/intel/impi/4.1.3.049/intel64/lib/libmpi.so.4 section .bss
/opt/intel/impi/4.1.3.049/intel64/lib/libmpi.so.4: could not read symbols: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [dexe.grd] Error 1
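
A note on reading this error: the link line above names libmpi_mt.so explicitly, yet the linker also reads libmpi.so.4. That suggests the single-threaded library is arriving indirectly, for example as a dependency recorded in one of the shared objects being linked. Below is a minimal sketch of how one might check for that, assuming GNU binutils' `readelf` is installed. `needed_mpi_libs` is a hypothetical helper name, not part of HPL or Intel MPI.

```shell
# Hypothetical helper (not part of HPL or Intel MPI).
# Reads the output of `readelf -d <shared-object>` on stdin and prints
# the name of every NEEDED entry that starts with "libmpi".
needed_mpi_libs() {
    sed -n 's/.*(NEEDED).*Shared library: \[\(libmpi[^]]*\)].*/\1/p'
}
```

For example, `readelf -d path/to/libdgemm.so | needed_mpi_libs` (the path is a placeholder). If this prints libmpi.so.4 rather than libmpi_mt.so.4, that shared object was linked against the single-threaded library. The pattern also matches related names such as libmpigf.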

FYI:

mpicc -show

gcc -I/opt/intel/impi/4.1.3.049/intel64/include -L/opt/intel/impi/4.1.3.049/intel64/lib -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /opt/intel/impi/4.1.3.049/intel64/lib -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/4.1 -lmpigf -lmpi -lmpigi -ldl -lrt -lpthread
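
The wrapper output above names `-lmpi`, the single-threaded library that the linker error complains about. As a rough sketch, a link line (or the output of `mpicc -show`) can be classified by which MPI library it names. `mpi_flavor` is a hypothetical helper name, and the check only looks at the literal words `-lmpi`, `-lmpi_mt`, and paths ending in `libmpi_mt.so`:

```shell
# Hypothetical helper: report which Intel MPI library a command line names.
# $1 is a whole command line as one string, e.g. "$(mpicc -show)".
mpi_flavor() {
    has_st=no
    has_mt=no
    # Intentionally unquoted so the string is split into words.
    for word in $1; do
        case "$word" in
            -lmpi)                    has_st=yes ;;  # single-threaded library
            -lmpi_mt|*/libmpi_mt.so)  has_mt=yes ;;  # multithreaded library
        esac
    done
    case "$has_st$has_mt" in
        yesyes) echo mixed ;;
        yesno)  echo single-threaded ;;
        noyes)  echo multi-threaded ;;
        *)      echo none ;;
    esac
}
```

Running `mpi_flavor "$(mpicc -show)"` on the output shown above would report `single-threaded`. A result of `mixed` on a real link line indicates that both libraries are named at once.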

Shell environment:

CC=/opt/intel/composer_xe_2013_sp1.3.174/bin/intel64/icc
CXX=/opt/intel/composer_xe_2013_sp1.3.174/bin/intel64icpc
F77=/opt/intel/impi/4.1.3.049/intel64/bin/mpiifort
FC=/opt/intel/impi/4.1.3.049/intel64/bin/mpiifort
FC90=/opt/intel/impi/4.1.3.049/intel64/bin/mpiifort

LD_LIBRARY_PATH=/opt/intel/composerxe/mkl/lib/intel64:/opt/intel/impi/4.1.3.049/lib64:/opt/intel/composerxe/lib/intel64/:/opt/intel/composer_xe_2013_sp1.3.174/tbb/lib/intel64/:/opt/intel/composer_xe_2013_sp1.3.174/ipp/lib/intel64/:/opt/intel/composer_xe_2013_sp1.3.174/compiler/lib/intel64/:/opt/cuda6/lib64:/opt/cuda6/lib/:/usr/local/bin/mpi/intel/lib/:/usr/X11R6/lib64/:/usr/local/lib64/

PATH= /opt/intel/impi/4.1.3.049/bin64/:/opt/intel/composer_xe_2013_sp1.3.174/bin/intel64:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/lib64/kde4/libexec:/usr/lib64/java/bin:/usr/lib64/java/jre/bin:/usr/lib64/java/jre/bin:/usr/lib64/qt/bin:/usr/share/texmf/bin

gcc --version

gcc (GCC) 4.8.2

Part of Make.CUDA:

MPdir        = /opt/intel/impi/4.1.3.049/intel64
MPlib        = $(MPdir)/lib/libmpi_mt.so 
LAdir        = /opt/intel/mkl/lib/intel64
LAinc        = -I/opt/intel/mkl/include
LAlib        = -L $(TOPdir)/src/cuda  -ldgemm -L/opt/cuda6/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -lpthread -lmpi_mt
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -fopenmp -mt_mpi -m64
LINKFLAGS    = $(CCFLAGS) -lmpi_mt

My attempts to fix this error:

Following what I found in:

https://software.intel.com/en-us/forums/topic/392483

I put -lmpi_mt everywhere to make sure nothing gets linked against libmpi.so.

Following the thread:

https://software.intel.com/en-us/forums/topic/294642

I added an extra -mt_mpi flag to CCFLAGS.

I used the -show trick described here:

https://software.intel.com/en-us/forums/topic/508632

It displays -lmpi_mt, but the build still does not work.

I also tried replacing Intel MPI with Open MPI compiled with icc, as described in a PDF I found on the web (HOWTO-HPL-GPU.pdf).

The result was the same problem described at the top of this thread.

FYI2:

I previously succeeded in compiling and running the (pure) HPL with MKL. Part of that Make.Linux_PII_CBLAS is here:

HPLlibHybrid = /opt/intel/composer_xe_2013_sp1.3.174/mkl/benchmarks/mp_linpack/lib_hybrid/mic/libhpl_hybrid.a
LAdir        = /opt/intel
LAinc        = -I$(LAdir)/mkl/include
LAlib        = -L$(LAdir)/mkl/lib/intel64 -Wl,--start-group $(LAdir)/mkl/lib/intel64/libmkl_intel_lp64.a $(LAdir)/mkl/lib/intel64/libmkl_intel_thread.a $(LAdir)/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -ldl $(HPLlibHybrid)
F2CDEFS      = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
HPL_OPTS     = -DASYOUGO -DHYBRID
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
CC           = mpiicc
CCNOOPT      = $(HPL_DEFS) -O0 -w -nocompchk
MKLINCDIR    = -I"/opt/intel/mkl/include"
CCFLAGS      = $(HPL_DEFS) $(MKLINCDIR) -O3  -w -ansi-alias -i-static -z noexecstack -z relro -z now -openmp -nocompchk
LINKER       = $(CC)
LINKFLAGS    = $(CCFLAGS) -openmp -mt_mpi $(STATICFLAG) -nocompchk

dye_J_ · ‎05-23-2014

EDIT: more attempts

Removing all -openmp flags results in:

code within libhpl.a can no longer resolve its reference to omp_get_thread_num()

...

I then rewrote Make.CUDA based on the one that compiled successfully (with MKL):

#
#     This is just a sample Make.
#     The user may need to edit:
#         1.) TOPdir
#         2.) MPI variables (MPdir,MPinc,MPlib)
#         3.) MKL BLAS variables (LAdir, LAinc, LAlib)
#         4.) The Compiler and Compiler/Linker Options (CC,CCFLAGS)
#

#  
#  -- High Performance Computing Linpack Benchmark (HPL)                
#     HPL - 1.0a - January 20, 2004                          
#     Antoine P. Petitet                                                
#     University of Tennessee, Knoxville                                
#     Innovative Computing Laboratories                                 
#     (C) Copyright 2000-2004 All Rights Reserved                       
#                                                                       
#  -- Copyright notice and Licensing terms:                             
#                                                                       
#  Redistribution  and  use in  source and binary forms, with or without
#  modification, are  permitted provided  that the following  conditions
#  are met:                                                             
#                                                                       
#  1. Redistributions  of  source  code  must retain the above copyright
#  notice, this list of conditions and the following disclaimer.        
#                                                                       
#  2. Redistributions in binary form must reproduce  the above copyright
#  notice, this list of conditions,  and the following disclaimer in the
#  documentation and/or other materials provided with the distribution. 
#                                                                       
#  3. All  advertising  materials  mentioning  features  or  use of this
#  software must display the following acknowledgement:                 
#  This  product  includes  software  developed  at  the  University  of
#  Tennessee, Knoxville, Innovative Computing Laboratories.             
#                                                                       
#  4. The name of the  University,  the name of the  Laboratory,  or the
#  names  of  its  contributors  may  not  be used to endorse or promote
#  products  derived   from   this  software  without  specific  written
#  permission.                                                          
#                                                                       
#  -- Disclaimer:                                                       
#                                                                       
#  THIS  SOFTWARE  IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
#  ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,  INCLUDING,  BUT NOT
#  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
#  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
#  OR  CONTRIBUTORS  BE  LIABLE FOR ANY  DIRECT,  INDIRECT,  INCIDENTAL,
#  SPECIAL,  EXEMPLARY,  OR  CONSEQUENTIAL DAMAGES  (INCLUDING,  BUT NOT
#  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
#  DATA OR PROFITS; OR BUSINESS INTERRUPTION)  HOWEVER CAUSED AND ON ANY
#  THEORY OF LIABILITY, WHETHER IN CONTRACT,  STRICT LIABILITY,  OR TORT
#  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
#  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
# ######################################################################
# Copyright (c) 2011,  NVIDIA CORPORATION
# All rights reserved.
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are 
# met: Redistributions of source code must retain the above copyright 
# notice, this list of conditions and the following disclaimer. 
# Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution. Neither the name 
# of NVIDIA nor the names of its contributors may be used to endorse or 
# promote products derived from this software without specific prior written
# permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND 
# CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
# NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT 
# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED 
# TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR 
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#  
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -fs
MKDIR        = mkdir -p
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH         = CUDA
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
# Set TOPdir to the location of where this is being built
ifndef  TOPdir
TOPdir = /root/hpl-2.0_FERMI_v15
endif
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a 
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir        = /opt/intel/impi/4.1.3.049/intel64
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpi_mt.so 
#MPlib        = $(MPdir)/lib64/libmpich.a
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
#LAdir        = $(TOPdir)/../../lib/em64t
LAdir        = /opt/intel/mkl/lib/intel64
LAinc        = -I/opt/intel/mkl/include
# CUDA
#LAlib        = -L /home/cuda/Fortran_Cuda_Blas  -ldgemm -L/usr/local/cuda/lib -lcublas  -L$(LAdir) -lmkl -lguide -lpthread
LAlib        = -L $(TOPdir)/src/cuda  -ldgemm -L/opt/cuda6/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -lpthread
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section  if and only if  you are not planning to use
# a  BLAS  library featuring a Fortran 77 interface.  Otherwise,  it  is
# necessary  to  fill out the  F2CDEFS  variable  with  the  appropriate
# options.  **One and only one**  option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_              : all lower case and a suffixed underscore  (Suns,
#                       Intel, ...),                           [default]
# -DNoChange          : all lower case (IBM RS6000),
# -DUpCase            : all upper case (Cray),
# -DAdd__             : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,         [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle    : The string address is passed at the string loca-
#                       tion on the stack, and the string length is then
#                       passed as  an  F77_INTEGER  after  all  explicit
#                       stack arguments,                       [default]
# -DStringStructPtr   : The address  of  a  structure  is  passed  by  a
#                       Fortran 77  string,  and the structure is of the
#                       form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal   : A structure is passed by value for each  Fortran
#                       77 string,  and  the  structure is  of the form:
#                       struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle   : Special option for  Cray  machines,  which  uses
#                       Cray  fcd  (fortran  character  descriptor)  for
#                       interoperation.
#
F2CDEFS      = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -I/usr/local/cuda/include
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_DETAILED_TIMING  enable detailed timers;
# -DASYOUGO              enable timing information as you go (nonintrusive)
# -DASYOUGO2             slightly intrusive timing information
# -DASYOUGO2_DISPLAY     display detailed DGEMM information
# -DENDEARLY             end the problem early  
# -DFASTSWAP             insert to use DLASWP instead of HPL code
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the BLAS Fortran 77 interface,
#    *) not display detailed timing information.
#
HPL_OPTS     =  -DCUDA
# ----------------------------------------------------------------------
#
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
# next two lines for GNU Compilers:
CC      = mpiicc
CCNOOPT = $(HPL_DEFS) -O0 -w -nocompchk
MKLINCDIR = -I/opt/intel/mkl/include
CCFLAGS = $(HPL_DEFS) $(MKLINCDIR) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -openmp
# next two lines for Intel Compilers:
# CC      = mpicc
#CCFLAGS = $(HPL_DEFS) -O3 -axS -w -fomit-frame-pointer -funroll-loops -openmp 
#
CCNOOPT      = $(HPL_DEFS) -O0 -w
#
# On some platforms,  it is necessary  to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER       = $(CC)
#LINKFLAGS    = $(CCFLAGS) -static_mpi
LINKFLAGS    = $(CCFLAGS) -openmp -mt_mpi -nocompchk
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
#
# ----------------------------------------------------------------------
MAKE = make TOPdir=$(TOPdir)

This gives the SAME error as described at the top.

Please help me,

thank you

dye

dye_J_ · ‎05-25-2014

Please help,

thanks,

dye

James_T_Intel · ‎05-27-2014

As far as I can tell, something is getting compiled without the multithreaded MPI library. Please attach the entire output from a fresh make as a text file, and I'll take a look at it.

dye_J_ · ‎05-27-2014

Hi James Tullos,

Thank you for generously offering to look at the output. Here it is, as attachments (recorded with `script`).

FYI:

setenv.txt -- the environment I used for compiling.

modifiedmake.txt -- make output using the Make.CUDA with -lpthread added everywhere; it starts with `cat Make.CUDA`.

puremake.txt -- make output using a Make.CUDA with only the necessary changes.

many thanks,

dye

dye_J_ · ‎05-29-2014

Oops, it looks like the `modifiedmake.txt' attachment is missing.

Let me upload it again!

James_T_Intel · ‎05-29-2014

Please add -mt_mpi to CCNOOPT. This is being used when compiling HPL_dlamch and is getting the single-threaded MPI library into your final build.

dye_J_ · ‎05-29-2014

Hi James,

I added your switch to my Make.CUDA, but it is still not working :S

I get the same error as reported in the original post.

FYI: I recorded the build with `script` this time, too.

Thanks for your help,

dye

James_T_Intel · ‎06-02-2014

Ok, I think I've found it now. Look at lines 313-315.

mpicc -O0 -c -fPIC -DMPI cuda_dgemm.c -o cuda_dgemm.o -I/usr/local/cuda/include
mpicc -O0 -c -fPIC -DMPI fermi_dgemm.c -o fermi_dgemm.o -I/usr/local/cuda/include
mpicc -O3 -shared -Wl,-soname,libdgemm.so.1 -o libdgemm.so.1.0.1 cuda_dgemm.o fermi_dgemm.o -L/usr/local/cuda/lib64 -lcudart -lcuda

Here, you're compiling libdgemm.so.1.0.1 with the single-threaded MPI library. The compile options used here do not appear to be defined in the main makefile. Check in /root/t/build/hpl-2.0_FERMI_v15/src/cuda for any makefiles and add -mt_mpi there.
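
The check described above, looking through the make output for MPI wrapper invocations that lack -mt_mpi, can also be sketched as a filter over a saved build log. This is only an illustration: `find_st_mpi_cmds` is a hypothetical helper name, and it assumes each command appears on its own line and starts with `mpicc` or `mpiicc`.

```shell
# Hypothetical helper: read a saved make log on stdin and print, with
# line numbers, every mpicc/mpiicc command that does not pass -mt_mpi.
find_st_mpi_cmds() {
    grep -nE '^(mpicc|mpiicc)[[:space:]]' \
        | grep -vE '(^|[[:space:]])-mt_mpi([[:space:]]|$)'
}
```

For example, `find_st_mpi_cmds < puremake.txt` would list candidate commands such as the three shown above. Commands run through other wrappers or through make variables that hide the wrapper name would not be caught.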

dye_J_ · ‎06-05-2014

Hi James,

Thank you very much for your reply! The `Bad value' problem has been solved!!

Adding this line to src/cuda/Makefile made the difference:

CCNOOPT = -mt_mpi

After that, libhpl.a still has missing references in various functions, but I think that is an NVIDIA issue and I will post about it on their developer forum.

Yours,

dye

eric_z_ · ‎12-22-2014

Hello

I have run into the same problem you describe in this thread, and I see that you solved it.

I read through it and found that you solved the problem by adding "CCNOOPT = -mt_mpi" to the Makefile.

But I don't know exactly where to add that line.

Could you please send me your working Make.CUDA and Makefile?

Thank you !

TimP · ‎12-22-2014

If you need to link against this library, you must set the option so that it is added to the link command, which should be using mpiifort, mpiicc, or mpiicpc, as it is a library which will work only with Intel MPI.

The CCNOOPT Makefile macro mentioned in the old thread is not a usual one (unless it is typical for CUDA), but we can't guess about your Makefile.

Roshan_M_ · ‎01-20-2015

What if I want to strictly use the single-threaded MPI for this CUDA HPL build? Is there a way to force this while compiling?

Thanks,

rtm
