Hi all,
I am new to HPC and am experimenting with MKL, CUDA, and Nvidia's HPL to optimize the result.
Everything compiled smoothly until make entered the ptest/ directory, where the build broke with:
make[2]: Entering directory `/root/hpl-2.0_FERMI_v15/testing/ptest/CUDA'
mpicc -DAdd__ -DF77_INTEGER=int -DStringSunStyle -DCUDA -I/root/hpl-2.0_FERMI_v15/include -I/root/hpl-2.0_FERMI_v15/include/CUDA -I/opt/intel/mkl/include -I/opt/intel/impi/4.1.3.049/intel64/include -I/usr/local/cuda/include -fomit-frame-pointer -O3 -funroll-loops -W -Wall -fopenmp -mt_mpi -m64 -lmpi_mt -o /root/hpl-2.0_FERMI_v15/bin/CUDA/xhpl HPL_pddriver.o HPL_pdinfo.o HPL_pdtest.o /root/hpl-2.0_FERMI_v15/lib/CUDA/libhpl.a -L /root/hpl-2.0_FERMI_v15/src/cuda -ldgemm -L/opt/cuda6/lib64 -lcuda -lcudart -lcublas -L/opt/intel/mkl/lib/intel64 -lpthread -lmpi_mt /opt/intel/impi/4.1.3.049/intel64/lib/libmpi_mt.so -lmpi_mt
/usr/lib64/gcc/x86_64-slackware-linux/4.8.2/../../../../x86_64-slackware-linux/bin/ld: MPIR_Thread: TLS definition in /opt/intel/impi/4.1.3.049/intel64/lib/libmpi_mt.so section .tbss mismatches non-TLS definition in /opt/intel/impi/4.1.3.049/intel64/lib/libmpi.so.4 section .bss
/opt/intel/impi/4.1.3.049/intel64/lib/libmpi.so.4: could not read symbols: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [dexe.grd] Error 1
FYI:
mpicc -show
gcc -I/opt/intel/impi/4.1.3.049/intel64/include -L/opt/intel/impi/4.1.3.049/intel64/lib -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /opt/intel/impi/4.1.3.049/intel64/lib -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/4.1 -lmpigf -lmpi -lmpigi -ldl -lrt -lpthread
SHELL env
CC=/opt/intel/composer_xe_2013_sp1.3.174/bin/intel64/icc
CXX=/opt/intel/composer_xe_2013_sp1.3.174/bin/intel64icpc
F77=/opt/intel/impi/4.1.3.049/intel64/bin/mpiifort
FC=/opt/intel/impi/4.1.3.049/intel64/bin/mpiifort
FC90=/opt/intel/impi/4.1.3.049/intel64/bin/mpiifort
LD_LIBRARY_PATH=/opt/intel/composerxe/mkl/lib/intel64:/opt/intel/impi/4.1.3.049/lib64:/opt/intel/composerxe/lib/intel64/:/opt/intel/composer_xe_2013_sp1.3.174/tbb/lib/intel64/:/opt/intel/composer_xe_2013_sp1.3.174/ipp/lib/intel64/:/opt/intel/composer_xe_2013_sp1.3.174/compiler/lib/intel64/:/opt/cuda6/lib64:/opt/cuda6/lib/:/usr/local/bin/mpi/intel/lib/:/usr/X11R6/lib64/:/usr/local/lib64/
PATH= /opt/intel/impi/4.1.3.049/bin64/:/opt/intel/composer_xe_2013_sp1.3.174/bin/intel64:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/lib64/kde4/libexec:/usr/lib64/java/bin:/usr/lib64/java/jre/bin:/usr/lib64/java/jre/bin:/usr/lib64/qt/bin:/usr/share/texmf/bin
gcc --version
gcc (GCC) 4.8.2
part of Make.CUDA
MPdir = /opt/intel/impi/4.1.3.049/intel64
MPlib = $(MPdir)/lib/libmpi_mt.so
LAdir = /opt/intel/mkl/lib/intel64
LAinc = -I/opt/intel/mkl/include
LAlib = -L $(TOPdir)/src/cuda -ldgemm -L/opt/cuda6/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -lpthread -lmpi_mt
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -fopenmp -mt_mpi -m64
LINKFLAGS = $(CCFLAGS) -lmpi_mt
My efforts to fix this error:
Following what I found in:
https://software.intel.com/en-us/forums/topic/392483
I added -lmpi_mt everywhere to make sure nothing is linked against libmpi.so.
Following the thread:
https://software.intel.com/en-us/forums/topic/294642
I added an extra -mt_mpi flag to CCFLAGS.
I used the -show trick described here:
https://software.intel.com/en-us/forums/topic/508632
It displays -lmpi_mt, but the build still fails.
I tried replacing Intel MPI with an Open MPI build compiled by icc, as described in a PDF found on the web (HOWTO-HPL-GPU.pdf),
and hit the same problem as described at the top of this thread.
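The -show check mentioned above can be scripted so that the result is unambiguous. The following is a minimal sketch, not something from the linked threads: `mpi_lib_from_show` is a hypothetical helper name, and it only inspects the text of a wrapper's `-show` output for the `-lmpi` and `-lmpi_mt` flags that appear in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of Intel MPI): read a wrapper's "-show"
# output on stdin and report which MPI library flags it contains.
mpi_lib_from_show() {
    show=$(cat)
    has_mt=no
    has_st=no
    # Pad with spaces so each flag is matched as a whole word.
    case " $show " in *" -lmpi_mt "*) has_mt=yes ;; esac
    case " $show " in *" -lmpi "*)    has_st=yes ;; esac
    case "$has_mt/$has_st" in
        yes/yes) echo "mixed (libmpi_mt and libmpi)" ;;
        yes/no)  echo "multi-threaded (libmpi_mt)" ;;
        no/yes)  echo "single-threaded (libmpi)" ;;
        *)       echo "unknown" ;;
    esac
}

# Demo using the link flags from the "mpicc -show" output quoted above.
echo "gcc -lmpigf -lmpi -lmpigi -ldl -lrt -lpthread" | mpi_lib_from_show
```

On a real system the input would come from the wrapper itself, for example `mpicc -mt_mpi -show | mpi_lib_from_show`. A "mixed" result on any one command line would point at the same kind of conflict the linker reports here.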
FYI2:
I previously succeeded in compiling and running the (pure) HPL with MKL. Part of that Make.Linux_PII_CBLAS is here:
HPLlibHybrid = /opt/intel/composer_xe_2013_sp1.3.174/mkl/benchmarks/mp_linpack/lib_hybrid/mic/libhpl_hybrid.a
LAdir = /opt/intel
LAinc = -I$(LAdir)/mkl/include
LAlib = -L$(LAdir)/mkl/lib/intel64 -Wl,--start-group $(LAdir)/mkl/lib/intel64/libmkl_intel_lp64.a $(LAdir)/mkl/lib/intel64/libmkl_intel_thread.a $(LAdir)/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -ldl $(HPLlibHybrid)
F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
HPL_OPTS = -DASYOUGO -DHYBRID
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
CC = mpiicc
CCNOOPT = $(HPL_DEFS) -O0 -w -nocompchk
MKLINCDIR = -I"/opt/intel/mkl/include"
CCFLAGS = $(HPL_DEFS) $(MKLINCDIR) -O3 -w -ansi-alias -i-static -z noexecstack -z relro -z now -openmp -nocompchk
LINKER = $(CC)
LINKFLAGS = $(CCFLAGS) -openmp -mt_mpi $(STATICFLAG) -nocompchk
EDIT: more efforts
Removing all -openmp flags results in:
libhpl.a can no longer resolve its references to omp_get_thread_num()
...
I rewrote Make.CUDA based on the one that compiled successfully (with MKL):
#
# This is just a sample Make.
# The user may need to edit:
# 1.) TOPdir
# 2.) MPI variables (MPdir,MPinc,MPlib)
# 3.) MKL BLAS variables (LAdir, LAinc, LAlib)
# 4.) The Compiler and Compiler/Linker Options (CC,CCFLAGS)
#
#
# -- High Performance Computing Linpack Benchmark (HPL)
# HPL - 1.0a - January 20, 2004
# Antoine P. Petitet
# University of Tennessee, Knoxville
# Innovative Computing Laboratories
# (C) Copyright 2000-2004 All Rights Reserved
#
# -- Copyright notice and Licensing terms:
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions, and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. All advertising materials mentioning features or use of this
# software must display the following acknowledgement:
# This product includes software developed at the University of
# Tennessee, Knoxville, Innovative Computing Laboratories.
#
# 4. The name of the University, the name of the Laboratory, or the
# names of its contributors may not be used to endorse or promote
# products derived from this software without specific written
# permission.
#
# -- Disclaimer:
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
######################################################################
# Copyright (c) 2011, NVIDIA CORPORATION
# All rights reserved.
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met: Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution. Neither the name
# of NVIDIA nor the names of its contributors may be used to endorse or
# promote products derived from this software without specific prior written
# permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
# CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
# NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
# TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL = /bin/sh
#
CD = cd
CP = cp
LN_S = ln -fs
MKDIR = mkdir -p
RM = /bin/rm -f
TOUCH = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH = CUDA
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
# Set TOPdir to the location of where this is being built
ifndef TOPdir
TOPdir = /root/hpl-2.0_FERMI_v15
endif
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
#
HPLlib = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir = /opt/intel/impi/4.1.3.049/intel64
MPinc = -I$(MPdir)/include
MPlib = $(MPdir)/lib/libmpi_mt.so
#MPlib = $(MPdir)/lib64/libmpich.a
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
#LAdir = $(TOPdir)/../../lib/em64t
LAdir = /opt/intel/mkl/lib/intel64
LAinc = -I/opt/intel/mkl/include
# CUDA
#LAlib = -L /home/cuda/Fortran_Cuda_Blas -ldgemm -L/usr/local/cuda/lib -lcublas -L$(LAdir) -lmkl -lguide -lpthread
LAlib = -L $(TOPdir)/src/cuda -ldgemm -L/opt/cuda6/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -lpthread
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. **One and only one** option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_ : all lower case and a suffixed underscore (Suns,
# Intel, ...), [default]
# -DNoChange : all lower case (IBM RS6000),
# -DUpCase : all upper case (Cray),
# -DAdd__ : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]
# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle : The string address is passed at the string loca-
# tion on the stack, and the string length is then
# passed as an F77_INTEGER after all explicit
# stack arguments, [default]
# -DStringStructPtr : The address of a structure is passed by a
# Fortran 77 string, and the structure is of the
# form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal : A structure is passed by value for each Fortran
# 77 string, and the structure is of the form:
# struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle : Special option for Cray machines, which uses
# Cray fcd (fortran character descriptor) for
# interoperation.
#
F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) -I/usr/local/cuda/include
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS call the cblas interface;
# -DHPL_DETAILED_TIMING enable detailed timers;
# -DASYOUGO enable timing information as you go (nonintrusive)
# -DASYOUGO2 slightly intrusive timing information
# -DASYOUGO2_DISPLAY display detailed DGEMM information
# -DENDEARLY end the problem early
# -DFASTSWAP insert to use DLASWP instead of HPL code
#
# By default HPL will:
# *) not copy L before broadcast,
# *) call the BLAS Fortran 77 interface,
# *) not display detailed timing information.
#
HPL_OPTS = -DCUDA
# ----------------------------------------------------------------------
#
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
# next two lines for GNU Compilers:
CC = mpiicc
CCNOOPT = $(HPL_DEFS) -O0 -w -nocompchk
MKLINCDIR = -I/opt/intel/mkl/include
CCFLAGS = $(HPL_DEFS) $(MKLINCDIR) -fomit-frame-pointer -O3 -funroll-loops -W -Wall -openmp
# next two lines for Intel Compilers:
# CC = mpicc
#CCFLAGS = $(HPL_DEFS) -O3 -axS -w -fomit-frame-pointer -funroll-loops -openmp
# CCNOOPT = $(HPL_DEFS) -O0 -w
#
# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER = $(CC)
#LINKFLAGS = $(CCFLAGS) -static_mpi
LINKFLAGS = $(CCFLAGS) -openmp -mt_mpi -nocompchk
#
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
#
# ----------------------------------------------------------------------
MAKE = make TOPdir=$(TOPdir)
This gives the SAME error as described at the top.
Please help me,
thank you
dye
As far as I can tell, something is getting compiled without the multithreaded MPI library. Please attach the entire output from a fresh make as a text file, and I'll take a look at it.
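One way to narrow this down is to search a saved build log for MPI wrapper invocations that lack -mt_mpi. The sketch below is only an illustration: `find_non_mt_commands` is a hypothetical helper, the wrapper names are the Intel MPI ones mentioned in this thread, and the demo lines are shortened samples modelled on the commands quoted here rather than a real log.

```shell
#!/bin/sh
# Hypothetical helper: list every MPI-wrapper command line in a saved
# make log that does not carry -mt_mpi. $1 is the path to the log file.
find_non_mt_commands() {
    grep -nE '^[[:space:]]*(mpicc|mpiicc|mpiicpc|mpiifort)[[:space:]]' "$1" \
        | grep -v -e '-mt_mpi'
}

# Demo on two shortened sample lines (assumed, not taken from a real log).
log=$(mktemp)
printf '%s\n' \
    'mpicc -O0 -c -fPIC -DMPI cuda_dgemm.c -o cuda_dgemm.o' \
    'mpicc -O3 -fopenmp -mt_mpi -c HPL_pddriver.c' > "$log"
find_non_mt_commands "$log"
rm -f "$log"
```

Each reported line is prefixed with its line number in the log, which makes it easier to refer to specific commands. Commands hidden behind another command on the same line (for example after `cd dir &&`) would be missed by this simple pattern.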
Hi James Tullos,
Thank you for kindly offering to look at the output. Here it is as attachments (captured with `script`).
FYI:
setenv.txt -- the environment I used for compiling.
modifiedmake.txt -- make output with the Make.CUDA that has lpthread everywhere; it starts with `cat Make.CUDA`.
puremake.txt -- make output with the Make.CUDA that has only the necessary changes.
many thanks,
dye
Oops, it looks like the `modifiedmake.txt' attachment is missing;
let me upload it again!
Please add -mt_mpi to CCNOOPT. This is being used when compiling HPL_dlamch and is getting the single-threaded MPI library into your final build.
Hi James,
I applied your change to my Make.CUDA, but it still does not work :S
The same error is reported as in the OP.
FYI: I captured the output with script this time, too.
Thanks for your help,
dye
Ok, I think I've found it now. Look at lines 313-315.
mpicc -O0 -c -fPIC -DMPI cuda_dgemm.c -o cuda_dgemm.o -I/usr/local/cuda/include
mpicc -O0 -c -fPIC -DMPI fermi_dgemm.c -o fermi_dgemm.o -I/usr/local/cuda/include
mpicc -O3 -shared -Wl,-soname,libdgemm.so.1 -o libdgemm.so.1.0.1 cuda_dgemm.o fermi_dgemm.o -L/usr/local/cuda/lib64 -lcudart -lcuda
Here, you're compiling libdgemm.so.1.0.1 with the single-threaded MPI library. The compile options used here do not appear to be defined in the main makefile. Check in /root/t/build/hpl-2.0_FERMI_v15/src/cuda for any makefiles and add -mt_mpi there.
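A quick way to confirm which MPI library a shared object such as libdgemm.so.1.0.1 pulls in is to look at its NEEDED entries. This is a sketch under assumptions: `needed_mpi_libs` is a hypothetical helper that filters the output of binutils `readelf -d`, and the demo line is a hand-written sample in readelf's format, not output from the actual build.

```shell
#!/bin/sh
# Hypothetical helper: read `readelf -d <shared object>` output on stdin
# and print only the NEEDED entries whose name starts with "libmpi".
needed_mpi_libs() {
    sed -n 's/.*(NEEDED).*\[\(libmpi[^]]*\)\].*/\1/p'
}

# Demo with a hand-written sample line in readelf's dynamic-section format.
printf ' 0x0000000000000001 (NEEDED)             Shared library: [libmpi.so.4]\n' \
    | needed_mpi_libs
```

Against the real file this would be run as `readelf -d libdgemm.so.1.0.1 | needed_mpi_libs`. Seeing libmpi.so.4 there, rather than the libmpi_mt library, would be consistent with the linker error at the top of the thread.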
Hi James,
Thank you very much for your help! The `Bad value' problem has been solved!!
Adding the following to src/cuda/Makefile:
CCNOOPT = -mt_mpi
made the difference!!
After that, libhpl.a has missing references to various functions, but I think that is an Nvidia issue and I will post about it on their dev forum.
Yours,
dye
dye J. wrote:
Hi James,
Thank you very much for your help! The `Bad value' problem has been solved!!
Adding the following to src/cuda/Makefile:
CCNOOPT = -mt_mpi
made the difference!!
After that, libhpl.a has missing references to various functions, but I think that is an Nvidia issue and I will post about it on their dev forum.
Yours,
dye
Hello
I have run into the same problem as you describe in this thread, and I see that you solved it.
I have read the thread and found that you solved the problem by adding "CCNOOPT = -mt_mpi" to the Makefile.
But I don't know exactly where to add that line.
Could you please send me your working Make.CUDA and Makefile?
Thank you !
If you need to link against this library, you must set the option so that it is added to the link command, which should be using mpiifort, mpiicc, or mpiicpc, as it is a library which will work only with Intel MPI.
The CCNOOPT Makefile macro mentioned in the old thread is not a usual one (unless it is typical for CUDA), but we can't guess about your Makefile.
James Tullos (Intel) wrote:
Ok, I think I've found it now. Look at lines 313-315.
mpicc -O0 -c -fPIC -DMPI cuda_dgemm.c -o cuda_dgemm.o -I/usr/local/cuda/include
mpicc -O0 -c -fPIC -DMPI fermi_dgemm.c -o fermi_dgemm.o -I/usr/local/cuda/include
mpicc -O3 -shared -Wl,-soname,libdgemm.so.1 -o libdgemm.so.1.0.1 cuda_dgemm.o fermi_dgemm.o -L/usr/local/cuda/lib64 -lcudart -lcuda
Here, you're compiling libdgemm.so.1.0.1 with the single-threaded MPI library. The compile options used here do not appear to be defined in the main makefile. Check in /root/t/build/hpl-2.0_FERMI_v15/src/cuda for any makefiles and add -mt_mpi there.
What if I want to strictly use the single-threaded MPI for this CUDA HPL build? Is there a way to force this while compiling?
Thanks,
rtm