<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Problem with Hpcc, MPIFFT hangs at large scale in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809720#M3760</link>
    <description>Vladimir,&lt;BR /&gt;&lt;BR /&gt;Oh, I am using MKL in Intel Compiler Suite 11.1.072.&lt;BR /&gt;&lt;BR /&gt;I will try 10.3 soon.&lt;BR /&gt;&lt;BR /&gt;Thanks for your kind help.&lt;BR /&gt;&lt;BR /&gt;Best Regards</description>
    <pubDate>Tue, 12 Oct 2010 11:19:37 GMT</pubDate>
    <dc:creator>xuzheng97</dc:creator>
    <dc:date>2010-10-12T11:19:37Z</dc:date>
    <item>
      <title>Problem with Hpcc, MPIFFT hangs at large scale</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809710#M3750</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I successfully compiled HPCC with MKL and ran it on 96 cores with N=120000.&lt;BR /&gt;&lt;BR /&gt;The following two pages helped me a lot.&lt;BR /&gt;&lt;A href="http://origin-software.intel.com/en-us/articles/performance-tools-for-software-developers-use-of-intel-mkl-in-hpcc-benchmark/"&gt;http://origin-software.intel.com/en-us/articles/performance-tools-for-software-developers-use-of-intel-mkl-in-hpcc-benchmark/&lt;/A&gt;&lt;BR /&gt;&lt;A href="http://software.intel.com/en-us/forums/showthread.php?t=77727&amp;amp;o=d&amp;amp;s=lr"&gt;http://software.intel.com/en-us/forums/showthread.php?t=77727&amp;amp;o=d&amp;amp;s=lr&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;But when I tested on 192 cores with N=208000, the program hung in the MPIFFT part.&lt;BR /&gt;The hpcc processes were still alive but produced no further output.&lt;BR /&gt;The configuration in hpccinf.txt is based on a result published on the hpcc website.&lt;BR /&gt;&lt;BR /&gt;The following is my output: &lt;BR /&gt;&lt;P&gt;########################################################################&lt;BR /&gt;This is the DARPA/DOE HPC Challenge Benchmark version 1.4.1 October 2003&lt;BR /&gt;Produced by Jack Dongarra and Piotr Luszczek&lt;BR /&gt;Innovative Computing Laboratory&lt;BR /&gt;University of Tennessee Knoxville and Oak Ridge National Laboratory&lt;BR /&gt;See the source files for authors of specific codes.&lt;BR /&gt;Compiled on Oct 9 2010 at 04:36:49&lt;BR /&gt;Current time (1286615973) is Sat Oct 9 05:19:33 2010&lt;BR /&gt;Hostname: 'node01'&lt;BR /&gt;########################################################################&lt;BR /&gt;================================================================================&lt;BR /&gt;HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008&lt;BR /&gt;Written by A. Petitet and R. 
Clint Whaley, Innovative Computing Laboratory, UTK&lt;BR /&gt;Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK&lt;BR /&gt;Modified by Julien Langou, University of Colorado Denver&lt;BR /&gt;================================================================================&lt;BR /&gt;An explanation of the input/output parameters follows:&lt;BR /&gt;T/V : Wall time / encoded variant.&lt;BR /&gt;N : The order of the coefficient matrix A.&lt;BR /&gt;NB : The partitioning blocking factor.&lt;BR /&gt;P : The number of process rows.&lt;BR /&gt;Q : The number of process columns.&lt;BR /&gt;Time : Time in seconds to solve the linear system.&lt;BR /&gt;Gflops : Rate of execution for solving the linear system.&lt;BR /&gt;The following parameter values will be used:&lt;BR /&gt;N : 208000&lt;BR /&gt;NB : 168&lt;BR /&gt;PMAP : Row-major process mapping&lt;BR /&gt;P : 6&lt;BR /&gt;Q : 32&lt;BR /&gt;PFACT : Right&lt;BR /&gt;NBMIN : 4&lt;BR /&gt;NDIV : 2&lt;BR /&gt;RFACT : Crout&lt;BR /&gt;BCAST : 1ringM&lt;BR /&gt;DEPTH : 0&lt;BR /&gt;SWAP : Mix (threshold = 64)&lt;BR /&gt;L1 : transposed form&lt;BR /&gt;U : transposed form&lt;BR /&gt;EQUIL : yes&lt;BR /&gt;ALIGN : 8 double precision words&lt;BR /&gt;--------------------------------------------------------------------------------&lt;BR /&gt;- The matrix A is randomly generated for each test.&lt;BR /&gt;- The following scaled residual check will be computed:&lt;BR /&gt;||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )&lt;BR /&gt;- The relative machine precision (eps) is taken to be 2.220446e-16&lt;BR /&gt;- Computational tests pass if scaled residuals are less than 16.0&lt;BR /&gt;Begin of MPIRandomAccess section.&lt;BR /&gt;Running on 192 processors&lt;BR /&gt;Total Main table size = 2^35 = 34359738368 words&lt;BR /&gt;PE Main table size = (2^35)/192 = 178956971 words/PE MAX&lt;BR /&gt;Default number of updates (RECOMMENDED) = 137438953472&lt;BR /&gt;CPU time used = 147.169627 seconds&lt;BR 
/&gt;Real time used = 148.418957 seconds&lt;BR /&gt;0.926020208 Billion(10^9) Updates per second [GUP/s]&lt;BR /&gt;0.004823022 Billion(10^9) Updates/PE per second [GUP/s]&lt;BR /&gt;Verification: CPU time used = 107.623639 seconds&lt;BR /&gt;Verification: Real time used = 108.852249 seconds&lt;BR /&gt;Found 63008 errors in 34359738368 locations (passed).&lt;BR /&gt;Current time (1286616234) is Sat Oct 9 05:23:54 2010&lt;BR /&gt;End of MPIRandomAccess section.&lt;BR /&gt;Begin of StarRandomAccess section.&lt;BR /&gt;Main table size = 2^27 = 134217728 words&lt;BR /&gt;Number of updates = 536870912&lt;BR /&gt;CPU time used = 32.956989 seconds&lt;BR /&gt;Real time used = 32.959574 seconds&lt;BR /&gt;0.016288770 Billion(10^9) Updates per second [GUP/s]&lt;BR /&gt;Found 0 errors in 134217728 locations (passed).&lt;BR /&gt;Node(s) with error 0&lt;BR /&gt;Minimum GUP/s 0.016122&lt;BR /&gt;Average GUP/s 0.016524&lt;BR /&gt;Maximum GUP/s 0.016834&lt;BR /&gt;Current time (1286616300) is Sat Oct 9 05:25:00 2010&lt;BR /&gt;End of StarRandomAccess section.&lt;BR /&gt;Begin of SingleRandomAccess section.&lt;BR /&gt;Node(s) with error 0&lt;BR /&gt;Node selected 89&lt;BR /&gt;Single GUP/s 0.037591&lt;BR /&gt;Current time (1286616328) is Sat Oct 9 05:25:28 2010&lt;BR /&gt;End of SingleRandomAccess section.&lt;BR /&gt;Begin of MPIRandomAccess_LCG section.&lt;BR /&gt;Running on 192 processors&lt;BR /&gt;Total Main table size = 2^35 = 34359738368 words&lt;BR /&gt;PE Main table size = (2^35)/192 = 178956971 words/PE MAX&lt;BR /&gt;Default number of updates (RECOMMENDED) = 137438953472&lt;BR /&gt;CPU time used = 144.796987 seconds&lt;BR /&gt;Real time used = 145.966314 seconds&lt;BR /&gt;0.941579942 Billion(10^9) Updates per second [GUP/s]&lt;BR /&gt;0.004904062 Billion(10^9) Updates/PE per second [GUP/s]&lt;BR /&gt;Verification: CPU time used = 103.628247 seconds&lt;BR /&gt;Verification: Real time used = 104.397066 seconds&lt;BR /&gt;Found 65536 errors in 34359738368 locations 
(passed).&lt;BR /&gt;Current time (1286616579) is Sat Oct 9 05:29:39 2010&lt;BR /&gt;End of MPIRandomAccess_LCG section.&lt;BR /&gt;Begin of StarRandomAccess_LCG section.&lt;BR /&gt;Main table size = 2^27 = 134217728 words&lt;BR /&gt;Number of updates = 536870912&lt;BR /&gt;CPU time used = 33.080971 seconds&lt;BR /&gt;Real time used = 33.084620 seconds&lt;BR /&gt;0.016227205 Billion(10^9) Updates per second [GUP/s]&lt;BR /&gt;Found 0 errors in 134217728 locations (passed).&lt;BR /&gt;Node(s) with error 0&lt;BR /&gt;Minimum GUP/s 0.016004&lt;BR /&gt;Average GUP/s 0.016432&lt;BR /&gt;Maximum GUP/s 0.016728&lt;BR /&gt;Current time (1286616646) is Sat Oct 9 05:30:46 2010&lt;BR /&gt;End of StarRandomAccess_LCG section.&lt;BR /&gt;Begin of SingleRandomAccess_LCG section.&lt;BR /&gt;Node(s) with error 0&lt;BR /&gt;Node selected 89&lt;BR /&gt;Single GUP/s 0.037095&lt;BR /&gt;Current time (1286616673) is Sat Oct 9 05:31:13 2010&lt;BR /&gt;End of SingleRandomAccess_LCG section.&lt;BR /&gt;Begin of PTRANS section.&lt;BR /&gt;M: 104000&lt;BR /&gt;N: 104000&lt;BR /&gt;MB: 168&lt;BR /&gt;NB: 168&lt;BR /&gt;P: 6&lt;BR /&gt;Q: 32&lt;BR /&gt;TIME M N MB NB P Q TIME CHECK GB/s RESID&lt;BR /&gt;---- ----- ----- --- --- --- --- -------- ------ -------- -----&lt;BR /&gt;WALL 104000 104000 168 168 6 32 2.83 PASSED 30.526 0.00&lt;BR /&gt;CPU 104000 104000 168 168 6 32 2.83 PASSED 30.612 0.00&lt;BR /&gt;WALL 104000 104000 168 168 6 32 2.93 PASSED 29.512 0.00&lt;BR /&gt;CPU 104000 104000 168 168 6 32 2.92 PASSED 29.597 0.00&lt;BR /&gt;WALL 104000 104000 168 168 6 32 3.04 PASSED 28.430 0.00&lt;BR /&gt;CPU 104000 104000 168 168 6 32 3.03 PASSED 28.524 0.00&lt;BR /&gt;WALL 104000 104000 168 168 6 32 2.86 PASSED 28.430 0.00&lt;BR /&gt;CPU 104000 104000 168 168 6 32 2.85 PASSED 30.355 0.00&lt;BR /&gt;WALL 104000 104000 168 168 6 32 2.93 PASSED 28.430 0.00&lt;BR /&gt;CPU 104000 104000 168 168 6 32 2.92 PASSED 29.617 0.00&lt;BR /&gt;Finished 5 tests, with the following results:&lt;BR /&gt;5 tests 
completed and passed residual checks.&lt;BR /&gt;0 tests completed and failed residual checks.&lt;BR /&gt;0 tests skipped because of illegal input values.&lt;BR /&gt;END OF TESTS.&lt;BR /&gt;Current time (1286616720) is Sat Oct 9 05:32:00 2010&lt;BR /&gt;End of PTRANS section.&lt;BR /&gt;Begin of StarDGEMM section.&lt;BR /&gt;Scaled residual: 0.00407323&lt;BR /&gt;Node(s) with error 0&lt;BR /&gt;Minimum Gflop/s 10.504862&lt;BR /&gt;Average Gflop/s 11.179782&lt;BR /&gt;Maximum Gflop/s 11.460022&lt;BR /&gt;Current time (1286616851) is Sat Oct 9 05:34:11 2010&lt;BR /&gt;End of StarDGEMM section.&lt;BR /&gt;Begin of SingleDGEMM section.&lt;BR /&gt;Node(s) with error 0&lt;BR /&gt;Node selected 178&lt;BR /&gt;Single DGEMM Gflop/s 11.575138&lt;BR /&gt;Current time (1286616969) is Sat Oct 9 05:36:09 2010&lt;BR /&gt;End of SingleDGEMM section.&lt;BR /&gt;Begin of StarSTREAM section.&lt;BR /&gt;-------------------------------------------------------------&lt;BR /&gt;This system uses 8 bytes per DOUBLE PRECISION word.&lt;BR /&gt;-------------------------------------------------------------&lt;BR /&gt;Array size = 75111111, Offset = 0&lt;BR /&gt;Total memory required = 1.6789 GiB.&lt;BR /&gt;Each test is run 10 times, but only&lt;BR /&gt;the *best* time for each is used.&lt;BR /&gt;-------------------------------------------------------------&lt;BR /&gt;Your clock granularity/precision appears to be 1 microseconds.&lt;BR /&gt;Each test below will take on the order of 343454 microseconds.&lt;BR /&gt;(= 343454 clock ticks)&lt;BR /&gt;Increase the size of the arrays if this shows that&lt;BR /&gt;you are not getting at least 20 clock ticks per test.&lt;BR /&gt;-------------------------------------------------------------&lt;BR /&gt;WARNING -- The above is only a rough guideline.&lt;BR /&gt;For best results, please be sure you know the&lt;BR /&gt;precision of your system timer.&lt;BR /&gt;-------------------------------------------------------------&lt;BR /&gt;Function Rate (GB/s) 
Avg time Min time Max time&lt;BR /&gt;Copy: 3.3981 0.3554 0.3537 0.3571&lt;BR /&gt;Scale: 3.3042 0.3696 0.3637 0.3738&lt;BR /&gt;Add: 3.3802 0.5340 0.5333 0.5353&lt;BR /&gt;Triad: 3.6174 0.5037 0.4983 0.5103&lt;BR /&gt;-------------------------------------------------------------&lt;BR /&gt;Results Comparison:&lt;BR /&gt;Expected : 86625702996855472128.000000 17325140599371094016.000000 23100187465828126720.000000&lt;BR /&gt;Observed : 86625703071556763648.000000 17325140607756191744.000000 23100187473264709632.000000&lt;BR /&gt;Solution Validates&lt;BR /&gt;-------------------------------------------------------------&lt;BR /&gt;Node(s) with error 0&lt;BR /&gt;Minimum Copy GB/s 3.309907&lt;BR /&gt;Average Copy GB/s 3.372523&lt;BR /&gt;Maximum Copy GB/s 3.401539&lt;BR /&gt;Minimum Scale GB/s 3.281044&lt;BR /&gt;Average Scale GB/s 3.363702&lt;BR /&gt;Maximum Scale GB/s 3.391891&lt;BR /&gt;Minimum Add GB/s 3.368173&lt;BR /&gt;Average Add GB/s 3.418982&lt;BR /&gt;Maximum Add GB/s 3.486998&lt;BR /&gt;Minimum Triad GB/s 3.472837&lt;BR /&gt;Average Triad GB/s 3.526081&lt;BR /&gt;Maximum Triad GB/s 3.629542&lt;BR /&gt;Current time (1286616989) is Sat Oct 9 05:36:29 2010&lt;BR /&gt;End of StarSTREAM section.&lt;BR /&gt;Begin of SingleSTREAM section.&lt;BR /&gt;Node(s) with error 0&lt;BR /&gt;Node selected 74&lt;BR /&gt;Single STREAM Copy GB/s 8.495534&lt;BR /&gt;Single STREAM Scale GB/s 8.332597&lt;BR /&gt;Single STREAM Add GB/s 11.092103&lt;BR /&gt;Single STREAM Triad GB/s 11.051167&lt;BR /&gt;Current time (1286616996) is Sat Oct 9 05:36:36 2010&lt;BR /&gt;End of SingleSTREAM section.&lt;BR /&gt;Begin of MPIFFT section.&lt;BR /&gt;&lt;BR /&gt;The program hangs here and produces no further output.&lt;BR /&gt;&lt;BR /&gt;My Make.em64t is as follows:&lt;BR /&gt;#&lt;BR /&gt;SHELL = /bin/sh&lt;BR /&gt;#&lt;BR /&gt;CD = cd&lt;BR /&gt;CP = cp&lt;BR /&gt;LN_S = ln -s&lt;BR /&gt;MKDIR = mkdir&lt;BR /&gt;RM = /bin/rm -f&lt;BR /&gt;TOUCH = touch&lt;BR /&gt;#&lt;BR /&gt;# 
----------------------------------------------------------------------&lt;BR /&gt;# - Platform identifier ------------------------------------------------&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;#&lt;BR /&gt;ARCH = $(arch)&lt;BR /&gt;#&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;# - HPL Directory Structure / HPL library ------------------------------&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;#&lt;BR /&gt;TOPdir = ../../..&lt;BR /&gt;INCdir = $(TOPdir)/include&lt;BR /&gt;BINdir = $(TOPdir)/bin/$(ARCH)&lt;BR /&gt;LIBdir = $(TOPdir)/lib/$(ARCH)&lt;BR /&gt;#&lt;BR /&gt;HPLlib = $(LIBdir)/libhpl.a&lt;BR /&gt;#&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;# - Message Passing library (MPI) --------------------------------------&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;# MPinc tells the C compiler where to find the Message Passing library&lt;BR /&gt;# header files, MPlib is defined to be the name of the library to be&lt;BR /&gt;# used. The variable MPdir is only used for defining MPinc and MPlib.&lt;BR /&gt;#&lt;BR /&gt;MPdir = /opt/intel/impi/4.0.0.028&lt;BR /&gt;MPinc = -I$(MPdir)/include64&lt;BR /&gt;MPlib =&lt;BR /&gt;#&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;# - Linear Algebra library (BLAS or VSIPL) -----------------------------&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;# LAinc tells the C compiler where to find the Linear Algebra library&lt;BR /&gt;# header files, LAlib is defined to be the name of the library to be&lt;BR /&gt;# used. 
The variable LAdir is only used for defining LAinc and LAlib.&lt;BR /&gt;#&lt;BR /&gt;LAdir = /opt/intel/mkl/lib/em64t&lt;BR /&gt;LAinc = -I/opt/intel/mkl/include/fftw&lt;BR /&gt;LAlib = $(LAdir)/libfftw2x_cdft_DOUBLE_lp64.a $(LAdir)/libfftw2xc_intel.a -Wl,--start-group $(LAdir)/libmkl_intel_lp64.a $(LAdir)/libmkl_sequential.a $(LAdir)/libmkl_core.a $(LAdir)/libmkl_blacs_intelmpi_lp64.a $(LAdir)/libmkl_cdft_core.a -Wl, --end-group -lpthread&lt;BR /&gt;#&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;# - F77 / C interface --------------------------------------------------&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;# You can skip this section if and only if you are not planning to use&lt;BR /&gt;# a BLAS library featuring a Fortran 77 interface. Otherwise, it is&lt;BR /&gt;# necessary to fill out the F2CDEFS variable with the appropriate&lt;BR /&gt;# options. **One and only one** option should be chosen in **each** of&lt;BR /&gt;# the 3 following categories:&lt;BR /&gt;#&lt;BR /&gt;# 1) name space (How C calls a Fortran 77 routine)&lt;BR /&gt;#&lt;BR /&gt;# -DAdd_ : all lower case and a suffixed underscore (Suns,&lt;BR /&gt;# Intel, ...), [default]&lt;BR /&gt;# -DNoChange : all lower case (IBM RS6000),&lt;BR /&gt;# -DUpCase : all upper case (Cray),&lt;BR /&gt;# -DAdd__ : the FORTRAN compiler in use is f2c.&lt;BR /&gt;#&lt;BR /&gt;# 2) C and Fortran 77 integer mapping&lt;BR /&gt;#&lt;BR /&gt;# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]&lt;BR /&gt;# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,&lt;BR /&gt;# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.&lt;BR /&gt;#&lt;BR /&gt;# 3) Fortran 77 string handling&lt;BR /&gt;#&lt;BR /&gt;# -DStringSunStyle : The string address is passed at the string loca-&lt;BR /&gt;# tion on the stack, and the string length is then&lt;BR /&gt;# passed as an F77_INTEGER after all explicit&lt;BR /&gt;# stack 
arguments, [default]&lt;BR /&gt;# -DStringStructPtr : The address of a structure is passed by a&lt;BR /&gt;# Fortran 77 string, and the structure is of the&lt;BR /&gt;# form: struct {char *cp; F77_INTEGER len;},&lt;BR /&gt;# -DStringStructVal : A structure is passed by value for each Fortran&lt;BR /&gt;# 77 string, and the structure is of the form:&lt;BR /&gt;# struct {char *cp; F77_INTEGER len;},&lt;BR /&gt;# -DStringCrayStyle : Special option for Cray machines, which uses&lt;BR /&gt;# Cray fcd (fortran character descriptor) for&lt;BR /&gt;# interoperation.&lt;BR /&gt;#&lt;BR /&gt;F2CDEFS = -DF77_INTEGER=long -DUSING_FFTW -DMKL_INT=long -DLONG_IS_64BITS -DRA_SANDIA_OPT2 -DHPCC_FFT_235&lt;BR /&gt;#&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;# - HPL includes / libraries / specifics -------------------------------&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;#&lt;BR /&gt;HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)&lt;BR /&gt;HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)&lt;BR /&gt;#&lt;BR /&gt;# - Compile time options -----------------------------------------------&lt;BR /&gt;#&lt;BR /&gt;# -DHPL_COPY_L force the copy of the panel L before bcast;&lt;BR /&gt;# -DHPL_CALL_CBLAS call the cblas interface;&lt;BR /&gt;# -DHPL_CALL_VSIPL call the vsip library;&lt;BR /&gt;# -DHPL_DETAILED_TIMING enable detailed timers;&lt;BR /&gt;#&lt;BR /&gt;# By default HPL will:&lt;BR /&gt;# *) not copy L before broadcast,&lt;BR /&gt;# *) call the BLAS Fortran 77 interface,&lt;BR /&gt;# *) not display detailed timing information.&lt;BR /&gt;#&lt;BR /&gt;HPL_OPTS =&lt;BR /&gt;#&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;#&lt;BR /&gt;HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)&lt;BR /&gt;#&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;# - Compilers / linkers - 
Optimization flags ---------------------------&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;#&lt;BR /&gt;CC = mpiicc&lt;BR /&gt;CCNOOPT = $(HPL_DEFS)&lt;BR /&gt;CCFLAGS = $(HPL_DEFS) -O2 -xSSE4.2 -ip -ansi-alias -fno-alias&lt;BR /&gt;#&lt;BR /&gt;# On some platforms, it is necessary to use the Fortran linker to find&lt;BR /&gt;# the Fortran internals used in the BLAS library.&lt;BR /&gt;#&lt;BR /&gt;LINKER = mpiicc&lt;BR /&gt;LINKFLAGS = $(CCFLAGS)&lt;BR /&gt;#&lt;BR /&gt;ARCHIVER = ar&lt;BR /&gt;ARFLAGS = r&lt;BR /&gt;RANLIB = echo&lt;BR /&gt;#&lt;BR /&gt;# ----------------------------------------------------------------------&lt;BR /&gt;&lt;BR /&gt;I have also tried -DF77_INTEGER=int.&lt;BR /&gt;&lt;BR /&gt;Thanks &lt;/P&gt;</description>
      <pubDate>Sat, 09 Oct 2010 14:42:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809710#M3750</guid>
      <dc:creator>xuzheng97</dc:creator>
      <dc:date>2010-10-09T14:42:13Z</dc:date>
    </item>
    <item>
      <title>Problem with Hpcc, MPIFFT hangs at large scale</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809711#M3751</link>
      <description>&lt;P&gt;By the way, I did not apply fftw2xc_patch.diff because I am not using hpcc 1.3.1.&lt;BR /&gt;Could this be the reason?&lt;/P&gt;</description>
      <pubDate>Sat, 09 Oct 2010 14:56:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809711#M3751</guid>
      <dc:creator>xuzheng97</dc:creator>
      <dc:date>2010-10-09T14:56:22Z</dc:date>
    </item>
    <item>
      <title>Problem with Hpcc, MPIFFT hangs at large scale</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809712#M3752</link>
      <description>I just tested with fftw2xc_patch.diff applied, but it made no difference at all.</description>
      <pubDate>Sat, 09 Oct 2010 16:02:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809712#M3752</guid>
      <dc:creator>xuzheng97</dc:creator>
      <dc:date>2010-10-09T16:02:07Z</dc:date>
    </item>
    <item>
      <title>Problem with Hpcc, MPIFFT hangs at large scale</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809713#M3753</link>
      <description>Kevin, what MKL version do you use?</description>
      <pubDate>Sun, 10 Oct 2010 07:18:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809713#M3753</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2010-10-10T07:18:22Z</dc:date>
    </item>
    <item>
      <title>Problem with Hpcc, MPIFFT hangs at large scale</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809714#M3754</link>
      <description>Gennady,&lt;BR /&gt;&lt;BR /&gt;I tried both MKL 10.2.5.035 and the corresponding MKL in Intel Compiler Suite version 11.1.072, but both hang.&lt;BR /&gt;&lt;BR /&gt;Thanks</description>
      <pubDate>Mon, 11 Oct 2010 02:04:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809714#M3754</guid>
      <dc:creator>xuzheng97</dc:creator>
      <dc:date>2010-10-11T02:04:29Z</dc:date>
    </item>
    <item>
      <title>Problem with Hpcc, MPIFFT hangs at large scale</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809715#M3755</link>
      <description>Hi Kevin,&lt;BR /&gt;&lt;BR /&gt;Increasing the core count (with the corresponding increase of parameter N) results in a longer vector used for MPI FFT. As you may have already noticed, HPCC is not very well suited for lengths greater than MAX_INT.&lt;BR /&gt;&lt;BR /&gt;My first guess is that you are crossing this bound going from 96 to 192 cores.&lt;BR /&gt;&lt;BR /&gt;Please provide the following info:&lt;BR /&gt;- how you built the MPI FFTW wrappers&lt;BR /&gt;- compile line for file mpifft.c&lt;BR /&gt;- link line for the hpcc exe&lt;BR /&gt;- mpiexec line&lt;BR /&gt;so that I can give you better advice.&lt;BR /&gt;&lt;BR /&gt;BTW, are you setting OMP_NUM_THREADS to 1? This is what you have to do unless you have your hpcc exe compiled and linked with -mt_mpi (supposing you are using Intel MPI).&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;-Vladimir</description>
      <pubDate>Mon, 11 Oct 2010 18:07:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809715#M3755</guid>
      <dc:creator>Vladimir_Petrov__Int</dc:creator>
      <dc:date>2010-10-11T18:07:33Z</dc:date>
    </item>
    <item>
      <title>Problem with Hpcc, MPIFFT hangs at large scale</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809716#M3756</link>
      <description>Vladimir,&lt;BR /&gt;&lt;BR /&gt;Thanks.&lt;BR /&gt;&lt;BR /&gt;Yes, a large N does increase MPI FFT time. Based on my experience with small N, the FFT time should be less than HPL's. I waited more than 3 hours, while HPL took only about 1 hour.&lt;BR /&gt;MAX_INT=2^31=2,147,483,648?&lt;BR /&gt;For 96 cores MPIFFT N=2,654,208,000, for 192 cores MPIFFT N=5,374,771,200.&lt;BR /&gt;It seems both of them exceed MAX_INT, yet the 96-core run passed the test.&lt;BR /&gt;&lt;BR /&gt;I have also tried -DF77_INTEGER=long, so I am not sure whether MAX_INT=2^64?&lt;BR /&gt;&lt;BR /&gt;But I did find Intel's 768-core result on the hpcc website with MPIFFT N=22,932,357,120.&lt;BR /&gt;And my parameter configuration was based entirely on Intel's result.&lt;BR /&gt;&lt;BR /&gt;&lt;P&gt;&lt;STRONG&gt;- how you built the MPI FFTW wrappers&lt;/STRONG&gt;&lt;BR /&gt; make libem64t PRECISION=MKL_DOUBLE&lt;BR /&gt; I tried both with and without fftw2xc_patch.diff.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;- compile line for file mpifft.c&lt;BR /&gt;&lt;BR /&gt;&lt;/STRONG&gt;mpiicc -o ../../../../FFT/mpifft.o -c ../../../../FFT/mpifft.c -I../../../../include -DUSING_FFTW -DMKL_INT=long -DLONG_IS_64BITS -DRA_SANDIA_OPT2 -DHPCC_FFT_235 -I../../../include -I../../../include/em64t -I/mydirectory/hpcc-1.4.1/mkl/include/fftw -I/opt/intel/impi/4.0.0.028/include64 -O2 -xSSE4.2 -ip -ansi-alias -fno-alias&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;- link line for the hpcc exe&lt;/STRONG&gt;&lt;BR /&gt;mpiicc -DUSING_FFTW -DMKL_INT=long -DLONG_IS_64BITS -DRA_SANDIA_OPT2 -DHPCC_FFT_235 -I../../../include -I../../../include/em64t -I/opt/intel/mkl/include/fftw -I/opt/intel/impi/4.0.0.028/include64 -O2 -xSSE4.2 -ip -ansi-alias -fno-alias -o ../../../../hpcc ../../../lib/em64t/libhpl.a /opt/intel/mkl/lib/em64t/libfftw2x_cdft_DOUBLE_lp64.a /opt/intel/mkl/lib/em64t/libfftw2xc_intel.a -Wl,--start-group /opt/intel/mkl/lib/em64t/libmkl_intel_lp64.a /opt/intel/mkl/lib/em64t/libmkl_sequential.a 
/opt/intel/mkl/lib/em64t/libmkl_core.a /opt/intel/mkl/lib/em64t/libmkl_blacs_intelmpi_lp64.a /opt/intel/mkl/lib/em64t/libmkl_cdft_core.a -Wl, --end-group -lpthread&lt;BR /&gt;&lt;BR /&gt;- &lt;STRONG&gt;mpiexec line&lt;/STRONG&gt;&lt;BR /&gt; mpiexec -perhost 12 -n 192 ./hpcc&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;I did not set OMP_NUM_THREADS, and it seems that the hpcc exe was compiled and linked with -mt_mpi.&lt;BR /&gt;I will try setting OMP_NUM_THREADS=1 and post an update soon.&lt;BR /&gt;&lt;BR /&gt;Thanks &amp;amp; Best Regards&lt;/P&gt;</description>
      <pubDate>Tue, 12 Oct 2010 02:31:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809716#M3756</guid>
      <dc:creator>xuzheng97</dc:creator>
      <dc:date>2010-10-12T02:31:19Z</dc:date>
    </item>
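    <!--
Editor's note: the post above asks how the MPIFFT vector lengths compare with the integer limits. The sketch below only checks the three lengths quoted in the thread against the standard signed 32-bit and 64-bit maxima. The names mpifft_lengths, fits and report are illustrative and do not come from HPCC or MKL.

```python
# Signed integer maxima for 32-bit and 64-bit two's-complement types.
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

# cores: MPIFFT vector length as quoted by the poster in the thread.
mpifft_lengths = {96: 2_654_208_000, 192: 5_374_771_200, 768: 22_932_357_120}


def fits(n, limit):
    """Return True when n can be stored without exceeding limit."""
    return n <= limit


report = {
    cores: {"int32": fits(n, INT32_MAX), "int64": fits(n, INT64_MAX)}
    for cores, n in mpifft_lengths.items()
}

for cores, result in report.items():
    print(f"{cores} cores: fits int32={result['int32']}, fits int64={result['int64']}")
```

All three lengths are above the signed 32-bit maximum and well below the 64-bit one, so the check by itself does not explain why the 96-core run completed.
    -->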
    <item>
      <title>Problem with Hpcc, MPIFFT hangs at large scale</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809717#M3757</link>
      <description>Kevin,&lt;BR /&gt;&lt;BR /&gt;The problem is here &lt;BR /&gt;&lt;BR /&gt;&lt;B&gt;- how you built the MPI FFTW wrappers&lt;/B&gt;&lt;BR /&gt; make libem64t PRECISION=MKL_DOUBLE&lt;BR /&gt;&lt;BR /&gt;Please add "interface=ilp64" like this:&lt;BR /&gt;make libem64t PRECISION=MKL_DOUBLE interface=ilp64&lt;BR /&gt;which lets hpcc pass 64-bit ints to MKL.&lt;BR /&gt;&lt;BR /&gt;Of course this is a mistake in our knowledge base article. Thank you for locating it!&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;-Vladimir</description>
      <pubDate>Tue, 12 Oct 2010 03:45:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809717#M3757</guid>
      <dc:creator>Vladimir_Petrov__Int</dc:creator>
      <dc:date>2010-10-12T03:45:24Z</dc:date>
    </item>
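    <!--
Editor's note: the fix above switches the wrappers to the ilp64 interface so that 64-bit integers reach MKL. As a rough illustration of why the interface width matters, the sketch below pushes the thread's vector lengths through 32-bit and 64-bit C integer types using Python's ctypes, which does not check for overflow. The helper names through_32bit and through_64bit are hypothetical, and the thread does not confirm that this truncation is exactly what happened inside the wrappers.

```python
import ctypes


def through_32bit(n):
    """Value seen after n is stored in a signed 32-bit C integer."""
    return ctypes.c_int32(n).value


def through_64bit(n):
    """Value seen after n is stored in a signed 64-bit C integer."""
    return ctypes.c_int64(n).value


# MPIFFT vector lengths quoted in the thread for 96 and 192 cores.
for length in (2_654_208_000, 5_374_771_200):
    print(length, "as 32-bit:", through_32bit(length), "as 64-bit:", through_64bit(length))
```

The 64-bit type returns each length unchanged, while the 32-bit type silently yields a different number.
    -->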
    <item>
      <title>Problem with Hpcc, MPIFFT hangs at large scale</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809718#M3758</link>
      <description>Vladimir,&lt;BR /&gt;&lt;BR /&gt;I built the MPI MKL FFTW library as you advised and switched the corresponding libraries from the *lp64.a to the *ilp64.a variants.&lt;BR /&gt;&lt;BR /&gt;It passed the MPIFFT section but exhausted all system memory during the StarFFT section.&lt;BR /&gt;The output is as follows:&lt;BR /&gt;&lt;BR /&gt;&lt;P&gt;Begin of MPIFFT section.&lt;BR /&gt;Warning: problem size too large: 135000*192*192&lt;BR /&gt;Number of nodes: 192&lt;BR /&gt;Vector size: 4976640000&lt;BR /&gt;Generation time: 1.172&lt;BR /&gt;Tuning: 3.690&lt;BR /&gt;Computing: 9.326&lt;BR /&gt;Inverse FFT: 10.957&lt;BR /&gt;max(|x-x0|): 2.914e-15&lt;BR /&gt;Gflop/s: 85.944&lt;BR /&gt;Current time (1286865020) is Tue Oct 12 02:30:20 2010&lt;BR /&gt;End of MPIFFT section.&lt;BR /&gt;Begin of StarFFT section.&lt;/P&gt;&lt;BR /&gt;&lt;BR /&gt;Here the system has 16 nodes, and each node has 2*X5670, 24G memory, 4x QDR IB.&lt;BR /&gt;It seems to be the same configuration as Intel's on &lt;A href="http://icl.cs.utk.edu/hpcc/hpcc_record.cgi?id=414" target="_blank"&gt;http://icl.cs.utk.edu/hpcc/hpcc_record.cgi?id=414&lt;/A&gt;.&lt;BR /&gt;And I am using HPL N=200000, PTRANS N=100000 and NB=168 P=6 Q=32, which is even smaller than the configuration on the website.&lt;BR /&gt;&lt;BR /&gt;Could you give further help on this?&lt;BR /&gt;&lt;BR /&gt;Thanks &amp;amp; Best Regards</description>
      <pubDate>Tue, 12 Oct 2010 07:01:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809718#M3758</guid>
      <dc:creator>xuzheng97</dc:creator>
      <dc:date>2010-10-12T07:01:17Z</dc:date>
    </item>
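    <!--
Editor's note: a rough sizing sketch for the MPIFFT figures in the post above. It reproduces the logged vector size from the factors in the warning line and estimates the storage for one copy of the vector. The 16 bytes per element is an assumption (one double-precision complex value), and the 24G per node is treated as a nominal figure. The estimate covers the MPIFFT input only and says nothing about the StarFFT memory use reported in the post.

```python
BYTES_PER_ELEMENT = 16  # assumption: one double-precision complex value

processes = 192
# Factors taken from the warning line "135000*192*192" in the log.
vector_size = 135000 * processes * processes
assert vector_size == 4_976_640_000  # matches "Vector size" in the log

per_process_elements = vector_size // processes
total_gib = vector_size * BYTES_PER_ELEMENT / 2**30
per_process_mib = per_process_elements * BYTES_PER_ELEMENT / 2**20

nodes, mem_per_node = 16, 24  # from the post: 16 nodes with 24G each
cluster_mem = nodes * mem_per_node

print(f"one copy of the vector: {total_gib:.1f} GiB in total")
print(f"per process: {per_process_elements} elements, {per_process_mib:.1f} MiB")
print(f"nominal cluster memory: {cluster_mem}G")
```

Under these assumptions one copy of the vector takes a fraction of the cluster's nominal memory, although work buffers and the inverse transform add to that footprint.
    -->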
    <item>
      <title>Problem with Hpcc, MPIFFT hangs at large scale</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809719#M3759</link>
      <description>Kevin,&lt;BR /&gt;&lt;BR /&gt;It's good to hear that the MPIFFT section passes now.&lt;BR /&gt;&lt;BR /&gt;As to the StarFFT section, unfortunately large memory consumption is a known problem in older versions of MKL.&lt;BR /&gt;It is fixed in version 10.3.0.&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;-Vladimir</description>
      <pubDate>Tue, 12 Oct 2010 09:45:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809719#M3759</guid>
      <dc:creator>Vladimir_Petrov__Int</dc:creator>
      <dc:date>2010-10-12T09:45:25Z</dc:date>
    </item>
    <item>
      <title>Problem with Hpcc, MPIFFT hangs at large scale</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809720#M3760</link>
      <description>Vladimir,&lt;BR /&gt;&lt;BR /&gt;Oh, I am using MKL in Intel Compiler Suite 11.1.072.&lt;BR /&gt;&lt;BR /&gt;I will try 10.3 soon.&lt;BR /&gt;&lt;BR /&gt;Thanks for your kind help.&lt;BR /&gt;&lt;BR /&gt;Best Regards</description>
      <pubDate>Tue, 12 Oct 2010 11:19:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Problem-with-Hpcc-MPIFFT-hangs-at-large-scale/m-p/809720#M3760</guid>
      <dc:creator>xuzheng97</dc:creator>
      <dc:date>2010-10-12T11:19:37Z</dc:date>
    </item>
  </channel>
</rss>

