Intel® oneAPI Math Kernel Library

PARDISO randomly crashes when memory usage is limited

sun__shuzhan
Beginner

Hi,

 

If I run the compiled executable on the head node without limiting the number of CPUs or the amount of memory, PARDISO runs to completion and gives correct results on the Linux server. The code also always works on Windows. However, PARDISO crashes randomly when I limit the memory usage with qsub on the Linux server (see runCWNLAT.sub below). The crashes happen without any error indication, so PARDISO does not return any error information. Could you please help me find the cause?

 

In my code, C++ calls functions from Fortran code, and PARDISO is called from the Fortran code. Within one execution, PARDISO is called many times. The simplified Fortran code that calls PARDISO is:

! Init or set PARDISO parameters
        maxfct = 1
        mnum = 1
        mtype = 6                           ! complex and symmetric matrix
        phase = 13                          ! analysis, numerical factorization, solve, iterative refinement
        nrhs = n_recei + 1                  ! number of right-hand sides that need to be solved for
        msglvl = 0                          ! if msglvl=1, print statistical information
        error = 0
        call pardisoinit(pt, mtype, iparm)  ! init pardiso with default parameters in accordance with the matrix type
        iparm(4) = 0                        ! no iterative solver, use direct algorithm
        iparm(28) = 0                       ! use type double precision "double complex" instead of "complex"
        iparm(35) = 0                       ! one-based indexing (Fortran-style indexing)

! Solve A*u = f with mkl PARDISO
        call pardiso(pt, maxfct, mnum, mtype, phase, &
            & n_totNodes, &                         ! num of rows of A, ~ num of equations in A*u = f
            & csrA_vals, csrA_rows, csrA_cols, &    ! CSR3 A
            & perm, nrhs, iparm, msglvl, &
            & f, &                                  ! right-hand side vector/matrix
            & u, &                                  ! solution vector/matrix
            & error)
        if (error /= 0) then
            write(6,*) 'ERROR during PARDISO backslash! Error = ', error
            stop "*** ERROR during PARDISO backslash! ***"
        endif

! Release all internal PARDISO memory
        phase = -1
        call pardiso(pt, maxfct, mnum, mtype, phase, n_totNodes, dummy, csrA_rows, csrA_cols, &
            & perm, nrhs, iparm, msglvl, dummy, dummy, error)
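The error check in the Fortran code only prints the numeric code. As a minimal sketch, a C++ caller could translate the most common PARDISO return codes into text. The function name `pardiso_error_text` is hypothetical, and only a handful of codes are covered; anything else is passed through as a number. Note that a hard crash, as described in this post, never reaches such a check, so this only helps when PARDISO actually returns.

```cpp
#include <string>

// Hypothetical helper: describe a PARDISO error code.
// Only a few well-known codes are listed; all others are reported numerically.
std::string pardiso_error_text(int error) {
    switch (error) {
        case 0:  return "no error";
        case -1: return "input inconsistent";
        case -2: return "not enough memory";
        case -3: return "reordering problem";
        case -4: return "zero pivot, numerical factorization or iterative refinement problem";
        default: return "other error code " + std::to_string(error);
    }
}
```

For example, `pardiso_error_text(-2)` yields "not enough memory", which is the code one would expect to see if a scheduler memory limit were really the cause and PARDISO detected the failed allocation.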

       

The matrix A above is a sparse complex symmetric matrix with about 1 million to 2 million nonzeros. I measured the peak memory usage during execution at about 3 GB, but PARDISO crashes even when I assign 16 GB of memory on the server.

 

In the makefile, I first compile the C++ and Fortran sources to object files, then link them together. Here are the details:

 

  • Operating system and version

-bash-4.1$ lsb_release -a
LSB Version:    :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 6.3 (Santiago)
Release:        6.3
Codename:       Santiago

 

  • Library version: MKL from composer_xe_2013

-bash-4.1$ echo $MKLROOT
/usr/opt/intel/composer_xe_2013.1.117/mkl

 

  • Compiler version

-bash-4.1$ ifort -v
ifort version 13.0.1

 

  • GNU Compiler Collection (GCC)* or Microsoft Visual Studio* version (if applicable)

-bash-4.1$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/sb/gcc-5.2.0/libexec/gcc/x86_64-unknown-linux-gnu/5.2.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /sb/objdir/../gcc-5.2.0/configure --prefix=/sb/gcc-5.2.0 --enable-languages=c,c++,fortran,go --disable-multilib
Thread model: posix
gcc version 5.2.0 (GCC)

 

  • Steps to reproduce the error (include makefiles, command lines, small test cases, and build instructions)

Makefile:

 

SRCFDIR = $(realpath ./)/src_Fortran
SRCCDIR = $(realpath ./)/src_Cpp_cw5
OBJDIR = $(realpath ./)/obj
MKDIR = if [ ! -d $(@D) ]; then mkdir -p $(@D); fi

PROGRAM=cw5
ARCH = $(shell uname -m)
TARGET = ${PROGRAM}.${ARCH}
#include Makefile.${ARCH}

CPPC    = g++
FC_SEQ  = ifort
FC_PAR  = ifort
FC_LINK = ifort

MKL_LINK_FLAGS =-L$(MKLROOT)/lib/intel64  -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -openmp -lpthread -lm

#Begin Optimized options
CPP_FLAGS    = -std=c++17  -mcmodel=large -w
F_SEQ_FLAGS  = -O3 -shared-intel
F_PAR_FLAGS  = -O3 -shared-intel
F_LINK_FLAGS = -O3 -static-intel -cxxlib -lrt
#End Optimized options

C_MAIN          = cw5.cpp
F_SRCS          = dcwnlatg4.f
F90_SRCS        = dcwnlatf4.f90
OBJS    = $(OBJDIR)/${C_MAIN:.cpp=.o} $(OBJDIR)/${F_SRCS:.f=.o} $(OBJDIR)/${F90_SRCS:.f90=.o}

all: ${TARGET} CWNLAT

# ********* First Program: cw5 ************ #

#${TARGET}: ${OBJS}
#       $(FC_LINK) -o $@ $(F_LINK_FLAGS) ${OBJS}

${TARGET}: ${OBJS}
        $(FC_LINK) $(F_LINK_FLAGS) -nofor_main -o $@ ${OBJS} $(MKL_LINK_FLAGS)

$(OBJDIR)/dcwnlatg4.o : $(SRCFDIR)/${F_SRCS}
        @$(MKDIR)
        $(FC_PAR) $(F_PAR_FLAGS) -o $(OBJDIR)/dcwnlatg4.o -c $(SRCFDIR)/${F_SRCS}

$(OBJDIR)/dcwnlatf4.o : $(SRCFDIR)/${F90_SRCS}
        @$(MKDIR)
        $(FC_PAR) $(F_PAR_FLAGS) -o $(OBJDIR)/dcwnlatf4.o -c $(SRCFDIR)/${F90_SRCS}

$(OBJDIR)/cw5.o : $(SRCCDIR)/${C_MAIN}
        @$(MKDIR)
        $(CPPC) $(CPP_FLAGS) -o $(OBJDIR)/cw5.o -c $(SRCCDIR)/${C_MAIN} -lrt

# ********* Second Program CWNLAT ************ #

CWNLAT : $(SRCFDIR)/runCWNLAT.f
        $(FC_SEQ) $(F_SEQ_FLAGS) -o CWNLAT $(SRCFDIR)/runCWNLAT.f

.PHONY: clean cleanall

clean:
        rm $(OBJS) CWNLAT

cleanall:
        rm $(OBJS) *~

 

 

 

runCWNLAT.sub used for qsub:

# Tell PBS which shell to use on the compute nodes Options are: /bin/bash or /bin/tcsh
#PBS -S /bin/bash

# Tell PBS the name to use for your job
#PBS -N runCWNLAT

# request #nodes:#cpus/node:#memory/node,requested time
#PBS -l select=1:ncpus=8:mem=16gb,walltime=00:04:00

# queue group
#PBS -q normal

# Tell PBS to join the output (.o) and error (.e) files into one file
#PBS -j oe

# *********** Commands **********#
# Tell PBS to run the job in the directory your job was submitted from
cd $PBS_O_WORKDIR

# Set up env for Intel MKL
source /opt/intel/bin/ifortvars.sh intel64

./CWNLAT ENmodel_33.DAT
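Since the program behaves differently on the head node and inside a batch job, one cheap diagnostic is to print the process resource limits at startup in both environments and compare them. The following is a sketch only. It assumes the scheduler enforces its limits through POSIX rlimits, which depends on the site configuration, and the function names `describe_limit` and `print_memory_limits` are hypothetical.

```cpp
#include <sys/resource.h>
#include <cstdio>
#include <string>

// Format one soft resource limit as "name=value" (bytes) or "name=unlimited".
std::string describe_limit(const char* name, int resource) {
    struct rlimit lim;
    if (getrlimit(resource, &lim) != 0) {
        return std::string(name) + "=unknown";
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        return std::string(name) + "=unlimited";
    }
    return std::string(name) + "=" +
           std::to_string(static_cast<unsigned long long>(lim.rlim_cur));
}

// Print the soft limits most relevant to a large solver run.
void print_memory_limits() {
    std::printf("%s\n", describe_limit("RLIMIT_AS", RLIMIT_AS).c_str());
    std::printf("%s\n", describe_limit("RLIMIT_DATA", RLIMIT_DATA).c_str());
    std::printf("%s\n", describe_limit("RLIMIT_STACK", RLIMIT_STACK).c_str());
}
```

Calling `print_memory_limits()` at the top of the C++ `main` in both the interactive run and the qsub run would show whether the batch job sees tighter address-space, data, or stack limits than the head node.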

 

5 Replies
Gennady_F_Intel
Moderator

>> 1 million to 2 million. I tested the peak memory usage during the execution is about 3 GB, but PARDISO crashes even though I assign 16 GB memory at the server.

You are trying to solve a pretty big problem, and 16 GB is not enough to allocate the solver's factors. Please check what iparm[16] returns when you run the code without memory limits, and then adjust your memory settings according to the iparm[16] + iparm[14] values.

 

Gennady_F_Intel
Moderator

One more note: it would be much better to share a sample of such long code as an attachment. That would make the message much easier to read.

sun__shuzhan
Beginner

Hi Gennady,

 

Thank you for your reply!

 

I checked the memory reported by iparm[16] and iparm[14]; they report about 0.5 GB. I also used the function "getrusage" to measure memory usage, and the peak memory is still below 3 GB.
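For readers who want to reproduce this measurement, here is a minimal sketch of reading the peak resident set size with `getrusage`. The wrapper name `peak_rss_kb` is hypothetical. On Linux, `ru_maxrss` is reported in kilobytes, and it counts resident memory only, so it can be well below the virtual address space the process has reserved.

```cpp
#include <sys/resource.h>

// Peak resident set size of the calling process in kilobytes
// (Linux reports ru_maxrss in KB). Returns -1 if getrusage fails.
long peak_rss_kb() {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_maxrss;
}
```

Calling `peak_rss_kb()` right after the last PARDISO call (and again before the program exits) gives the high-water mark for the whole process, including the solver's internal allocations.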

 

Later, I found that the crash comes from a known bug in old versions of PARDISO, which was not fixed until MKL 2017. The bug is in the nested dissection algorithm from the METIS package. I avoid it by setting iparm(2) = 0 (minimum degree ordering, Fortran one-based indexing) instead of the default iparm(2) = 2 (METIS nested dissection), and all tests pass now!
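This thread mixes Fortran one-based and C zero-based iparm indices, which is easy to get wrong. The sketch below shows the same workaround from the C++ side, under the assumption that iparm is a plain `int` array (as with the LP64 interface). The names `PardisoOrdering`, `fortran_to_c_index`, and `set_ordering` are hypothetical helpers, not MKL API. The setting has to be applied after `pardisoinit` has filled in the defaults.

```cpp
#include <cassert>

// Fill-in reducing ordering values for Fortran iparm(2), i.e. C iparm[1].
enum PardisoOrdering {
    kMinimumDegree = 0,         // workaround used in this thread
    kMetisNestedDissection = 2  // default set by pardisoinit
};

// Convert a one-based (Fortran) iparm index to a zero-based (C) index.
constexpr int fortran_to_c_index(int one_based) { return one_based - 1; }

// Select the fill-in reducing ordering in an already initialized iparm array.
void set_ordering(int* iparm, PardisoOrdering ordering) {
    assert(iparm != nullptr);
    iparm[fortran_to_c_index(2)] = ordering;
}
```

With this, `set_ordering(iparm, kMinimumDegree)` corresponds to `iparm(2) = 0` in the Fortran code. Minimum degree ordering can produce more fill-in than nested dissection on large problems, so factorization memory and time may go up.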

 

 

Gennady_F_Intel
Moderator

That's pretty strange for such a big input problem size. What is the number of nonzeros (nnz) returned by iparm[17]?

And what is the total memory consumed by MKL PARDISO: max(iparm[14], iparm[15]+iparm[16])?
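The formula above can be sketched as a small helper. It uses the zero-based indices quoted in this thread (Fortran callers would read iparm(15), iparm(16), and iparm(17)) and assumes the values are reported in kilobytes of 1024 bytes and that iparm is a plain `int` array. The names `pardiso_peak_kb` and `kb_to_gib` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Peak memory PARDISO reports for analysis and factorization, in kilobytes.
//   iparm[14]: peak memory during analysis and symbolic factorization
//   iparm[15]: permanent memory kept after the analysis phase
//   iparm[16]: memory for numerical factorization and solve
std::int64_t pardiso_peak_kb(const int* iparm) {
    assert(iparm != nullptr);
    const std::int64_t analysis_peak = iparm[14];
    const std::int64_t permanent     = iparm[15];
    const std::int64_t factor_solve  = iparm[16];
    return std::max(analysis_peak, permanent + factor_solve);
}

// Convert kilobytes to GiB for comparison with a scheduler memory limit.
double kb_to_gib(std::int64_t kb) {
    return static_cast<double>(kb) / (1024.0 * 1024.0);
}
```

Printing `kb_to_gib(pardiso_peak_kb(iparm))` after a successful run without limits gives a number that can be compared directly with the `mem=` value requested from the scheduler.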

 

sun__shuzhan
Beginner

The nnz of the sparse matrix is about 1 million to 2 million, and max(iparm[14], iparm[15]+iparm[16]) returns about 0.5 GB. That is smaller than the actual usage because I used multiple right-hand sides. Anyway, the problem is solved now. Thank you for your response!
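To get a feel for how much multiple right-hand sides add on the caller's side, the storage for the dense right-hand-side block f and solution block u can be estimated directly. This is a rough sketch covering only those two user arrays, each assumed to be an n by nrhs array of double-precision complex values. It does not model any internal PARDISO workspace, and the function name `rhs_and_solution_bytes` is hypothetical.

```cpp
#include <complex>
#include <cstdint>

// Bytes needed to hold f and u, each a dense n x nrhs array of
// double-precision complex values (16 bytes per entry).
std::uint64_t rhs_and_solution_bytes(std::uint64_t n, std::uint64_t nrhs) {
    const std::uint64_t per_array = n * nrhs * sizeof(std::complex<double>);
    return 2 * per_array;  // one array for f, one for u
}
```

For illustration, with a made-up size of n = 1,000,000 and nrhs = 50, the two arrays together take 1.6e9 bytes, which is already several times the 0.5 GB that the iparm values report for the factorization itself.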
