Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Memory Leak using many times the cluster sparse solver

Marcos_V_1
New Contributor I

Hello, I'll add here the information from a support ticket I started last month, to check whether the community has run into this issue. We use the MKL parallel cluster solver together with Intel MPI for our HPC software (called FDS). The software has to solve a Poisson equation thousands of times using the MKL cluster solver solve phase. We have noticed that the memory in use grows as the MKL cluster solver is called repeatedly, eventually leading to a catastrophic out-of-memory error in MPI.

I isolated the repeated use of the MKL cluster solver in a standalone program, completely separate from our software, and still see the memory use increase.

Try following the instructions in the README file in the attached tarball to compile the code and run the case, and see whether your memory use increases (it takes a few hours of runtime). I have verified this behavior on two Linux clusters running CentOS 6 and 7, with Intel Parallel Studio versions 2018, 2019, and the latest 2020.
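To make the call pattern concrete, here is a minimal C sketch of the kind of loop the standalone test performs: analysis and factorization of a Poisson-like system once, then the solve phase repeated hundreds of thousands of times, with the handle released only at the very end. This is not the actual css_test source from the tarball; the matrix (a small 1D Laplacian), the sizes, and the iparm settings are illustrative only.

/* Sketch only: repeated solve-phase calls to cluster_sparse_solver.
   NOT the css_test program from the tarball; matrix, sizes and iparm
   settings are illustrative. Link against the MKL cluster libraries. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include "mkl_cluster_sparse_solver.h"

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int comm = MPI_Comm_c2f(MPI_COMM_WORLD);   /* MKL expects a Fortran MPI handle */

    /* 1D Laplacian (real SPD, upper triangle only), one-based CSR. Built on every
       rank for simplicity; with iparm[39] = 0 only the master copy is used. */
    MKL_INT n = 1000, nnz = 2 * n - 1, k = 0, i;
    double  *a  = malloc(nnz * sizeof(double));
    MKL_INT *ia = malloc((n + 1) * sizeof(MKL_INT));
    MKL_INT *ja = malloc(nnz * sizeof(MKL_INT));
    double  *b  = malloc(n * sizeof(double)), *x = malloc(n * sizeof(double));
    for (i = 0; i < n; i++) {
        ia[i] = k + 1;
        a[k] = 2.0;  ja[k] = i + 1;  k++;
        if (i < n - 1) { a[k] = -1.0; ja[k] = i + 2; k++; }
        b[i] = 1.0;
    }
    ia[n] = k + 1;

    void   *pt[64]    = {0};            /* internal solver handle, must start zeroed */
    MKL_INT iparm[64] = {0};
    MKL_INT maxfct = 1, mnum = 1, mtype = 2 /* real SPD */, nrhs = 1;
    MKL_INT msglvl = 0, error = 0, phase, idum = 0;
    iparm[0]  = 1;                      /* use the values supplied in iparm          */
    iparm[1]  = 2;                      /* METIS fill-in reordering                  */
    iparm[7]  = 2;                      /* max iterative refinement steps            */
    iparm[34] = 0;                      /* one-based CSR indexing                    */
    iparm[39] = 0;                      /* matrix/rhs/solution stored on rank 0      */

    phase = 12;                         /* analysis + numerical factorization: once  */
    cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                          &idum, &nrhs, iparm, &msglvl, b, x, &comm, &error);
    if (error) { printf("factorization error %lld\n", (long long)error);
                 MPI_Abort(MPI_COMM_WORLD, 1); }

    for (long nsolves = 1; nsolves <= 250000; nsolves++) {
        phase = 33;                     /* solve only, reusing the stored factors    */
        cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                              &idum, &nrhs, iparm, &msglvl, b, x, &comm, &error);
        if (error) { printf("solve error %lld\n", (long long)error);
                     MPI_Abort(MPI_COMM_WORLD, 1); }
        if (rank == 0 && nsolves % 100 == 0) printf(" NSOLVES = %12ld\n", nsolves);
        /* (the actual test also experiments with mkl_free_buffers() inside this loop) */
    }

    phase = -1;                         /* release all internal solver memory: once  */
    cluster_sparse_solver(pt, &maxfct, &mnum, &mtype, &phase, &n, a, ia, ja,
                          &idum, &nrhs, iparm, &msglvl, b, x, &comm, &error);

    free(a); free(ia); free(ja); free(b); free(x);
    MPI_Finalize();
    return 0;
}

Something along these lines, compiled with mpiicc and linked per the MKL Link Line Advisor, captures the pattern the actual test exercises.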

I would really appreciate any help on this.

Marcos

Gennady_F_Intel
Moderator

Marcos, how did you run this code?

Marcos_V_1
New Contributor I

Hi Gennady, thank you for taking an interest! I used a submission script on both clusters, fitting the 8 MPI processes in one node (one cluster has 8 physical cores per node and the other 12). This is the example (Torque) for burn (12-core nodes):

#!/bin/bash
#PBS -N test_glmat
#PBS -W umask=0022
#PBS -e /home4/mnv/FIREMODELS_FORK/CLUSTER_SPARSE_SOLVER_TEST/test/test_glmat.err
#PBS -o /home4/mnv/FIREMODELS_FORK/CLUSTER_SPARSE_SOLVER_TEST/test/test_glmat.log
#PBS -l nodes=1:ppn=8
#PBS -l walltime=999:0:0
export MODULEPATH=/usr/local/Modules/versions:/usr/local/Modules/$MODULE_VERSION/modulefiles:/usr/local/Modules/modulefiles
module purge
module load null modules torque-maui intel/19u4
export OMP_NUM_THREADS=1
export I_MPI_DEBUG=5
cd /home4/mnv/FIREMODELS_FORK/CLUSTER_SPARSE_SOLVER_TEST/test
echo
echo `date`
echo "     Directory: `pwd`"
echo "          Host: `hostname`"
/opt/intel19/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpiexec   -np 8 /home4/mnv/FIREMODELS_FORK/CLUSTER_SPARSE_SOLVER_TEST/test/css_test

 

and here is an example submission script for the test on blaze, our other cluster with 8 cores per node (SLURM):

#!/bin/bash
#SBATCH -J test_glmat
#SBATCH -e /home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST/test/test_glmat.err
#SBATCH -o /home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST/test/test_glmat.log
#SBATCH -p batch
#SBATCH -n 8
#SBATCH --cpus-per-task=1

#SBATCH -t 99-99:99:99
export OMP_NUM_THREADS=1
export I_MPI_DEBUG=5
cd /home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST/test
echo
echo `date`
echo "    Input file: test_glmat.fds"
echo "     Directory: `pwd`"
echo "          Host: `hostname`"
/opt/intel20/compilers_and_libraries_2020.0.166/linux/mpi/intel64/bin/mpiexec /home/mnv/FireModels_fork/CLUSTER_SPARSE_SOLVER_TEST/test/css_test

Executing

mpirun -n 8 YOUR_DIR/CLUSTER_SPARSE_SOLVER_TEST/test/css_test

on a single workstation should give the same outcome. I'm trying to find a combination of memory flags or routine calls that would take care of the leak I'm seeing, but I haven't been successful.

BTW, this is how it crashes in both cases (this is what it writes to the .err file or to the screen):

....

 NSOLVES =       166800
 NSOLVES =       166900
 NSOLVES =       167000
 NSOLVES =       167100
 NSOLVES =       167200
 NSOLVES =       167300
 NSOLVES =       167400
 NSOLVES =       167500
 NSOLVES =       167600
 NSOLVES =       167700
Abort(606162959) on node 6 (rank 6 in comm 0): Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(499).....: MPI_Comm_split(comm=0xc4000012, color=1, key=0, new_comm=0x7ffcc8948b30) failed
PMPI_Comm_split(481).....:
MPIR_Comm_split_impl(384):
MPIR_Comm_commit(598)....:
MPIR_Info_alloc(61)......: Out of memory (unable to allocate a 'MPI_Info')
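Note that the failure is inside MPI itself (PMPI_Comm_split / MPIR_Info_alloc), which suggests some internal MPI objects are being created on every solve until the library runs out of resources. As a purely illustrative toy (nothing to do with the MKL code), the sketch below leaks communicators on purpose and eventually aborts with a similar resource-exhaustion error:

/* Toy program, unrelated to MKL: deliberately leak MPI communicators to show
   how internal MPI resources run out after enough iterations. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (long i = 1; i <= 1000000; i++) {
        MPI_Comm dup;
        /* Each split creates a new communicator (plus internal objects).
           MPI_Comm_free(&dup) is deliberately omitted, so handles pile up
           until the MPI library aborts, much like the error stack above. */
        MPI_Comm_split(MPI_COMM_WORLD, 1, rank, &dup);
        if (rank == 0 && i % 10000 == 0)
            printf("communicators created: %ld\n", i);
    }

    MPI_Finalize();
    return 0;
}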

 

Best Regards,

Marcos

Gennady_F_Intel
Moderator

Marcos,

I ran short experiments (10K iterations, with MKL 2020) so far and see that the amount of memory consumed by the program stays the same. I used the vmstat utility to track the process. We will run the whole benchmark (250K iterations); this will take significant time. I will keep this thread updated.
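An in-process alternative to vmstat, if it helps cross-check the numbers, is a small Linux-only helper like the sketch below (illustrative only, not part of the benchmark) that prints each rank's resident set size from /proc/self/status every few thousand solves:

/* Linux-only helper: return the calling process's resident set size in kB,
   read from /proc/self/status. Illustrative; not part of the benchmark. */
#include <stdio.h>
#include <string.h>

long vmrss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) return -1;
    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {   /* e.g. "VmRSS:   123456 kB" */
            sscanf(line + 6, "%ld", &kb);
            break;
        }
    }
    fclose(f);
    return kb;
}

/* Possible use inside the solve loop:
     if (nsolves % 1000 == 0)
         printf("rank %d  NSOLVES = %ld  VmRSS = %ld kB\n", rank, nsolves, vmrss_kb());
*/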

Marcos_V_1
New Contributor I

Hi Gennady, thank you very much for taking an interest in this. Here is a snapshot of the memory use for this program on one of our clusters (the one with 12-core nodes). The graph is provided by Ganglia, the cluster monitoring application.

Best Regards,

Marcos

 

Kirill_V_Intel
Employee

Hello Marcos,

If possible, could you try calling PARDISO with phase = -1 instead of calling mkl_free_buffers in your solve loop, and tell us whether you still observe the memory leak? I hope this could help our investigation. If you want, you can still call mkl_free_buffers, but only after the very last call to MKL routines (i.e., not inside the loop).

Thanks,
Kirill

Kirill_V_Intel
Employee

Hello again, 

Sorry, I actually wanted to suggest simply removing the call to mkl_free_buffers. It would be incorrect to plug in calls with phase = -1 there.

Marcos_V_1
New Contributor I

Hi Kirill, thank you for your interest. I've tried with and without the mkl_free_buffers call inside the solve loop, with the same outcome; it doesn't seem to make any difference. Now, about calling cluster_sparse_solver with phase = -1: wouldn't that release the stored factorization, so that I could no longer keep calling the solve phase inside the loop?

My other question is, have you been able to reproduce the behavior?

Thank you,

Marcos

Gennady_F_Intel
Moderator

I ran those experiments and still see the same problem Marcos reported:

.....................................

 NSOLVES =       166000
 NSOLVES =       167000
Abort(471945231) on node 6 (rank 6 in comm 0): Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(499).....: MPI_Comm_split(comm=0xc4000012, color=1, key=0, new_comm=0x7ffc47d65630) failed
PMPI_Comm_split(481).....: 
MPIR_Comm_split_impl(384): 
MPIR_Comm_commit(598)....: 

 

Marcos_V_1
New Contributor I

Hi Gennady, thank you for checking this. It is interesting that the error happens at about the same iteration count, even though we are running the case on different hardware.

Let's see if the issue is escalated.

Best Regards,

Marcos

Gennady_F_Intel
Moderator

Hi Marcos, yes, we escalated the issue to the solver owners and will keep you informed.

Marcos_V_1
New Contributor I

Thank you Gennady, please let us know and stay safe.

Marcos

Gennady_F_Intel
Moderator

Marcos,

Please check version 2020 Update 1 of both MKL and Intel MPI.

I checked the example you shared and see that the test passes:

.....................................

 NSOLVES =       249940
 NSOLVES =       249950
 NSOLVES =       249960
 NSOLVES =       249970
 NSOLVES =       249980
 NSOLVES =       249990
 NSOLVES =       250000
[gfedorov@cerberos u849887]$
 

Marcos_V_1
New Contributor I

Thank you Gennady, we will test 2020 update 1. I'll let you know if we get the same behavior.

Best Regards,

Marcos

Kirill_V_Intel
Employee

Hi Marcos,

Just to add to what Gennady said, for clarification: the issue (as far as our suggestion goes) is related to MPI and not to the Cluster Sparse Solver. So, if for any reason you don't want to use a newer MKL, using a newer MPI alone should already fix the problem.
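Since several Intel MPI installations often coexist on a cluster, one quick way to confirm which MPI runtime a binary actually picks up is the standard MPI-3 query below (a generic sketch, nothing specific to MKL):

/* Standard MPI-3 query: print which MPI library the binary runs against. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, len;
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_library_version(version, &len);
    if (rank == 0)
        printf("MPI library: %s\n", version);
    MPI_Finalize();
    return 0;
}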

Best,
Kirill

Marcos_V_1
New Contributor I

Hi Gennady and Kirill, thank you for your help. I tested the sample case on one of our clusters with Intel 2020 Update 1 and it also passed.

We tried installing Update 1 on another cluster that runs CentOS 6 and we are having library issues (GLIBC_2.14 is missing). It seems the latest suite will not work on CentOS 6? Is there a workaround for this?

Thank you very much,

Marcos

Kirill_V_Intel
Employee

Hi Marcos,

First, as this page says (https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2020-system-requirements), MKL 2020 and later officially supports CentOS versions no older than 7.x. So, a good solution would be to upgrade the OS on the cluster nodes.

Second, you can try to update the glibc and gnu-utils packages (or get newer versions locally) and see whether this fixes the problem. Unfortunately, I cannot give more specific advice or a workaround here.

Best,
Kirill

Marcos_V_1
New Contributor I

Thank you Kirill, we will upgrade to CentOS 7 once we are able to return to our physical workspace.

I am trying to build the MPI wrapper for MKL on my Mac workstation, which runs macOS Catalina with Open MPI 4.0.2 (provided by Homebrew). When I execute the command to make the custom BLACS library, I get the result in the attached figure. It seems some symbols used by mklmpi have been deprecated or removed in MPI 3.0?

Please let me know if I should start a different thread in the forum.

Best Regards,

Marcos

Gennady_F_Intel
Moderator

Marcos, in general, starting a new thread would be better to make tracking the issues easier. Regarding the MPI macros problem: it seems you are using one of the latest versions of Open MPI (4.0.2), which MKL does not validate at this moment. Here is the link to the MKL system requirements for your reference.

Here is the link to the Open MPI FAQ: https://www.open-mpi.org/faq/?category=mpi-removed#mpi-1-mpi-lb-ub where you can see this problem has been discussed. We hope that helps.


Marcos_V_1
New Contributor I

Thank you Gennady, I will dial back the version of Open MPI. Are there any plans to update the macros for newer MPI versions on macOS?

Best Regards,

Marcos

Gennady_F_Intel
Moderator

Certainly yes, but the question is when (:. Marcos, it would be great if you submitted a feature request via the Intel Online Service Center so we can follow up on this issue.
