Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

scalapack memory loss

mp_def
Beginner

I have a code that solves large systems of equations in parallel using ScaLAPACK's PZGETRS routine. The code fails for some cases, apparently because it runs out of memory. I traced the memory loss to PZGETRS by running "free -m" after every function call to monitor available memory on the compute nodes I'm using. For a matrix size of ~6000x6000 with a 7x7 process grid, I lose ~7 GB of available memory on every PZGETRS call.
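
The before/after measurement described above can be sketched as follows. This is a minimal illustration, assuming a Linux node where /proc/meminfo is readable (the file that "free" takes its numbers from). The function names are hypothetical and are not part of the attached test code.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: MiB}.

    Only fields reported in kB are kept; unitless counters such as
    HugePages_Total are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB":
            fields[name.strip()] = int(parts[0]) / 1024.0
    return fields


def available_mib(path="/proc/meminfo"):
    """Return MemAvailable in MiB, the 'available' column of free -m."""
    with open(path) as fh:
        return parse_meminfo(fh.read())["MemAvailable"]


def lost_mib(before, after):
    """Memory that disappeared between two readings, in MiB."""
    return before - after
```

Calling available_mib() immediately before and after the solve, and passing both readings to lost_mib(), gives the same kind of per-call figure quoted in this post.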

 

To fix this, I tried setting the environment variable MKL_DISABLE_FAST_MM=1, calling mkl_disable_fast_mm() in the code, and compiling with different versions of the Intel tools (I have access to 2020.1.217, 2019.5.281, and 2018.5.274). None of these changed the behavior. I also used the mkl_service module to call mkl_free_buffers(), and tried to measure peak memory with mkl_peak_mem_usage. Calling mkl_free_buffers() had no effect, and the reported peak memory was ~11 MB. I'm having trouble reconciling that reported memory use with the apparent loss I see through the "free" command. I'm hoping this is an error in either my use of ScaLAPACK or my compilation, but if it is, I cannot find it.

 

I recreated the issue with a small test script, attached. I tested the script on my desktop, where I use openmpi and a local version of scalapack. For a matrix of size 6200 with 16 tasks (4x4 grid), my local code appears to lose 9 MB. On the cluster, where I compiled with Intel MPI and Intel MKL, I lose 3648 MB with 16 tasks and 7297 MB with 49 tasks.
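
For scale, the storage needed by the matrix itself can be worked out from the sizes above. This is a rough sketch. It assumes 16 bytes per element (PZGETRS works on double-precision complex data) and an even split across the process grid, which ignores block-cyclic rounding and any right-hand sides or workspace. The helper names are illustrative only.

```python
BYTES_PER_ELEMENT = 16  # double-precision complex (COMPLEX*16)


def matrix_mib(n, bytes_per_element=BYTES_PER_ELEMENT):
    """Total storage for a dense n x n matrix, in MiB."""
    return n * n * bytes_per_element / 2**20


def per_process_mib(n, nprow, npcol):
    """Approximate local share on an nprow x npcol grid, assuming an even split."""
    return matrix_mib(n) / (nprow * npcol)


def loss_ratio(lost_mib, n):
    """Reported loss expressed as a multiple of the full matrix size."""
    return lost_mib / matrix_mib(n)
```

For example, matrix_mib(6200) gives the footprint of the whole 6200x6200 matrix, and loss_ratio(3648, 6200) expresses the 16-task cluster loss as a multiple of that footprint.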

 

I renamed the attached Makefiles for my working example to .txt so that I could upload them; remove the .txt extension from both to use them. The Makefile shows both ways I compile the code, and I switch between them with the compile_link variable in Makefile.inc. Option 2 is my local install; option 3 is for the cluster using Intel.

 

SantoshY_Intel
Moderator

Hi,

 

Thanks for posting in the Intel forums.

 

>>"I tested the script on my desktop, where I use openmpi and a local version of scalapack. "

Could you please confirm if you are using an open-source version of ScaLAPACK which is not a part of Intel MKL?

 

Could you please try the combination of Intel MKL and OpenMPI and let us know your observations, i.e., how much memory is used with this combination?

 

Thanks & Regards,

Santosh

 

mp_def
Beginner

On my desktop, ScaLAPACK is the open-source version 2.0.0.

 

On the cluster I tried OpenMPI + MKL:

  • 16 tasks --> 141 MB lost in solve
  • 49 tasks --> 211 MB lost in solve

 

OpenMPI is version 3.0.0 in both cases.

SantoshY_Intel
Moderator

Hi,


Thanks for providing the details.


Could you please try the combination of open-source ScaLAPACK and Intel MPI and let us know how much memory is used?


Thanks & Regards,

Santosh


mp_def
Beginner

I compiled with Intel MPI and open-source ScaLAPACK. Specifically, I rebuilt ScaLAPACK with the Intel MPI compilers on the cluster and linked that build against the BLAS and LAPACK contained in MKL.

  • 16 tasks --> 3624 MB lost
  • 49 tasks --> 7341 MB lost

To eliminate MKL entirely, I rebuilt ScaLAPACK against the BLAS/LAPACK in OpenBLAS (an older version, 0.2.20). To be clear, I compiled OpenBLAS with gcc/gfortran, because of a note in an OpenBLAS file saying not to use the Intel compilers. I compiled ScaLAPACK with the Intel compilers.

  • 16 tasks --> 2118 MB lost
  • 49 tasks --> 3765 MB lost

I then upgraded OpenBLAS to v0.3.21.dev and compiled it with the Intel compilers, despite the note saying not to.

  • 16 tasks --> 2116 MB lost in solve
  • 49 tasks --> 3745 MB lost in solve

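
To compare the runs side by side, the figures reported so far in this thread can be collected and divided by the task count. This is only a bookkeeping sketch: the configuration labels are my own shorthand for the builds described above, and the per-task number assumes the loss is spread evenly over the tasks, which has not been verified.

```python
# (configuration, tasks, MB lost in solve), as reported in this thread
RESULTS = [
    ("OpenMPI + open-source ScaLAPACK 2.0.0 (desktop)", 16, 9),
    ("Intel MPI + MKL (cluster)", 16, 3648),
    ("Intel MPI + MKL (cluster)", 49, 7297),
    ("OpenMPI + MKL (cluster)", 16, 141),
    ("OpenMPI + MKL (cluster)", 49, 211),
    ("Intel MPI + open-source ScaLAPACK + MKL BLAS/LAPACK", 16, 3624),
    ("Intel MPI + open-source ScaLAPACK + MKL BLAS/LAPACK", 49, 7341),
    ("open-source ScaLAPACK + OpenBLAS 0.2.20 (gcc-built)", 16, 2118),
    ("open-source ScaLAPACK + OpenBLAS 0.2.20 (gcc-built)", 49, 3765),
    ("open-source ScaLAPACK + OpenBLAS 0.3.21.dev (Intel-built)", 16, 2116),
    ("open-source ScaLAPACK + OpenBLAS 0.3.21.dev (Intel-built)", 49, 3745),
]


def per_task(results):
    """Append the average loss per task (MB) to each reported row."""
    return [(cfg, tasks, lost, lost / tasks) for cfg, tasks, lost in results]


def format_table(results):
    """Render the rows as a fixed-width plain-text table."""
    lines = [f"{'configuration':<60}{'tasks':>6}{'lost MB':>9}{'MB/task':>9}"]
    for cfg, tasks, lost, each in per_task(results):
        lines.append(f"{cfg:<60}{tasks:>6}{lost:>9}{each:>9.1f}")
    return "\n".join(lines)
```

Running print(format_table(RESULTS)) lists every configuration with its total and per-task loss.
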
SantoshY_Intel
Moderator

Hi,


Thanks for providing all the details.


We are working on your issue & we will get back to you soon.


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,

 

I tried to reproduce your issue on my end. My observations are below:

OpenMPI + MKL :

Steps:  

  1. mpif90 -c working_ex.f90 -o bin/working_ex.o
  2. mpif90 -g -fopenmp -Oo -debug bin/working_ex.o -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl -o main
  3. mpirun -n 1 ./main

Observations: I get an error, shown in the attachment (OpenMPI.debug).
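
For reference when checking a link line like the one in step 2: MKL ships a separate BLACS library for each supported MPI implementation (for example mkl_blacs_intelmpi_* and mkl_blacs_openmpi_*), and the BLACS layer is expected to match the MPI used to build and run the program. The sketch below assembles the library list in the same order as the commands in this post. The helper name mkl_cluster_libs is hypothetical, and whether a mismatched BLACS layer explains the error in the attachment is not established here.

```python
# MKL BLACS library stem for each MPI implementation covered in this thread
BLACS_BY_MPI = {
    "intelmpi": "mkl_blacs_intelmpi",
    "openmpi": "mkl_blacs_openmpi",
}


def mkl_cluster_libs(mpi, interface="ilp64"):
    """Return the -l flags for MKL ScaLAPACK with a BLACS layer matching `mpi`.

    `interface` is "lp64" (32-bit integers) or "ilp64" (64-bit integers);
    it must agree with the integer size the application is compiled with.
    """
    if mpi not in BLACS_BY_MPI:
        raise ValueError(f"unsupported MPI flavor: {mpi}")
    if interface not in ("lp64", "ilp64"):
        raise ValueError(f"unsupported interface: {interface}")
    return [
        f"-lmkl_scalapack_{interface}",
        f"-lmkl_intel_{interface}",
        "-lmkl_intel_thread",
        "-lmkl_core",
        f"-l{BLACS_BY_MPI[mpi]}_{interface}",
        "-liomp5",
        "-lpthread",
        "-lm",
        "-ldl",
    ]
```

For example, " ".join(mkl_cluster_libs("openmpi")) produces the library portion of a link line for an OpenMPI build, while mkl_cluster_libs("intelmpi") reproduces the list used in the commands in this post.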

 

Intel MPI + MKL :

Steps:

  1. source /opt/intel/oneapi/mpi/latest/env/vars.sh
  2. source /opt/intel/oneapi/compiler/latest/env/vars.sh
  3. mpiifort -c working_ex.f90 -o bin/working_ex.o
  4. mpiifort -O0 -g -debug -qmkl=cluster -recursive bin/working_ex.o -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl -o main
  5. mpirun -n 16 ./main

Observations:

[Attached image: MicrosoftTeams-image (11).png]

 

Could you please let me know whether I missed anything or did something wrong while trying the OpenMPI+MKL combination?

 

Thanks,

Santosh

 

SantoshY_Intel
Moderator

Hi,


Could you please let me know whether I missed anything or did something wrong while trying the OpenMPI+MKL combination?


Thanks,

Santosh


