I have a code that solves large systems of equations in parallel using ScaLAPACK's PZGETRS routine. The code fails in some cases, apparently because it runs out of memory. I traced the memory loss to PZGETRS by running "free -m" after every function call to monitor the available memory on the compute nodes I'm using. For a matrix size of ~6000x6000 on a 7x7 process grid, I lose ~7 GB of available memory on every PZGETRS call.
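The measurement described above (checking available memory before and after a call) can be sketched as follows. This is a minimal illustration, not the poster's script: it assumes a Linux node where `/proc/meminfo` exposes `MemAvailable` (the field behind the "available" column of `free -m`), and the helper names `parse_meminfo`, `available_mib`, and `memory_lost_mib` are hypothetical.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: MiB} for fields given in kB."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)\s*kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2)) / 1024.0
    return fields


def available_mib(path="/proc/meminfo"):
    """Return the node-wide available memory in MiB (assumes a Linux /proc)."""
    with open(path) as f:
        return parse_meminfo(f.read())["MemAvailable"]


def memory_lost_mib(call, read_available=available_mib):
    """Run call() and return how much available memory dropped across it."""
    before = read_available()
    call()
    after = read_available()
    return before - after
```

Note that this number is node-wide, so it includes whatever every other rank and process on the same node allocated during the call, not only the calling process.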
To fix this, I tried setting the environment variable MKL_DISABLE_FAST_MM=1, calling mkl_disable_fast_mm() in the code, and compiling with different versions of the Intel compilers (I have access to 2020.1.217, 2019.5.281, and 2018.5.274). None of these changed the behavior. I also used the mkl_service module to call mkl_free_buffers(), and attempted to measure peak memory using mkl_peak_mem_usage. The mkl_free_buffers() call didn't do anything, and the reported peak memory was ~11 MB. I'm having trouble reconciling that reported memory use with the apparent loss I see through the "free" command. I'm hoping this is an error either in my use of ScaLAPACK or in my compilation, but if it is, I cannot figure it out.
I recreated the issue with a small test script, attached. I tested the script on my desktop, where I use OpenMPI and a local version of ScaLAPACK. For a matrix of size 6200, with 16 tasks (4x4 grid), my local code appears to lose 9 MB. On the cluster I'm using, where I compiled with Intel MPI and Intel MKL, I lose 3648 MB with 16 tasks, and 7297 MB with 49 tasks.
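For scale, the reported losses can be compared with the size of the matrix itself. The sketch below is only arithmetic on the figures quoted above, under two assumptions: the matrix is dense double-precision complex (16 bytes per entry, inferred from the Z in PZGETRS), and the "MB" values are the MiB that `free -m` prints. The function names are hypothetical.

```python
BYTES_PER_COMPLEX16 = 16  # assumption: double-precision complex entries


def matrix_mib(n, bytes_per_entry=BYTES_PER_COMPLEX16):
    """Storage for a dense n x n matrix, in MiB, summed over all processes."""
    return n * n * bytes_per_entry / 2**20


def loss_summary(n, tasks, lost_mib):
    """Relate a reported memory loss to the task count and the matrix size."""
    total = matrix_mib(n)
    return {
        "matrix_mib": total,
        "lost_per_task_mib": lost_mib / tasks,
        "lost_over_matrix": lost_mib / total,
    }


if __name__ == "__main__":
    # (tasks, MB lost) pairs as reported for the Intel MPI + MKL build
    for tasks, lost in [(16, 3648), (49, 7297)]:
        s = loss_summary(6200, tasks, lost)
        print(f"{tasks} tasks: matrix {s['matrix_mib']:.1f} MiB, "
              f"lost/task {s['lost_per_task_mib']:.1f} MiB, "
              f"lost/matrix {s['lost_over_matrix']:.1f}x")
```

Under these assumptions the whole 6200x6200 matrix occupies roughly 587 MiB, so the reported losses are several times the size of the full matrix, which is far more than a solve on an already factored matrix would be expected to need for workspace.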
I renamed the attached Makefiles for my working example code to .txt just so I could upload them. Remove the .txt from both to make them run properly. The Makefile shows both ways I compile the code. I switch between them using the compile_link variable in Makefile.inc. Option 2 is my local install, and option 3 is for the cluster using Intel.
Hi,
Thanks for posting in the Intel forums.
>>"I tested the script on my desktop, where I use openmpi and a local version of scalapack. "
Could you please confirm whether you are using an open-source version of ScaLAPACK that is not part of Intel MKL?
Could you please try the combination of Intel MKL and OpenMPI and let us know your observations, i.e., how much memory is used with this combination?
Thanks & Regards,
Santosh
On my desktop, ScaLAPACK is the open-source version 2.0.0.
On the cluster I tried OpenMPI + MKL:
- 16 tasks --> 141 MB lost in solve
- 49 tasks --> 211 MB lost in solve
The OpenMPI version in both cases is 3.0.0.
Hi,
Thanks for providing the details.
Could you please try the combination of open-source ScaLAPACK and Intel MPI and let us know how much memory is used?
Thanks & Regards,
Santosh
I compiled with Intel MPI and open-source ScaLAPACK. Specifically, I recompiled my ScaLAPACK build using the Intel MPI compilers on the cluster. I also linked the ScaLAPACK build with the BLAS and LAPACK contained in MKL.
- 16 tasks --> 3624 MB lost
- 49 tasks --> 7341 MB lost
To eliminate MKL entirely, I recompiled ScaLAPACK against the BLAS/LAPACK in OpenBLAS (an older version, 0.2.20). To be clear, I compiled OpenBLAS using gcc/gfortran, because of a note in an OpenBLAS file saying not to use the Intel compilers. I compiled ScaLAPACK using the Intel compilers.
- 16 tasks --> 2118 MB lost
- 49 tasks --> 3765 MB lost
I then upgraded OpenBLAS to v0.3.21.dev and decided to compile it with the Intel compilers despite the note saying not to.
- 16 tasks --> 2116 MB lost in solve
- 49 tasks --> 3745 MB lost in solve.
Hi,
Thanks for providing all the details.
We are working on your issue and will get back to you soon.
Thanks & Regards,
Santosh
Hi,
I tried to reproduce your issue on my end; my observations are below:
OpenMPI + MKL :
Steps:
- mpif90 -c working_ex.f90 -o bin/working_ex.o
- mpif90 -g -fopenmp -O0 -debug bin/working_ex.o -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl -o main
- mpirun -n 1 ./main
Observations: I get an error, as shown in the attachment (OpenMPI.debug).
Intel MPI + MKL :
Steps:
- source /opt/intel/oneapi/mpi/latest/env/vars.sh
- source /opt/intel/oneapi/compiler/latest/env/vars.sh
- mpiifort -c working_ex.f90 -o bin/working_ex.o
- mpiifort -O0 -g -debug -qmkl=cluster -recursive bin/working_ex.o -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl -o main
- mpirun -n 16 ./main
Observations:
Could you please let me know if there is anything that I missed or went wrong while trying the OpenMPI+MKL combination?
Thanks,
Santosh
Hi. I’m looking at this and will get back to you soon.
Hi,
>>>"I’m looking at this and will get back to you soon."
Could you please provide us with an update on this issue?
Thanks & Regards,
Santosh
Hi,
We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.
Thanks & Regards,
Santosh
