Massimiliano_B_1
Beginner

Scalapack linear solver (memory problem)

Hi all,

I am trying to use ScaLAPACK to solve a distributed linear system. The C++ source code is attached. The code compiles, runs without problems, and gives the correct result. I then ran it under Valgrind (with the environment variables for an MPI application set properly) to check its memory management. Under Valgrind, the execution crashes:

valgrind MPI wrappers  6704: Active for pid 6704
valgrind MPI wrappers  6704: Try MPIWRAP_DEBUG=help for possible options
valgrind MPI wrappers  6703: Active for pid 6703
valgrind MPI wrappers  6705: Active for pid 6705
valgrind MPI wrappers  6703: Try MPIWRAP_DEBUG=help for possible options
valgrind MPI wrappers  6705: Try MPIWRAP_DEBUG=help for possible options
valgrind MPI wrappers  6712: Active for pid 6712
valgrind MPI wrappers  6712: Try MPIWRAP_DEBUG=help for possible options
[giorgio-VirtualBox:6712] *** An error occurred in MPI_Type_get_envelope
[giorgio-VirtualBox:6712] *** on communicator MPI_COMM_WORLD
[giorgio-VirtualBox:6712] *** MPI_ERR_INTERN: internal error
[giorgio-VirtualBox:6712] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 3 with PID 6712 on
node giorgio-VirtualBox exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[giorgio-VirtualBox:06702] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[giorgio-VirtualBox:06702] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

I started testing with Valgrind because I ran into memory problems in a more complex example, where the system matrix and the right-hand side were built using various distributed algebra routines (pdgetri_, pdgemm_, pdgemv_) and the solutions of other auxiliary linear systems.

What is wrong? Thank you in advance for your help.

Massi

Compiler and linker: mpic++

Includes: /usr/lib/openmpi/include /opt/intel/composer_xe_2011_sp1.7.256/mkl/include

Link line:  -L$(MKLROOT)/lib/intel64 -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lmkl_blacs_openmpi_lp64 -lpthread -lm 

Compiler options: -DMKL_LP64

Environment variables: LD_LIBRARY_PATH=/opt/intel/composer_xe_2011_sp1.7.256/mkl/lib/intel64

 

4 Replies
Massimiliano_B_1
Beginner
88 Views

Added attachment

Zhang_Z_Intel
Employee
88 Views

Hi, the Valgrind output seems to indicate that the error occurred inside an MPI routine. Which vendor's MPI do you use? Is it Intel MPI?

Massimiliano_B_1
Beginner
88 Views

Hi Zhang,

Thank you for your reply. I don't use Intel MPI; I use OpenMPI 1.4.3. I solved the memory problems in the more complex code I mentioned in my first post (the mistake was mine, in the input to pdgetri_), but I still can't run the code under Valgrind without it crashing.

Best regards,

Massi 
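
The post does not say which pdgetri_ argument was wrong. As an illustration of one common class of mistake with distributed routines (a local array sized inconsistently with the block-cyclic layout), here is a minimal sketch in plain C++ that reproduces the arithmetic of ScaLAPACK's NUMROC routine. The function names local_extent and local_buffer_size are hypothetical, and the sketch assumes the distribution starts on process row/column 0; it is not taken from the attached code.

```cpp
#include <algorithm>
#include <cstddef>

// Number of rows (or columns) of a block-cyclically distributed dimension
// that land on process `iproc`. Mirrors the arithmetic of ScaLAPACK's NUMROC.
//   n        : global extent of the dimension
//   nb       : block size along that dimension
//   iproc    : this process's coordinate in the grid
//   isrcproc : coordinate of the process that owns the first block
//   nprocs   : number of processes along that dimension
int local_extent(int n, int nb, int iproc, int isrcproc, int nprocs) {
    const int mydist  = (nprocs + iproc - isrcproc) % nprocs; // distance from the source process
    const int nblocks = n / nb;                               // number of whole blocks
    int count = (nblocks / nprocs) * nb;                      // blocks every process receives
    const int extra = nblocks % nprocs;                       // leftover whole blocks
    if (mydist < extra) {
        count += nb;          // this process gets one more whole block
    } else if (mydist == extra) {
        count += n % nb;      // this process gets the trailing partial block
    }
    return count;
}

// Minimum number of elements the local buffer must hold for an m x n matrix
// distributed in mb x nb blocks over an nprow x npcol grid.
// Assumption: the first block lives on process (0, 0).
std::size_t local_buffer_size(int m, int n, int mb, int nb,
                              int myrow, int mycol, int nprow, int npcol) {
    const int rows = local_extent(m, mb, myrow, 0, nprow);
    const int cols = local_extent(n, nb, mycol, 0, npcol);
    const int lld  = std::max(1, rows); // local leading dimension must be at least 1
    return static_cast<std::size_t>(lld) * static_cast<std::size_t>(cols);
}
```

Comparing the value returned by local_buffer_size with the size actually allocated on each rank, before calling any distributed routine, can catch an undersized local array without needing Valgrind at all. Workspace arguments such as those of pdgetri_ have their own size requirements, which this sketch does not cover.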

Zhang_Z_Intel
Employee
88 Views

Massi,

Can you try linking with Intel MPI (you can download a 30-day trial version from http://software.intel.com/en-us/intel-mpi-library)? Also, try running the program with only 1 MPI rank. Does it run OK under Valgrind? I suspect this is an OpenMPI problem that has nothing to do with MKL. By the way, you are still using Intel Composer XE 2011, which was released more than 2 years ago. Consider updating to Intel Composer XE 2013 or, at least, updating MKL to the latest version, 11.1. Many improvements and bug fixes have been made over the last 2 years.

 

 
