Hi

jaeks · 11-22-2010

Hello,

I am writing a small Fortran 90 code to do (amongst other things)
memory-distributed matrix-matrix multiplication A*B on a computational cluster.

For this, I use ScaLAPACK + BLACS from the MKL libraries.
The code works fast and fine for matrices smaller than 2 GB (below 16384x16384).
When I try to use the 64-bit (ILP64) libraries, something goes wrong.
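A quick way to see why 16384x16384 is the cut-off: with 8-byte double-precision elements the matrix occupies exactly 2^31 bytes, one more than the largest signed 32-bit integer (2^31 - 1). The sketch below is a minimal illustration, assuming sizeof(double) == 8; the helper names dense_matrix_bytes and exceeds_int32 are hypothetical. It does the arithmetic in 64-bit integers. Anywhere a byte count or element offset is held in a 32-bit int, this is the size at which it can overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes needed for an n x n matrix of doubles, computed in 64-bit
   arithmetic so the product itself cannot overflow for these sizes.
   Assumes sizeof(double) == 8, as on common x86-64 platforms. */
static int64_t dense_matrix_bytes(int64_t n)
{
    return n * n * (int64_t) sizeof(double);
}

/* True if a byte count no longer fits in a signed 32-bit integer,
   i.e. the point where 32-bit size or offset arithmetic breaks. */
static bool exceeds_int32(int64_t bytes)
{
    return bytes > INT32_MAX;
}
```

For example, dense_matrix_bytes(16384) gives 2147483648, which exceeds_int32 flags, while 16383 gives 2147221512 and still fits.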

The code compiles just fine with

MKLPATH = /site/VERSIONS/intel-11.1u7/mkl/lib/em64t

$(MKLPATH)/libmkl_scalapack_ilp64.a -Wl,--start-group $(MKLPATH)/libmkl_intel_ilp64.a $(MKLPATH)/libmkl_sequential.a $(MKLPATH)/libmkl_core.a $(MKLPATH)/libmkl_blacs_openmpi_ilp64.a -Wl,--end-group

and the ifort-based MPI compiler wrapper

mpif90

with the extra flags

-mcmodel=medium -i-dynamic -i8

At runtime I get an error from the pdgemr2d routine:

>>> xxmr2d:out of memory

Naturally, the same happens if I use the pdgeadd routine
for the block-cyclic distribution instead.
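For reference, "block-cyclic distribution" means each global row (and, independently, each column) is dealt out to the processes in blocks of nb, round-robin over the process grid. The C sketch below mirrors the index arithmetic of ScaLAPACK's INDXG2P and INDXG2L tool routines, but uses 0-based global and local indices (the Fortran routines are 1-based). The function names owner_of and local_index are hypothetical. It only illustrates the kind of mapping a redistribution routine such as pdgemr2d has to evaluate when it moves data between two such layouts.

```c
#include <stdint.h>

/* Process (row or column coordinate) that owns 0-based global index g,
   for block size nb, first block on process isrcproc, nprocs processes.
   Same arithmetic as INDXG2P, shifted to 0-based indices. */
static int64_t owner_of(int64_t g, int64_t nb, int64_t isrcproc,
                        int64_t nprocs)
{
    return (isrcproc + g / nb) % nprocs;
}

/* 0-based local index of global index g on its owning process.
   Same arithmetic as INDXG2L, shifted to 0-based indices. */
static int64_t local_index(int64_t g, int64_t nb, int64_t nprocs)
{
    /* full block-cycles passed, times nb, plus the offset in the block */
    return nb * (g / (nb * nprocs)) + g % nb;
}
```

With nb = 2 and 3 processes starting at process 0, global indices 0 to 5 land on processes 0, 0, 1, 1, 2, 2, and index 6 wraps back to process 0 at local index 2.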

While browsing other forums, I found this solution:
********************************
You need to modify the file REDIST/SRC/pgemraux.c and change
void *
mr2d_malloc(n)
unsigned int n;
{

to
void *
mr2d_malloc(n)
unsigned long int n;
{
********************************
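The suggested patch only widens the byte count received by mr2d_malloc. Note that the -i8 flag and the ILP64 libraries change the width of Fortran INTEGER arguments at the interface; they do not by themselves change a C parameter declared unsigned int inside the library, which stays 32 bits on x86-64 Linux. The sketch below, assuming a 32-bit unsigned int and using hypothetical function names, shows what happens to a byte count narrowed to unsigned int, next to a variant that keeps the full size_t count. The message text is the one from the error above; the exit status is an arbitrary choice. As a concrete number, the 43496x43496 double matrix mentioned later in this thread needs 15,135,216,128 bytes, which wraps to 2,250,314,240 when stored in a 32-bit unsigned int.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Models the original parameter type: whatever byte count the caller
   computes is converted to unsigned int, wrapping modulo 2^32 when
   unsigned int is 32 bits wide. */
static unsigned int narrowed_to_uint(uint64_t nbytes)
{
    return (unsigned int) nbytes;
}

/* Hypothetical wide variant in the spirit of the suggested patch:
   the full size_t byte count reaches malloc. */
static void *mr2d_malloc_wide(size_t nbytes)
{
    void *ptr = malloc(nbytes);
    if (ptr == NULL) {
        /* same text as the reported error; exit status is arbitrary */
        fprintf(stderr, "xxmr2d:out of memory\n");
        exit(EXIT_FAILURE);
    }
    return ptr;
}
```

A 2^31-byte request (the 16384x16384 case) still fits in a 32-bit unsigned int, so if that size already fails, a signed 32-bit intermediate somewhere in the size computation is a more likely suspect than this parameter alone.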

I.e., a large enough workspace isn't allocated...
However, I guess this is exactly what using the ILP64 libraries is supposed to fix?!

Could someone point me in the right direction, or perhaps let me know what I am
doing wrong? I would really appreciate the help :) Oh, the multiplication routine
pdgemm works fine for large (>2 GB) matrices, so the program really does use the
64-bit libraries.

More details and a code excerpt:

! Create the root-node context, in which the entire A and B matrices
! (called gloA and gloB) reside in memory.
call sl_init (rootNodeContext, 1, 1)
! Prepare the descriptors for A, B and C.
! The C descriptor is used later for moving
! the resulting C subarrays back to the root node.

if (Iam==0) then
   nr_gloA_row = numroc( m, m, myrow, 0, nprow )
   nr_gloB_row = numroc( k, k, myrow, 0, nprow )
   nr_gloC_row = numroc( m, m, myrow, 0, nprow )
   call descinit( desc_gloA, m, k, m, k, 0, 0, &
                  rootNodeContext, max(1, nr_gloA_row), info)
   call descinit( desc_gloB, k, n, k, n, 0, 0, &
                  rootNodeContext, max(1, nr_gloB_row), info)
   call descinit( desc_gloC, m, n, m, n, 0, 0, &
                  rootNodeContext, max(1, nr_gloC_row), info)

else
   ! Processes outside the root context get an empty descriptor
   ! whose context entry (element 2) is -1.
   desc_gloA(1:9) = 0
   desc_gloB(1:9) = 0
   desc_gloC(1:9) = 0
   desc_gloA(2) = -1
   desc_gloB(2) = -1
   desc_gloC(2) = -1
end if

! Redistribute the root-held gloA and gloB into the
! block-cyclic local arrays locA and locB.
call pdgemr2d( m, k, gloA, one, one, desc_gloA, locA, &
               one, one, desc_locA, desc_locA( 2 ))
call pdgemr2d( k, n, gloB, one, one, desc_gloB, locB, &
               one, one, desc_locB, desc_locB( 2 ))
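For readers less familiar with the descriptor setup in the excerpt: NUMROC returns how many rows (or columns) of a block-cyclically distributed dimension a given process owns, and DESCINIT fills a 9-entry integer descriptor laid out as DTYPE_, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_. The C sketch below ports the NUMROC arithmetic and fills a descriptor in that layout, using int64_t to stand in for the 8-byte Fortran INTEGER produced by -i8. It skips all of DESCINIT's argument checking and LLD adjustment, and the names numroc64, fill_desc and fill_absent_desc are hypothetical.

```c
#include <stdint.h>

enum { DLEN = 9 }; /* number of entries in a block-cyclic descriptor */

/* Port of NUMROC: how many of the n rows (or columns), split into
   blocks of nb, land on process iproc when the first block is on
   isrcproc and there are nprocs processes in that grid dimension. */
static int64_t numroc64(int64_t n, int64_t nb, int64_t iproc,
                        int64_t isrcproc, int64_t nprocs)
{
    int64_t mydist = (nprocs + iproc - isrcproc) % nprocs;
    int64_t nblocks = n / nb;                 /* full blocks only */
    int64_t count = (nblocks / nprocs) * nb;  /* whole rounds */
    int64_t extrablks = nblocks % nprocs;
    if (mydist < extrablks)
        count += nb;                          /* one extra full block */
    else if (mydist == extrablks)
        count += n % nb;                      /* the trailing partial block */
    return count;
}

/* Fill desc in DESCINIT's layout (no validation is performed). */
static void fill_desc(int64_t desc[DLEN], int64_t m, int64_t n,
                      int64_t mb, int64_t nb, int64_t rsrc, int64_t csrc,
                      int64_t ctxt, int64_t lld)
{
    desc[0] = 1;    /* DTYPE_: 1 = dense block-cyclic 2D */
    desc[1] = ctxt; /* CTXT_ */
    desc[2] = m;    /* M_ */
    desc[3] = n;    /* N_ */
    desc[4] = mb;   /* MB_ */
    desc[5] = nb;   /* NB_ */
    desc[6] = rsrc; /* RSRC_ */
    desc[7] = csrc; /* CSRC_ */
    desc[8] = lld;  /* LLD_ */
}

/* Descriptor for a process outside the root context: all zeros except
   the context entry, which is -1, as in the else-branch above. */
static void fill_absent_desc(int64_t desc[DLEN])
{
    for (int i = 0; i < DLEN; i++)
        desc[i] = 0;
    desc[1] = -1;
}
```

With the block size equal to the full dimension, as in the excerpt, numroc64(m, m, 0, 0, nprocs) returns m on the root process, so the root's leading dimension is the whole matrix height.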

All the best,

Andreas

Gennady_F_Intel · 11-23-2010

Andreas,

It looks like a defect in the ILP64 implementation. We will check and let you know.

--Gennady

jaeks · 12-09-2010

Hi Gennady,

I was just wondering whether you managed to reproduce my problem, and if so, whether you found a fix for it?!

All the best,

Andreas

Andrei_Moskalev__Int · 02-08-2011

Andreas,

I could not reproduce the described problem. Everything works fine with 17000x17000. Could you please provide a test case?

Best,
andrew

Thomas_K_6 · 04-28-2014

Which MPI library is this?

32-bit integer or 64-bit integer? OpenMPI, Intel MPI, MPICH2?

It sounds related to the "Out of memory error with Cpzgemr2d" topic 509048.

Best Regards

Thomas Kjaergaard

Thomas_K_6 · 01-29-2015

Hi

I still have problems distributing a 43496x43496 matrix to the slaves using pdgemr2d. It looks related to

http://icl.cs.utk.edu/lapack-forum/viewtopic.php?t=491

and

https://icl.cs.utk.edu/lapack-forum/viewtopic.php?t=465

Will this be fixed soon?

TK

xxmr2d:out of memory even with 64-bit libraries