Hello,
I am writing a small Fortran 90 code to do (amongst other things)
memory-distributed matrix-matrix multiplication A*B on a computational cluster.
For this, I use ScaLAPACK + BLACS from the MKL libraries.
The code works fast and fine for matrices <2 GB (<16384x16384).
When I try to use the 64-bit integer (ILP64) libraries, something goes wrong.
The code compiles just fine with
MKLPATH = /site/VERSIONS/intel-11.1u7/mkl/lib/em64t
$(MKLPATH)/libmkl_scalapack_ilp64.a -Wl,--start-group $(MKLPATH)/libmkl_intel_ilp64.a $(MKLPATH)/libmkl_sequential.a $(MKLPATH)/libmkl_core.a $(MKLPATH)/libmkl_blacs_openmpi_ilp64.a -Wl,--end-group
and the ifort mpi compile wrapper
mpif90
with the extra flags
-mcmodel=medium -i-dynamic -i8
During runtime I get an error from the pdgemr2d routine:
>>> xxmr2d:out of memory
Naturally, the same happens if I use the pdgeadd routine for
the task of block-cyclic distribution.
When browsing other forums I found this solution:
********************************
You need to modify the file REDIST/SRC/pgemraux.c and change
void *
mr2d_malloc(n)
unsigned int n;
{
to
void *
mr2d_malloc(n)
unsigned long int n;
{
********************************
I.e., a large enough workspace isn't allocated...
However, I guess avoiding exactly this is the whole point of using the ILP64 libraries?!
Could someone point me in the right direction or perhaps let me know what I am
doing wrong? I would really appreciate the help :) Oh, the multiplication routine
pdgemm works fine for large (>2 GB) matrices, so the program really uses the
64-bit libraries.
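The quoted patch widens the byte-count parameter of mr2d_malloc from unsigned int to unsigned long int. On Linux x86-64, the C types int and unsigned int are 32 bits wide regardless of the Fortran integer size selected with -i8, so a byte count computed or passed in those types can overflow even when the Fortran side uses 64-bit integers. Below is a minimal C sketch of that arithmetic only. The helper names (matrix_bytes, as_uint32_param, fits_in_int32, report) are hypothetical and not part of ScaLAPACK or MKL, and whether MKL's pdgemr2d fails for this exact reason is an assumption, not something confirmed in this thread.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes needed for a dense rows x cols matrix, computed in 64 bits. */
uint64_t matrix_bytes(uint64_t rows, uint64_t cols, uint64_t elem_size)
{
    assert(elem_size > 0);
    return rows * cols * elem_size;
}

/* Value a 32-bit unsigned parameter would receive: nbytes modulo 2^32. */
uint32_t as_uint32_param(uint64_t nbytes)
{
    return (uint32_t)nbytes;
}

/* Nonzero if the byte count still fits in a signed 32-bit int. */
int fits_in_int32(uint64_t nbytes)
{
    return nbytes <= (uint64_t)INT32_MAX;
}

/* Print the three views of the byte count for an n x n double matrix. */
void report(uint64_t n)
{
    uint64_t nbytes = matrix_bytes(n, n, sizeof(double));
    printf("n=%" PRIu64 ": exact=%" PRIu64 " bytes, as 32-bit unsigned=%" PRIu32
           ", fits in 32-bit int: %s\n",
           n, nbytes, as_uint32_param(nbytes),
           fits_in_int32(nbytes) ? "yes" : "no");
}
```

For a 16384x16384 double-precision matrix, matrix_bytes returns 2^31 bytes, which is one more than a signed 32-bit int can hold; this matches the size at which the problem starts to appear. Calling report(16384) and report(43496) prints the exact and truncated counts side by side.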
MORE DETAILS AND CODE excerpt:
! create the root-node context where the entire A and B matrices
! reside in memory, called gloA and gloB
call sl_init (rootNodeContext, 1, 1)
! prep the descriptors for A B and C
! the C descriptor is used later for moving
! the resulting C sub arrays back to the root node
if (Iam==0) then
   nr_gloA_row = numroc( m, m, myrow, 0, nprow )
   nr_gloB_row = numroc( k, k, myrow, 0, nprow )
   nr_gloC_row = numroc( m, m, myrow, 0, nprow )
   call descinit( desc_gloA, m, k, m, k, 0, 0, &
                  rootNodeContext, max(1, nr_gloA_row), info)
   call descinit( desc_gloB, k, n, k, n, 0, 0, &
                  rootNodeContext, max(1, nr_gloB_row), info)
   call descinit( desc_gloC, m, n, m, n, 0, 0, &
                  rootNodeContext, max(1, nr_gloC_row), info)
else
   desc_gloA(1:9) = 0
   desc_gloB(1:9) = 0
   desc_gloC(1:9) = 0
   desc_gloA(2) = -1
   desc_gloB(2) = -1
   desc_gloC(2) = -1
end if
call pdgemr2d( m, k, gloA, one, one, desc_gloA, locA, &
               one, one, desc_locA, desc_locA( 2 ))
call pdgemr2d( k, n, gloB, one, one, desc_gloB, locB, &
               one, one, desc_locB, desc_locB( 2 ))
All the best,
Andreas
5 Replies
Andreas,
It looks like a defect in the ILP64 implementation. We will check and let you know.
--Gennady
Hi Gennady,
I was just wondering if you managed to reproduce my problem and, if so, whether you found a fix for it?
All the best,
Andreas
Andreas,
I could not reproduce the described problem. Everything works fine with 17000x17000. Could you please provide a test case?
Best,
andrew
Which MPI library is this?
32-bit integers or 64-bit integers? OpenMPI, Intel MPI, MPICH2?
It sounds related to the "Out of memory error with Cpzgemr2d" topic (509048).
Best Regards
Thomas Kjaergaard
Hi,
I still have problems distributing a 43496x43496 matrix to the slaves using pdgemr2d. It looks to be related to
http://icl.cs.utk.edu/lapack-forum/viewtopic.php?t=491
and
https://icl.cs.utk.edu/lapack-forum/viewtopic.php?t=465
Will this be fixed soon?
TK