Hello everybody, I am trying to distribute a square complex (double precision) matrix over a BLACS grid using the Cpzgemr2d function. The global matrix resides on a single "master" node, and I need to distribute it over my grid as a preliminary step of a linear system solution. Everything works fine when I run my code with matrices of about 2 GB or smaller, using various BLACS grids (2x2, 3x3, 4x4, etc.) with a row/column block size of 64. However, when I increase the matrix size to, for example, 4 GB with the same grid and block configuration, the call to Cpzgemr2d fails with an "xxmr2d: out of memory" error and the distribution does not complete. Memory availability on my test machine is quite large (more than 40 GB per MPI task), and I only allocate the memory required for the global and local matrix data blocks, so I do not think memory itself is the cause of the problem.
Is there any size limit for this data distribution procedure? My test machine is a Linux cluster and I am working with the 64-bit MKL ScaLAPACK library.
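Just to make the sizes concrete, here is the back-of-the-envelope check I am using; it is only my own reasoning, assuming that somewhere inside the redistribution routine the byte count ends up in a 32-bit unsigned integer:

#include <climits>
#include <cstdio>

// A double-precision complex element takes 16 bytes, so a square matrix of
// order n occupies n*n*16 bytes. Anything beyond UINT_MAX (~4 GB) cannot be
// represented in a 32-bit unsigned counter.
int main()
{
    const unsigned long orders[2] = {11000UL, 16500UL};   // roughly a 1.9 GB and a 4.3 GB matrix
    for (int i = 0; i < 2; ++i)
    {
        unsigned long n = orders[i];
        unsigned long bytes = n * n * 16UL;
        std::printf("n = %lu : %lu bytes, fits in a 32-bit unsigned int: %s\n",
                    n, bytes, bytes <= UINT_MAX ? "yes" : "no");
    }
    return 0;
}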
Many thanks in advance for your help!
Hi Christopher,
I have experienced the same issue and have not yet found a solution or workaround. I hope the Intel guys can help us!
Thanks in advance.
Hi Christopher and Massimiliano B.,
I have the same problem when I use pdgemr2d. I hope the Intel guys can help us!
The problem is STILL present when I compile everything with 64-bit integers and link against a 64-bit integer MPI library. Normally I can also compile with 64-bit integers and link against a 32-bit integer MPI library (using wrappers for all the MPI calls, as sketched below), but since ScaLAPACK/BLACS calls MPI directly, that is not a possible solution here.
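To be concrete, the wrappers I mean look roughly like this (the function name and the argument handling here are only illustrative, not our actual code): the application works with 64-bit integers and the wrapper narrows the arguments to the 32-bit ints expected by a default-integer MPI library.

#include <mpi.h>
#include <stdint.h>
#include <assert.h>

// Illustrative wrapper: 64-bit integer arguments coming from the application
// are range-checked and narrowed before calling the 32-bit integer MPI library.
extern "C" int wrap_mpi_send_i8(void* buf, int64_t count, MPI_Datatype type,
                                int64_t dest, int64_t tag, MPI_Comm comm)
{
    assert(count <= INT32_MAX && dest <= INT32_MAX && tag <= INT32_MAX);
    return MPI_Send(buf, (int)count, type, (int)dest, (int)tag, comm);
}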
Best Regards
Thomas Kjaergaard
Hi Thomas K. and Massimiliano B.,
I have a question for you (nothing helpful for solving the problem, just out of curiosity…): have you tried building the ScaLAPACK library yourself from the source code on netlib.org and linking your code against that library instead of the MKL one? My problem is still present if I simply use the downloaded code as-is; however, the error seems to be fixed by the change suggested here
https://software.intel.com/en-us/forums/topic/286499
namely:
********************************
You need to modify the file REDIST/SRC/pgemraux.c and change
void *
mr2d_malloc(n)
unsigned int n;
{
to
void *
mr2d_malloc(n)
unsigned long int n;
{
********************************
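The reason this one-line change matters, as far as I understand it (the illustration below is mine, not taken from the ScaLAPACK sources): with the old declaration the byte count computed by the caller is silently truncated to 32 bits, so for global matrices in the multi-GB range the routine asks malloc for a wrapped-around size and the redistribution fails.

#include <cstdio>

// Stand-in for the old mr2d_malloc signature: the size parameter is a 32-bit
// unsigned int, so only the low 32 bits of the request survive the call.
static void* mr2d_malloc_old(unsigned int n)
{
    std::printf("allocation request seen by the routine: %u bytes\n", n);
    return 0;   // placeholder; the real routine would call malloc(n)
}

int main()
{
    unsigned long bytes = 18000UL * 18000UL * 16UL;   // ~5.2 GB global matrix
    std::printf("allocation request made by the caller:  %lu bytes\n", bytes);
    mr2d_malloc_old(bytes);   // implicit narrowing to 32 bits happens here
    return 0;
}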
Best regards
Christopher
No, I have not tried to compile ScaLAPACK myself; I just link to MKL.
Maybe I should, and maybe I will need to, but Intel really should fix this problem!
TK
Hello Thomas, it seems the problem is on our side; it has been escalated.
Could you give us a reproducer? It would help us investigate the problem further.
regards, Gennady
Hello everybody,
I have written a small main program that uses ScaLAPACK to distribute a square complex matrix, in order to reproduce the error described in my first post. I hope it is useful for finding a solution. Some comments about the code:
1 - The USE_MKL_SCALAPACK define switches between the MKL and the open-source ScaLAPACK library.
2 - A square complex matrix of size numberGlobalDataRows*numberGlobalDataColumns is built by the master task only (the default BLACS grid is a 2x2 task grid). This global matrix is then distributed over the BLACS task grid using the Cpzgemr2d function: the out-of-memory error appears at the call to this function when "large" global matrices are used.
3 - I have included some (conditionally compiled) lines that print the local matrices owned by each task to standard output after the distribution, in order to check the result.
4 - I hope the code is clear enough (and free of critical bugs).
Thank you for your support
Christopher
#include <iostream>
#include "mpi.h"

using namespace std;

#ifdef USE_MKL_SCALAPACK
#include "mkl_types.h"
typedef MKL_INT        int_type;
typedef MKL_Complex16  dcomplex_type;
#else
struct dcomplex { double r; double i; };
typedef int      int_type;
typedef dcomplex dcomplex_type;
#endif

extern "C"
{
  // numroc_ is declared with an int_type return value so that the prototype
  // is also correct for an ILP64 (64-bit integer) build.
  int_type numroc_(int_type* n, int_type* nb, int_type* iproc,
                   int_type* isrcproc, int_type* nprocs);
  void descinit_(int_type* desca, int_type* m, int_type* n, int_type* mb,
                 int_type* nb, int_type* irsrc, int_type* icsrc,
                 int_type* context, int_type* llda, int_type* info);
  int_type Csys2blacs_handle(MPI_Comm i_mpiCommunicator);
  void Cblacs_gridexit(int_type context);
  void Cblacs_gridinfo(int_type context, int_type* nprow, int_type* npcol,
                       int_type* myrow, int_type* mycol);
  void Cblacs_gridinit(int_type* context, char* order,
                       int_type nprow, int_type npcol);
  void Cpzgemr2d(int_type m, int_type n,
                 dcomplex_type* ptrmyblock, int_type ia, int_type ja, int_type* ma,
                 dcomplex_type* ptrmynewblock, int_type ib, int_type jb, int_type* mb,
                 int_type globcontext);
}

int_type c_zero(0);
int_type c_one(1);
char c_blacsGridDefaulOrdering('R');

/////////////////////////////////////////////////////////////////
// Matrix sizes
int_type numberGlobalDataRows(15000);
int_type numberGlobalDataColumns(numberGlobalDataRows);

/////////////////////////////////////////////////////////////////
// BLACS grid structure
int_type numberOfTasksPerRow(2);
int_type numberOfTasksPerColumn(2);

/////////////////////////////////////////////////////////////////
// Block cyclic distribution parameters
int_type numberOfRowsInDataBlock(64);
int_type numberOfColumnsInDataBlock(64);

int main(int argc, char* argv[])
{
  /////////////////////////////////////////////////////////////////
  // Initialize MPI environment
  int rank, size;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  const bool isMasterTask(rank == 0);

  /////////////////////////////////////////////////////////////////
  // Retrieve global BLACS context for the provided communicator
  int_type blacsContext = Csys2blacs_handle(MPI_COMM_WORLD);

  /////////////////////////////////////////////////////////////////
  // Create master-only MPI communicator
  MPI_Group masterTaskOnlyGroup(MPI_GROUP_NULL);
  MPI_Comm masterTaskOnlyCommunicator(MPI_COMM_NULL);
  MPI_Group mpiCommunicatorGroup;
  int mpiFunctionExitCode(MPI_Comm_group(MPI_COMM_WORLD, &mpiCommunicatorGroup));
  if(mpiFunctionExitCode != 0) { throw; }
  int numberNewCommunicatorTasks(1);
  int newCommunicatorTaskIds[1] = {0};
  mpiFunctionExitCode = MPI_Group_incl(mpiCommunicatorGroup, numberNewCommunicatorTasks,
                                       newCommunicatorTaskIds, &masterTaskOnlyGroup);
  if(mpiFunctionExitCode != 0) { throw; }
  mpiFunctionExitCode = MPI_Comm_create(MPI_COMM_WORLD, masterTaskOnlyGroup,
                                        &masterTaskOnlyCommunicator);
  if(mpiFunctionExitCode != 0) { throw; }

  /////////////////////////////////////////////////////////////////
  // Retrieve BLACS context for the master-task-only communicator
  int_type masterBlacsContext(-1);
  if(isMasterTask) { masterBlacsContext = Csys2blacs_handle(masterTaskOnlyCommunicator); }

  /////////////////////////////////////////////////////////////////
  // Allocate memory for the global matrix (master task only)
  dcomplex_type* globalMatrix(NULL);
  if(isMasterTask)
  {
    globalMatrix = new dcomplex_type[numberGlobalDataRows * numberGlobalDataColumns];
  }

  /////////////////////////////////////////////////////////////////
  // Initialize master-task-only BLACS grid (1x1)
  if(isMasterTask)
  {
    Cblacs_gridinit(&masterBlacsContext, &c_blacsGridDefaulOrdering, c_one, c_one);
  }

  /////////////////////////////////////////////////////////////////
  // Set global matrix descriptor; on non-master tasks the context entry
  // stays at -1 (this task does not own any part of the global matrix)
  int_type globalMatrixDescription[9] = {0, -1, 0, 0, 0, 0, 0, 0, 0};
  int_type blacsFunctionExitCode(0);
  if(isMasterTask)
  {
    descinit_(globalMatrixDescription, &numberGlobalDataRows, &numberGlobalDataColumns,
              &numberGlobalDataRows, &numberGlobalDataColumns, &c_zero, &c_zero,
              &masterBlacsContext, &numberGlobalDataRows, &blacsFunctionExitCode);
    if(blacsFunctionExitCode != 0) { throw; }
  }

  /////////////////////////////////////////////////////////////////
  // Fill global matrix (identity)
  if(isMasterTask)
  {
    unsigned int elementIndex(0);
    for(unsigned int rowIndex = 0; rowIndex < numberGlobalDataRows; ++rowIndex)
    {
      for(unsigned int columnIndex = 0; columnIndex < numberGlobalDataColumns;
          ++columnIndex, ++elementIndex)
      {
        if(rowIndex == columnIndex)
        {
#ifdef USE_MKL_SCALAPACK
          globalMatrix[elementIndex].real = 1.;
          globalMatrix[elementIndex].imag = 0.;
#else
          globalMatrix[elementIndex].r = 1.;
          globalMatrix[elementIndex].i = 0.;
#endif
        }
        else
        {
#ifdef USE_MKL_SCALAPACK
          globalMatrix[elementIndex].real = 0.;
          globalMatrix[elementIndex].imag = 0.;
#else
          globalMatrix[elementIndex].r = 0.;
          globalMatrix[elementIndex].i = 0.;
#endif
        }
      }
    }
  }

  /////////////////////////////////////////////////////////////////
  // Initialize BLACS task grid
  Cblacs_gridinit(&blacsContext, &c_blacsGridDefaulOrdering,
                  numberOfTasksPerRow, numberOfTasksPerColumn);

  /////////////////////////////////////////////////////////////////
  // Retrieve current task row and column index
  int_type taskRowIndex;
  int_type taskColumnIndex;
  Cblacs_gridinfo(blacsContext, &numberOfTasksPerRow, &numberOfTasksPerColumn,
                  &taskRowIndex, &taskColumnIndex);

  /////////////////////////////////////////////////////////////////
  // Retrieve local number of rows and columns according to the block
  // cyclic data distribution and the global problem size
  int_type numberLocalDataRows = numroc_(&numberGlobalDataRows, &numberOfRowsInDataBlock,
                                         &taskRowIndex, &c_zero, &numberOfTasksPerRow);
  int_type numberLocalDataColumns = numroc_(&numberGlobalDataColumns, &numberOfColumnsInDataBlock,
                                            &taskColumnIndex, &c_zero, &numberOfTasksPerColumn);

  /////////////////////////////////////////////////////////////////
  // Allocate memory for local data containers
  dcomplex_type* localMatrix = new dcomplex_type[numberLocalDataRows * numberLocalDataColumns];

  /////////////////////////////////////////////////////////////////
  // Initialize local data descriptor
  int_type localMatrixDescription[9];
  descinit_(localMatrixDescription, &numberGlobalDataRows, &numberGlobalDataColumns,
            &numberOfRowsInDataBlock, &numberOfColumnsInDataBlock, &c_zero, &c_zero,
            &blacsContext, &numberLocalDataRows, &blacsFunctionExitCode);
  if(blacsFunctionExitCode != 0) { throw; }

  /////////////////////////////////////////////////////////////////
  // Distribute the global matrix over the task grid
  // (the "xxmr2d: out of memory" error appears here for large matrices)
  Cpzgemr2d(numberGlobalDataRows, numberGlobalDataColumns,
            globalMatrix, c_one, c_one, globalMatrixDescription,
            localMatrix, c_one, c_one, localMatrixDescription, blacsContext);

#if 0
  /////////////////////////////////////////////////////////////////
  // Print local matrix
  for(unsigned int tRowIndex = 0; tRowIndex < numberOfTasksPerRow; ++tRowIndex)
  {
    for(unsigned int tColumnIndex = 0; tColumnIndex < numberOfTasksPerColumn; ++tColumnIndex)
    {
      if((tRowIndex == taskRowIndex) && (tColumnIndex == taskColumnIndex))
      {
        unsigned int elementIndex(0);
        for(unsigned int rowIndex = 0; rowIndex < numberLocalDataRows; ++rowIndex)
        {
          for(unsigned int columnIndex = 0; columnIndex < numberLocalDataColumns;
              ++columnIndex, ++elementIndex)
          {
#ifdef USE_MKL_SCALAPACK
            cout << "(" << localMatrix[elementIndex].real << ","
                 << localMatrix[elementIndex].imag << ") ";
#else
            cout << "(" << localMatrix[elementIndex].r << ","
                 << localMatrix[elementIndex].i << ") ";
#endif
          }
          cout << endl;
        }
        cout << endl;
      }
      MPI_Barrier(MPI_COMM_WORLD);
    }
  }
#endif

  /////////////////////////////////////////////////////////////////
  // Release memory for local data containers
  delete [] localMatrix;

  /////////////////////////////////////////////////////////////////
  // Free BLACS master-only context
  if(masterBlacsContext != -1) { Cblacs_gridexit(masterBlacsContext); }

  /////////////////////////////////////////////////////////////////
  // Release global matrix memory
  if(globalMatrix != 0) { delete [] globalMatrix; }

  /////////////////////////////////////////////////////////////////
  // Free MPI resources
  if(masterTaskOnlyCommunicator != MPI_COMM_NULL) { MPI_Comm_free(&masterTaskOnlyCommunicator); }
  if(masterTaskOnlyGroup != MPI_GROUP_NULL) { MPI_Group_free(&masterTaskOnlyGroup); }

  /////////////////////////////////////////////////////////////////
  // Free BLACS context
  Cblacs_gridexit(blacsContext);

  /////////////////////////////////////////////////////////////////
  // Finalize MPI environment
  MPI_Finalize();
  return 0;
}
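For completeness, this is roughly how I build and run the reproducer. The file name is just a placeholder and the exact MKL link line depends on the local installation (here I assume the LP64 interface, sequential MKL and Intel MPI), so please treat it only as a sketch:

mpiicpc -O2 -DUSE_MKL_SCALAPACK reproducer.cpp -o reproducer \
    -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core \
    -lmkl_blacs_intelmpi_lp64 -lpthread -lm
mpirun -np 4 ./reproducer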
Please try the latest version of MKL, 11.2. Some changes have been made to eliminate this sort of problem.
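To double-check which MKL version your application actually picks up at run time, you can print the version string (a minimal sketch; mkl_get_version_string is declared in mkl_service.h):

#include <cstdio>
#include "mkl_service.h"

int main()
{
    // Query the version string of the MKL library this binary is linked against.
    char buffer[256];
    mkl_get_version_string(buffer, 256);
    std::printf("%s\n", buffer);
    return 0;
}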