We have been using the Intel MKL for many years to solve systems with fully populated (dense) matrices.
Compared with calculations from some years ago, we have noticed that in recent MKL versions the required working memory doubles during LU factorization. Presumably a copy of the matrix is created internally, although the factorization itself does not strictly need one.
Accordingly, a given system can only hold matrices with half as many entries, which reduces the number of unknowns that can be solved by a factor of about 1.4 (the square root of 2).
Does anyone know of a solution, or an option to turn this behavior off?
Thank you for your support, Ralf
Hi,
Can you please share a minimal reproducer along with your OS details and MKL version?
Regards
Rajesh.
I am currently using Intel MKL 2020.0.0 and oneAPI 2021.2.0 on Windows 10 Professional with MS Visual Studio 2010 and 2019.
I wrote a small test program that loads a previously stored matrix (complex float, order 26528, approx. 5.3 GB) into an aligned memory area allocated with mkl_malloc and factorizes it using LAPACKE_cgetrf.
Immediately after the function is called, main memory usage increases by another 5.3 GB; this memory is released once the factorization has completed.
Older calculations, carried out with Intel MKL 11.2.0 (07/2014) under Linux, did not need this additional memory according to the old logs.
This is supported by the fact that the matrices used at that time required about 200 GB of main memory, and factorization and solution were possible without swapping on a workstation with 256 GB in total.
I am now trying to get access to a Linux system to check how a current MKL behaves there.
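The memory figures above can be checked with a little arithmetic. The following is only a sketch: the function names are made up for illustration, and the "temporary copy" is modelled as exactly one extra matrix of the same size, which is an assumption about the library's behavior rather than a documented guarantee.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>

// Bytes needed to store a dense n-by-n single-precision complex matrix
// (8 bytes per element).
std::uint64_t matrix_bytes(std::uint64_t n) {
    return n * n * sizeof(std::complex<float>);
}

// Estimated peak during factorization: the matrix itself plus, if the
// library makes one temporary full-size copy (assumption), the same again.
std::uint64_t peak_bytes(std::uint64_t n, bool temporary_copy) {
    return matrix_bytes(n) * (temporary_copy ? 2 : 1);
}

// Largest order n whose estimated peak still fits into the given budget.
std::uint64_t max_order(std::uint64_t budget_bytes, bool temporary_copy) {
    assert(budget_bytes < (std::uint64_t{1} << 60)); // keep n * n far from overflow
    std::uint64_t n = 0;
    while (peak_bytes(n + 1, temporary_copy) <= budget_bytes) ++n;
    return n;
}
```

For order 26528, matrix_bytes gives 5,629,878,272 bytes, which matches the roughly 5.3 GB reported when counted in units of 2^30 bytes. Because memory grows with the square of the order, a full temporary copy shrinks the largest order that fits by a factor of about 1.4 rather than 2.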
Hi,
>>I wrote a small test program that loads a previously stored matrix
Can you please share the program so that we can check it on our side?
Regards
Rajesh.
Hi Rajesh,
I am using the following C++ code:
// SolveTest.cpp
//
#include <stdint.h>
#include <stdlib.h>
#include <float.h>
#include <stdio.h>
#include <math.h>
#include <complex>
#include <Windows.h>
using namespace std; // for <complex>
#define MKL_ILP64
#include "mkl.h"
#define ALIGN_VALUE 64
int main(int argc, char* argv[])
{
    FILE* pFile = NULL;
    DWORD dwOrder, dwError, dwTicks;
    size_t sTotal, sRightSide, sPivot;
    char* pszFileName;
    complex<float>* pMatrix, *pRightSide;
    MKL_INT Order, Info;
    MKL_INT* pPivot;
    MKLVersion ver;
    printf("SolveTest\n");
    if (argc < 2) {
        printf(" no filename defined\n");
        return(1);
    }
    pszFileName = argv[1];
    printf(" loading matrix file %s", pszFileName);
    dwError = fopen_s(&pFile, pszFileName, "rb");
    if (dwError != ERROR_SUCCESS) {
        printf(" error opening file, code %u\n", dwError);
        return(1);
    }
    printf("  done\n");
    fread(&dwOrder, sizeof(DWORD), 1, pFile); // order of the matrix
    fread(&sTotal, sizeof(size_t), 1, pFile); // total size of matrix data
    printf(" complex matrix of order %u found\n", dwOrder);
    sRightSide = dwOrder * sizeof(complex<float>);
    printf(" allocating memory for main matrix (%llu MB)", sTotal / (1024*1024));
    pMatrix = (complex<float> *) mkl_malloc(sTotal, ALIGN_VALUE);
    if (!pMatrix) {
        dwError = GetLastError();
        fclose(pFile);
        printf("\n no memory for main matrix, code %u\n", dwError);
        return(1);
    }
    printf("  done\n");
    printf(" reading data");
    if (fread(pMatrix, sTotal, 1, pFile) != 1) {
        dwError = GetLastError();
        fclose(pFile);
        mkl_free(pMatrix); // release the matrix buffer on this error path as well
        printf("\n error reading data, code %u\n", dwError);
        return(1);
    }
    printf("  done\n");
    fclose(pFile);
    printf(" allocating memory for rightside (%llu KB)", sRightSide / 1024);
    pRightSide = (complex<float> *) mkl_malloc(sRightSide, ALIGN_VALUE);
    if (!pRightSide) {
        dwError = GetLastError();
        mkl_free(pMatrix);
        printf("\n no memory for rightside, code %u\n", dwError);
        return(1);
    }
    printf("  done\n");
    sPivot = sizeof(MKL_INT) * dwOrder;
    printf(" allocating memory for pivot vector (%llu KB)", sPivot / 1024);
    pPivot = (MKL_INT *) mkl_malloc(sPivot, ALIGN_VALUE);
    if (!pPivot) {
        dwError = GetLastError();
        mkl_free(pRightSide); // was leaked on this path before
        mkl_free(pMatrix);
        printf("\n no memory for pivot vector, code %u\n", dwError);
        return(1);
    }
    printf("  done\n");
    // factorization part
    mkl_get_version(&ver);
    printf("\nusing MKL %u.%u.%u (%s)", ver.MajorVersion, ver.MinorVersion, ver.UpdateVersion, ver.Processor);
    printf("\n starting factorization\n");
    dwTicks = GetTickCount();
    Order = dwOrder;
    Info = LAPACKE_cgetrf(LAPACK_ROW_MAJOR, Order, Order, (MKL_Complex8*) pMatrix, Order, pPivot); // at this point the additional memory usage begins
    dwTicks = GetTickCount() - dwTicks; // elapsed time in ms
    printf(" factorization ended (%u ms)\n", dwTicks);
    mkl_free(pPivot);
    mkl_free(pRightSide);
    mkl_free(pMatrix);
    return(0);
}
Kind regards,
Ralf
Ralf,
I am not sure about the version 11.2 you mentioned, but our LAPACKE_cgetrf implementation is based on Netlib's, which makes an additional temporary memory allocation. You can check this by following the link:
http://www.netlib.org/lapack/explorehtml/de/d2c/a01553_a289dd8ce852ba4df4ccdcde60ee6086d.html

Gennady
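For readers who do not want to dig through the Netlib source, the row-major path of such a wrapper follows roughly the pattern below. This is a simplified stand-in, not the Netlib or MKL code: call_row_major and the two transpose helpers are hypothetical names, and `kernel` stands for any routine that works in place on column-major storage (in the real wrapper, the Fortran cgetrf).

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Copy a rows-by-cols row-major matrix into column-major storage.
void row_to_col_major(const cfloat* src, cfloat* dst,
                      std::size_t rows, std::size_t cols) {
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            dst[j * rows + i] = src[i * cols + j];
}

// Copy a rows-by-cols column-major matrix back into row-major storage.
void col_to_row_major(const cfloat* src, cfloat* dst,
                      std::size_t rows, std::size_t cols) {
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            dst[i * cols + j] = src[j * rows + i];
}

// Hypothetical wrapper: runs a column-major kernel on a row-major n-by-n
// matrix and returns the number of extra bytes it had to allocate.
template <class Kernel>
std::size_t call_row_major(cfloat* a, std::size_t n, Kernel kernel) {
    assert(a != nullptr || n == 0);
    std::vector<cfloat> a_t(n * n);         // temporary copy, as large as the matrix
    row_to_col_major(a, a_t.data(), n, n);  // transpose in
    kernel(a_t.data(), n);                  // work on the column-major copy
    col_to_row_major(a_t.data(), a, n, n);  // transpose the result back
    return a_t.size() * sizeof(cfloat);
}
```

The temporary buffer lives for the whole duration of the kernel call, which is consistent with the observation that the extra memory appears when the function is entered and disappears when it returns.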
Hello Gennady,
Thank you for the link; it explained the problem.
The matrices for our "old" calculations were set up in LAPACK_COL_MAJOR format. This was later changed to LAPACK_ROW_MAJOR for ease of reading and for use with other solution methods.
The code of lapacke_cgetrf_work.c clearly shows that in the row-major case additional memory is used during factorization for the required transposition via LAPACKE_cge_trans. The problem is therefore not due to the MKL version used.
Even though this additional memory is not strictly necessary for such a simple transformation, a short note on this in the documentation for *getrf might help other users.
I will test this in the next few days by switching my matrix structure back to the COL_MAJOR format.
With kind regards
Ralf
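For a square matrix, the layout change can be done without a second buffer by swapping elements across the diagonal. The sketch below shows only that step; transpose_square_in_place is a hypothetical helper, not an MKL or LAPACKE function.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>

// In-place transpose of a square n-by-n matrix stored contiguously.
// Applied to a row-major matrix, it leaves the same matrix in
// column-major order (and vice versa) without any extra allocation.
void transpose_square_in_place(std::complex<float>* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            std::swap(a[i * n + j], a[j * n + i]);
}
```

After this step the buffer could be passed to LAPACKE_cgetrf with LAPACK_COL_MAJOR, which, going by the Netlib source discussed above, skips the temporary copy. This plain double loop is cache-unfriendly for very large orders, so a blocked variant would likely be faster. Another option that avoids the transposition entirely, assuming the factors are only needed for solving: a row-major buffer of A is bit-for-bit the column-major storage of its transpose, so one can factor it as column-major and then solve with LAPACKE_cgetrs using trans = 'T'.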
This issue is being closed, and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.