Memory needed for dense matrix factorization with ?getrf doubles...?

BURGI · 06-09-2021

We have been using the Intel MKL for many years to solve fully populated matrices.
Compared to older calculations (some years ago), we have noticed that in recent versions of the MKL the working memory required during LU factorization doubles, i.e., a copy of the matrix to be solved is apparently created internally, although this is not strictly necessary for the factorization.
Accordingly, the largest problem that can be solved on a given system shrinks considerably: since storage grows with the square of the number of unknowns, doubling the memory demand reduces the feasible number of unknowns by a factor of about 1.4 (√2).

Does anyone know a solution, or an option to turn this behavior off?

Thank you for your support, Ralf

MRajesh_intel · 06-10-2021

Hi,

Can you please share a minimal reproducer along with your OS details and MKL version?

Regards

Rajesh.

BURGI · 06-15-2021

I am currently using Intel MKL version 2020.0.0 and oneAPI version 2021.2.0 on Windows 10 Professional with MS Visual Studio 2010 and 2019.
I wrote a small test program that loads a previously stored matrix (complex float, order 26528, approx. 5.3 GB) into an aligned memory area allocated with mkl_malloc and factorizes this matrix using the function LAPACKE_cgetrf.
Immediately after the function is called, the main memory in use increases by another 5.3 GB, which is released once the factorization is completed.
Older calculations, which were carried out with Intel MKL 11.2.0 (07/2014) under Linux, did not need this additional main memory according to the old logs.
Further evidence is that the matrices used at that time required about 200 GB of main memory, and that factorization and solution were possible on a workstation with a total of 256 GB without swapping.
I am now trying to get access to a Linux system to check how a current MKL behaves there.
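
The figures above can be checked with a little arithmetic. The following is only a minimal sketch: the helper names (matrix_bytes, peak_bytes_with_copy, max_order) are made up for illustration, and the element size of 8 bytes assumes a single-precision complex value stored as two 4-byte floats. For order 26528 it gives about 5.6e9 bytes (roughly 5.2 GiB) for the matrix, in line with the approx. 5.3 GB reported, and twice that if one full-size work copy exists during the factorization.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one dense n x n matrix with elem_size bytes per entry.
std::uint64_t matrix_bytes(std::uint64_t n, std::uint64_t elem_size) {
    assert(n == 0 || elem_size <= UINT64_MAX / n / n);  // guard against overflow
    return n * n * elem_size;
}

// Peak bytes if the factorization holds one extra full-size work copy.
std::uint64_t peak_bytes_with_copy(std::uint64_t n, std::uint64_t elem_size) {
    return 2 * matrix_bytes(n, elem_size);
}

// Largest order n such that `copies` matrices of order n fit into `budget` bytes.
// A plain counting loop is used to keep the sketch exact and simple.
std::uint64_t max_order(std::uint64_t budget, std::uint64_t elem_size,
                        std::uint64_t copies) {
    std::uint64_t n = 0;
    while (matrix_bytes(n + 1, elem_size) * copies <= budget) ++n;
    return n;
}
```

For example, matrix_bytes(26528, 8) is 5629878272 and peak_bytes_with_copy(26528, 8) is 11259756544. Comparing max_order with copies = 1 and copies = 2 for the same budget shows that the feasible order drops by roughly a factor of √2, not by half.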

MRajesh_intel · 06-16-2021

Hi,

>>I wrote a small test program that loads a previously stored matrix

Can you please share the program so that we can check it on our side?

Regards

Rajesh.

BURGI · 06-16-2021

Hi Rajesh,

I am using the following C++ code:

// SolveTest.cpp
//

#include <stdint.h>
#include <stdlib.h>
#include <float.h>
#include <stdio.h>
#include <math.h>
#include <complex>
#include <Windows.h>

using namespace std; // for <complex>

#define MKL_ILP64
#include "mkl.h"

#define ALIGN_VALUE	64

int main(int argc, char* argv[])
{
	FILE* pFile = NULL;
	DWORD dwOrder, dwError, dwTicks;
	size_t sTotal, sRightSide, sPivot;
	char* pszFileName;
	complex<float>* pMatrix, *pRightSide;
	MKL_INT Order, Info;
	MKL_INT* pPivot;
	MKLVersion ver;

	printf("SolveTest\n");
	if (argc < 2) {
		printf("- no filename defined\n");
		return(1);
	}
	pszFileName = argv[1];
	printf("- loading matrix file %s", pszFileName);
	dwError = fopen_s(&pFile, pszFileName, "rb");
	if (dwError != ERROR_SUCCESS) {
		printf("- error opening file, code %u\n", dwError);
		return(1);
	}
	printf(" - done\n");
	if (fread(&dwOrder, sizeof(DWORD), 1, pFile) != 1 ||		// order of the matrix
	    fread(&sTotal, sizeof(size_t), 1, pFile) != 1) {		// total size of matrix data
		fclose(pFile);
		printf("- error reading file header\n");
		return(1);
	}
	printf("- complex matrix of order %u found\n", dwOrder);
	sRightSide = dwOrder * sizeof(complex<float>);
	printf("- allocating memory for main matrix (%llu MB)", sTotal / (1024*1024));
	pMatrix = (complex<float> *) mkl_malloc(sTotal, ALIGN_VALUE);
	if (!pMatrix) {
		dwError = GetLastError();
		fclose(pFile);
		printf("\n- no memory for main matrix, code %u\n", dwError);
		return(1);
	}
	printf(" - done\n");
	printf("- reading data");
	if (fread(pMatrix, sTotal, 1, pFile) != 1) {
		dwError = GetLastError();
		fclose(pFile);
		mkl_free(pMatrix);	// do not leak the matrix buffer on a read error
		printf("\n- error reading data, code %u\n", dwError);
		return(1);
	}
	printf(" - done\n");
	fclose(pFile);
	printf("- allocating memory for rightside (%llu KB)", sRightSide / 1024);
	pRightSide = (complex<float> *) mkl_malloc(sRightSide, ALIGN_VALUE);
	if (!pRightSide) {
		dwError = GetLastError();
		mkl_free(pMatrix);
		printf("\n- no memory for rightside, code %u\n", dwError);
		return(1);
	}
	printf(" - done\n");
	sPivot = sizeof(MKL_INT) * dwOrder;
	printf("- allocating memory for pivot vector (%llu KB)", sPivot / 1024);
	pPivot = (MKL_INT *) mkl_malloc(sPivot, ALIGN_VALUE);
	if (!pPivot) {
		dwError = GetLastError();
		mkl_free(pRightSide);	// release both earlier allocations
		mkl_free(pMatrix);
		printf("\n- no memory for pivot vector, code %u\n", dwError);
		return(1);
	}
	printf(" - done\n");

// factorization part
	mkl_get_version(&ver);
	printf("\nusing MKL %u.%u.%u (%s)", ver.MajorVersion, ver.MinorVersion, ver.UpdateVersion, ver.Processor);
	printf("\n- starting factorization\n");
	dwTicks = GetTickCount();
	Order = dwOrder;
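	Info = LAPACKE_cgetrf(LAPACK_ROW_MAJOR, Order, Order, (MKL_Complex8*) pMatrix, Order, pPivot); // at this point the additional memory usage begins
	dwTicks = GetTickCount() - dwTicks;
	if (Info != 0) {
		// Info < 0: invalid argument or failed internal allocation; Info > 0: U(Info,Info) is exactly zero
		printf("- factorization returned info %lld\n", (long long) Info);
	}
	printf("- factorization ended (%u ms)\n", dwTicks);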

	mkl_free(pPivot);
	mkl_free(pRightSide);
	mkl_free(pMatrix);
	return(0);
}

Kind regards,

Ralf

Gennady_F_Intel · 06-18-2021

Ralf,

I am not exactly sure about the version 11.2 you mentioned: our LAPACKE_cgetrf implementation is based on Netlib's, which makes an additional temporary memory allocation. You can check this by following the link:

http://www.netlib.org/lapack/explore-html/de/d2c/a01553_a289dd8ce852ba4df4ccdcde60ee6086d.html

-

Gennady

BURGI · 06-18-2021

Hello Gennady,

Thank you for the link; it gave me the explanation of the problem.

The matrices for our "old" calculations were set up in LAPACK_COL_MAJOR format; this was later changed to LAPACK_ROW_MAJOR for ease of reading and for use with other solution methods.

The code of lapacke_cgetrf_work.c now clearly shows that in this case additional memory is used during the factorization process for the required transformations using LAPACKE_cge_trans. The problem is therefore not due to the MKL version used.
Even if this additional memory is actually not mandatory for such a simple transformation, perhaps a short note in the documentation about this topic for *getrf would be helpful for other users.
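
To illustrate why the extra buffer is not strictly required for a square matrix: a transposition can in principle be done in place by swapping the entries mirrored across the diagonal. The following is only a minimal sketch under that assumption; transpose_square_in_place is a hypothetical helper, not an MKL or LAPACKE function, and it assumes a contiguous n x n matrix with leading dimension n. The plain double loop is not cache-friendly, so a blocked variant would be preferable for matrices of this size.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>

// Transposes a contiguous n x n matrix (leading dimension n) in place by
// swapping each element above the diagonal with its mirror below it.
// No second n x n buffer is needed.
void transpose_square_in_place(std::complex<float>* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            std::swap(a[i * n + j], a[j * n + i]);
        }
    }
}
```

Applying such a transposition before and after a LAPACK_COL_MAJOR call should give the same factors as the row-major call, at the cost of two extra passes over the matrix but without the second copy. MKL also documents in-place transposition routines (mkl_?imatcopy) that may serve the same purpose.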

I will test this in the next few days by changing my matrix structure back to the COL_MAJOR format.
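
An alternative that needs no change to the stored matrix: a row-major buffer holding A is, when read column-major, exactly the transpose of A. One can therefore factor the buffer as a column-major matrix (which factors the transpose) and then solve the transposed system, which is again A x = b. The sketch below demonstrates the idea with a small hand-written LU so that it is self-contained; lu_col_major, solve_transposed and solve_row_major_no_copy are illustrative names, not library functions. With LAPACKE, this would presumably correspond to calling LAPACKE_cgetrf with LAPACK_COL_MAJOR followed by LAPACKE_cgetrs with trans = 'T' (plain transpose, not 'C'); that combination has not been tested here.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::complex<double> cplx;

// In-place LU with partial pivoting of a COLUMN-major n x n matrix B
// (element (i,j) at b[i + j*n]), so that P*B = L*U. Returns false if singular.
bool lu_col_major(std::vector<cplx>& b, std::size_t n, std::vector<std::size_t>& piv) {
    piv.assign(n, 0);
    for (std::size_t k = 0; k < n; ++k) {
        std::size_t p = k;  // row with the largest entry in column k
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::abs(b[i + k * n]) > std::abs(b[p + k * n])) p = i;
        piv[k] = p;
        if (std::abs(b[p + k * n]) == 0.0) return false;
        if (p != k)
            for (std::size_t j = 0; j < n; ++j) std::swap(b[k + j * n], b[p + j * n]);
        for (std::size_t i = k + 1; i < n; ++i) {
            b[i + k * n] /= b[k + k * n];  // multiplier, stored in L
            for (std::size_t j = k + 1; j < n; ++j)
                b[i + j * n] -= b[i + k * n] * b[k + j * n];
        }
    }
    return true;
}

// Solves B^T x = rhs in place, using the factors of B from lu_col_major.
// Since B^T = U^T * L^T * P, solve with U^T, then L^T, then undo the pivoting.
void solve_transposed(const std::vector<cplx>& lu, std::size_t n,
                      const std::vector<std::size_t>& piv, std::vector<cplx>& x) {
    for (std::size_t i = 0; i < n; ++i) {  // forward substitution with U^T
        for (std::size_t j = 0; j < i; ++j) x[i] -= lu[j + i * n] * x[j];
        x[i] /= lu[i + i * n];
    }
    for (std::size_t i = n; i-- > 0;)      // back substitution with unit L^T
        for (std::size_t j = i + 1; j < n; ++j) x[i] -= lu[j + i * n] * x[j];
    for (std::size_t k = n; k-- > 0;)      // row interchanges in reverse order
        std::swap(x[k], x[piv[k]]);
}

// Solves A x = rhs where `a` holds A in ROW-major order. The buffer is
// factored in place as the column-major matrix A^T; no transposed copy is made.
bool solve_row_major_no_copy(std::vector<cplx>& a, std::size_t n, std::vector<cplx>& rhs) {
    assert(a.size() == n * n && rhs.size() == n);
    std::vector<std::size_t> piv;
    if (!lu_col_major(a, n, piv)) return false;
    solve_transposed(a, n, piv, rhs);
    return true;
}
```

Note that the factors and pivot indices obtained this way belong to the transpose of A, so they differ from what the LAPACK_ROW_MAJOR call returns; only the solution of the linear system is the same.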

With kind regards
Ralf

Gennady_F_Intel · 07-04-2021

This issue is being closed and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
