Pardiso Low rank Update does not accelerate the decomposition

Limansky__Alexander · ‎06-05-2018

Hello, I have a problem: Pardiso Low rank Update does not accelerate the decomposition. The usual pardiso decomposition is faster than decomposition with low rank update.

I have a complex symmetric matrix with number of nonzeros in factor about 2 millions.

Only 400 elements of matrix was changed, structure of matrix wasn't changed.

Initialization:

	/* -------------------------------------------------------------------- */
	/* .. Setup Pardiso control parameters. */
	/* -------------------------------------------------------------------- */
	mtype = 6; /* Complex symmetric matrix */
	for (i = 0; i < 64; i++)
	{
		iparm = 0;
	}
	iparm[0] = 1; // No solver default */
				  //iparm[1] = 2; // Fill-in reordering from METIS
	iparm[1] = 2;   //parallel(OpenMP) version of the nested dissection algorithm

					// Numbers of processors, value of OMP_NUM_THREADS
	iparm[2] = 0;
	iparm[3] = 0; // No iterative-direct algorithm 
	iparm[4] = 0; // No user fill-in reducing permutation 
	iparm[5] = 0; // Write solution into x 
	iparm[6] = 0; // Not in use 
	iparm[7] = 0; // Max numbers of iterative refinement steps 
	iparm[8] = 0; // Not in use 
	iparm[9] = 8; // Perturb the pivot elements with 1E-8
	iparm[10] = 0; // Use nonsymmetric permutation and scaling MPS 
	iparm[11] = 0; // Not in use 
	iparm[12] = 0; // Maximum weighted matching algorithm is switched-off (default for symmetric). Try iparm[12] = 1 in case of inappropriate accuracy 
	iparm[13] = 0; // Output: Number of perturbed pivots 
	iparm[14] = 0; // Not in use 
	iparm[15] = 0; // Not in use 
	iparm[16] = 0; // Not in use 
	iparm[17] = -1; // Output: Number of nonzeros in the factor LU 
	iparm[18] = -1; // Output: Mflops for LU factorization 
	iparm[19] = 0; // Output: Numbers of CG Iterations 
	iparm[20] = 1;   // Apply 1x1 and 2x2 Bunch and Kaufman pivoting during the factorization process
	iparm[23] = 10;  // PARDISO uses new two - level factorization algorithm
	iparm[24] = 2; //Parallel forward/backward solve control. Intel MKL PARDISO uses a parallel algorithm for the solve step.
	iparm[26] = 1;
	iparm[30] = 0; // Partial solution
	iparm[34] = 1; //zero-based index


	maxfct = 1; // Maximum number of numerical factorizations. 
	mnum = 1; // Which factorization to use. 
	msglvl = 0; // Print statistical information in file 
	error = 0; // Initialize error flag 
			   /* -------------------------------------------------------------------- */
			   /* .. Initialize the internal solver memory pointer. This is only */
			   /* necessary for the FIRST call of the PARDISO solver. */
			   /* -------------------------------------------------------------------- */
	for (i = 0; i < 64; i++)
	{
		pt = 0;
	}

	/* -------------------------------------------------------------------- */
	/* .. Reordering and Symbolic Factorization. This step also allocates */
	/* all memory that is necessary for the factorization. */
	/* -------------------------------------------------------------------- */

	phase = 11;
	PARDISO(pt, &maxfct, &mnum, &mtype, &phase,
		&nRows, complexValues, rowIndex, columns, &idum, &nRhs,
		iparm, &msglvl, &ddum, &ddum, &error);

	printf("\nReordering completed ...\n");
	printf("Number of nonzeros in factors = %d\n", iparm[17]);
	printf("Number of factorization MFLOPS = %d\n\n", iparm[18]);

	if (error != 0)
	{
		printf("\nERROR during symbolic factorization: %d", error);
		exit(1);
	}

Then, decompose with original matrix:

	// -------------------------------------------------------------------- 
	// .. Numerical factorization. 
	// -------------------------------------------------------------------- 

	phase = 22;
	PARDISO(pt, &maxfct, &mnum, &mtype, &phase,
		&nRows, complexValues, rowIndex, columns, &idum, &nRhs,
		iparm, &msglvl, &ddum, &ddum, &error);

	if (error != 0)
	{
		printf("\nERROR during numerical factorization: %d", error);
		exit(2);
	}

And after that after solving system, decompose with changed elements in matrix

	// -------------------------------------------------------------------- 
	// .. Numerical factorization. 
	// -------------------------------------------------------------------- 

	phase = 22;
	iparm[38] = 1;

	PARDISO(pt, &maxfct, &mnum, &mtype, &phase,
		&nRows, complexValues, rowIndex, columns, perm, &nRhs,
		iparm, &msglvl, &ddum, &ddum, &error);

	if (error != 0)
	{
		printf("\nERROR during numerical factorization: %d", error);
		exit(2);
	}

	iparm[38] = 0;

Vector perm contains row and columns indexes of changed elements in all matrix. But I have complex symmetric matrix, should I build vector perm only with changed elements in upper triangle?

Gennady_F_Intel · ‎06-05-2018

>> "The usual pardiso decomposition is faster than decomposition with low rank update"

What is the the execution time for 22 phase do you see with this case? what is the problem size ( size of this matrix) ?

>> But I have complex symmetric matrix, should I build vector perm only with changed elements in upper triangle?

yes, you should use only elements from upper matrix.

Limansky__Alexander · ‎06-05-2018

>> "What is the the execution time for 22 phase do you see with this case? what is the problem size ( size of this matrix) ? "
Execution time of usual pardiso decomposition about 0.15 sec and with low-rank update about 0.2 sec. Size of matrix is 64382.
>> "yes, you should use only elements from upper matrix."
Okay, I will try, thank you.

Limansky__Alexander · ‎06-06-2018

I used only elements from upper triangle of matrix to build Perm vector. Now decomposition with low rank takes 0.146 seconds, when without low rank 0.15 seconds (the acceleration is not more than 3%). Although complexity of decomposition without low rank O (n ^ 3), and with low rank O (n ^ 2), so I cannot understand why decomposition with low rank update still slow.

Updated: problem solved, old version of Intel MKL library.

Gennady_F_Intel · ‎06-06-2018

then we have to have the reproducer of this case. could you please share. thanks.