Solved: Hello,

Tripathi__Shailesh · ‎05-20-2018

Hello,

I want to perform SVD on a matrix in parallel(C). But I am facing issues in psgesvd. When I try to perform svd of a matrix of size 4750*4750, I get an error saying that:

"{ 0, 0}: On entry to
PSGESVD parameter number 19 had an illegal value"

Parameter 19 is 'lwork' for which a make a query first which returns the minimum size of 'work' array required. The svd works fine for smaller matrices but big matrices give an error. Can anyone please help me with this as to what is the reason for this error. The value of lwork in this case is 45362800.

When I use serial "sgesvd" I don't get any such issue and the results are perfectly correct. But I am not sure what is causing an issue in this case.

Thanks in advance!

-Shailesh Tripathi

Konstantin_A_Intel · ‎05-29-2018

Hi Shailesh,

Thank you for reporting the problem! Fortunately, we've already fixed it. The fix will be available into next MKL releases: MKL 2018 Update 4 and MKL 2019.

As you know, when workspace amount is requested in LAPACK and ScaLAPACK (via LWORK=-1), it's returned into WORK(1). For single precision it means "int -> float" conversion. Unfortunately, for numbers more than 2^24 this conversion can lead to losing some low digits (because the integer number will be represented by a closest float number). As a result, users may get an integer number less than it was initially calculated into MKL. To avoid this problem, we now adjust integer numbers appropriately so that they can be represented without loss of precision as float, and then we assign this number to WORK(1).

As a workaround, you can adjust your number into the program as follows:

(*((MKL_INT*)&workpt))++; // this line does the trick - it adds 1 to workpt represented as integer. In floating point representation it means you increase your number to the next representable fp number. It's enough to be on the safe side.

lwork = (MKL_INT)workpt;

Sorry for inconvinience!

Konstantin

View solution in original post

Gennady_F_Intel · ‎05-20-2018

What version of mkl do you use? We have fixed the very similar issue in mkl version 2018 u2.

Tripathi__Shailesh · ‎05-20-2018

Hello,

Thank you for your response. My ICC version is 13.0.1.117 and I know it is quite outdated. But is it just a bug in the compiler or is it some mistake I am doing? If the first is true, I can upgrade to the latest version.

Also, what was the issue exactly that you fixed in the 2018 u2?

Thanks for your help!

Regards,
Shailesh

Gennady_F_Intel · ‎05-20-2018

https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2018-bug-fixes-list ; -- MKLD-3350

You may also give us reproducer which we may check on our side with the latest versions/updates.

Tripathi__Shailesh · ‎05-26-2018

Hello,
I am ataching my reproducer code. I made some simplifications for 1 process execution. When the matrix size is large the same error shows up.

Here is the code:

#include <iostream>
#include <mpi.h>
#include <stdlib.h>
#include "mkl_scalapack.h"
#include "mkl_blacs.h"

typedef MKL_INT MDESC[ 9 ];

void allocate(float **array, int size) {
    if (!(*array) && *array == NULL) {
        *array = (float*) malloc (size * sizeof(float));
        if (!(*array)) {
            std::cout << "FAILED TO ALLOCATE MEMORY" << std::endl;
        }
    }
}

void svd(float *A, int& m, int& n) {

    // SVD of A to get U, Vt and S
    float zero = 0.0e+0, one = 1.0e+0;
    MKL_INT i_zero = 0, i_one = 1, i_negone = -1;

    MDESC descA, descU, descVt, descpinvA;

    MKL_INT iam, nprocs, ictxt, myrow, mycol, nprow, npcol;
    MKL_INT mb, nb, mp, nq, mq, kr;
    MKL_INT info, lld, lwork = -1;
    char jobu = 'V', jobvt = 'V';
    float *work, *S = NULL, *U = NULL, *Vt = NULL;
    float workpt;
    int k;
    int i,j; 

    blacs_pinfo_( &iam, &nprocs );
    nprow = 1;
    npcol = 1;

    blacs_get_( &i_negone, &i_zero, &ictxt );
    blacs_gridinit_( &ictxt, "R", &nprow, &npcol );
    blacs_gridinfo_( &ictxt, &nprow, &npcol, &myrow, &mycol );

    mb = 1;
    nb = 1;
    k = std::min(m, n);

    mp = m;
    nq = n;
    kr = k;

    allocate(&S, k);
    allocate(&U, m * kr);
    allocate(&Vt, kr * n);

    descinit_( descA, &m, &n, &nb, &nb, &i_zero, &i_zero, &ictxt, &m, &info );
    descinit_( descU, &m, &k, &nb, &nb, &i_zero, &i_zero, &ictxt, &m, &info );
    descinit_( descVt, &k, &n, &nb, &nb, &i_zero, &i_zero, &ictxt, &k, &info );

    psgesvd(&jobu, &jobvt, &m, &n, A, &i_one, &i_one, descA, S, U, &i_one, &i_one, descU, Vt, &i_one, &i_one, descVt, &workpt, &lwork, &info);

    lwork = (MKL_INT)workpt;

    work = (float*) calloc(lwork, sizeof( float ));
	
    psgesvd(&jobu, &jobvt, &m, &n, A, &i_one, &i_one, descA, S, U, &i_one, &i_one, descU, Vt, &i_one, &i_one, descVt, work, &lwork, &info);

    free(work);
    free(S);
    free(U);
    free(Vt);

    // Destroy temporary process grid
    blacs_gridexit_( &ictxt );
}


int main(int argc, char** argv)
{
	MPI_Init(&argc, &argv);
	
	float *A=NULL;
	int i,j;
	int m= 4760,n=4760 ;
	allocate(&A,m*n);
	
	for(i=0; i < m ; i++)
	{
		for(j=0 ; j<n ; j++)
			A[j*m + i]=rand() / RAND_MAX*10;
	}

	svd(A,m,n);
	MPI_Finalize();
	return 0;
}

This code breaks even with a single process.
Thanks in advance!

-Shailesh

Gennady_F_Intel · ‎05-27-2018

yes, we sees the same behavior with the latest 2018 u2. We will investigate the cause of the problem and keep you in the loop with results.

Konstantin_A_Intel · ‎05-29-2018

Hi Shailesh,

Thank you for reporting the problem! Fortunately, we've already fixed it. The fix will be available into next MKL releases: MKL 2018 Update 4 and MKL 2019.

As you know, when workspace amount is requested in LAPACK and ScaLAPACK (via LWORK=-1), it's returned into WORK(1). For single precision it means "int -> float" conversion. Unfortunately, for numbers more than 2^24 this conversion can lead to losing some low digits (because the integer number will be represented by a closest float number). As a result, users may get an integer number less than it was initially calculated into MKL. To avoid this problem, we now adjust integer numbers appropriately so that they can be represented without loss of precision as float, and then we assign this number to WORK(1).

As a workaround, you can adjust your number into the program as follows:

(*((MKL_INT*)&workpt))++; // this line does the trick - it adds 1 to workpt represented as integer. In floating point representation it means you increase your number to the next representable fp number. It's enough to be on the safe side.

lwork = (MKL_INT)workpt;

Sorry for inconvinience!

Konstantin

Tripathi__Shailesh · ‎05-30-2018

It works perfectly fine!
Thank you for your help!

-Shailesh

PSGESVD: Illegal parameter 19