- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello,

I want to perform SVD on a matrix in parallel(C). But I am facing issues in psgesvd. When I try to perform svd of a matrix of size 4750*4750, I get an error saying that:

"{ 0, 0}: On entry to

PSGESVD parameter number 19 had an illegal value"

Parameter 19 is 'lwork' for which a make a query first which returns the minimum size of 'work' array required. The svd works fine for smaller matrices but big matrices give an error. Can anyone please help me with this as to what is the reason for this error. The value of lwork in this case is 45362800.

When I use serial "sgesvd" I don't get any such issue and the results are perfectly correct. But I am not sure what is causing an issue in this case.

Thanks in advance!

-Shailesh Tripathi

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi Shailesh,

Thank you for reporting the problem! Fortunately, we've already fixed it. The fix will be available into next MKL releases: MKL 2018 Update 4 and MKL 2019.

As you know, when workspace amount is requested in LAPACK and ScaLAPACK (via LWORK=-1), it's returned into WORK(1). For single precision it means "int -> float" conversion. Unfortunately, for numbers more than 2^24 this conversion can lead to losing some low digits (because the integer number will be represented by a closest float number). As a result, users may get an integer number less than it was initially calculated into MKL. To avoid this problem, we now adjust integer numbers appropriately so that they can be represented without loss of precision as float, and then we assign this number to WORK(1).

As a workaround, you can adjust your number into the program as follows:

(*((MKL_INT*)&workpt))++; // this line does the trick - it adds 1 to workpt represented as integer. In floating point representation it means you increase your number to the next representable fp number. It's enough to be on the safe side.

lwork = (MKL_INT)workpt;

Sorry for inconvinience!

Konstantin

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

What version of mkl do you use? We have fixed the very similar issue in mkl version 2018 u2.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello,

Thank you for your response. My ICC version is 13.0.1.117 and I know it is quite outdated. But is it just a bug in the compiler or is it some mistake I am doing? If the first is true, I can upgrade to the latest version.

Also, what was the issue exactly that you fixed in the 2018 u2?

Thanks for your help!

Regards,

Shailesh

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2018-bug-fixes-list ; -- MKLD-3350

You may also give us reproducer which we may check on our side with the latest versions/updates.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello,

I am ataching my reproducer code. I made some simplifications for 1 process execution. When the matrix size is large the same error shows up.

Here is the code:

#include <iostream> #include <mpi.h> #include <stdlib.h> #include "mkl_scalapack.h" #include "mkl_blacs.h" typedef MKL_INT MDESC[ 9 ]; void allocate(float **array, int size) { if (!(*array) && *array == NULL) { *array = (float*) malloc (size * sizeof(float)); if (!(*array)) { std::cout << "FAILED TO ALLOCATE MEMORY" << std::endl; } } } void svd(float *A, int& m, int& n) { // SVD of A to get U, Vt and S float zero = 0.0e+0, one = 1.0e+0; MKL_INT i_zero = 0, i_one = 1, i_negone = -1; MDESC descA, descU, descVt, descpinvA; MKL_INT iam, nprocs, ictxt, myrow, mycol, nprow, npcol; MKL_INT mb, nb, mp, nq, mq, kr; MKL_INT info, lld, lwork = -1; char jobu = 'V', jobvt = 'V'; float *work, *S = NULL, *U = NULL, *Vt = NULL; float workpt; int k; int i,j; blacs_pinfo_( &iam, &nprocs ); nprow = 1; npcol = 1; blacs_get_( &i_negone, &i_zero, &ictxt ); blacs_gridinit_( &ictxt, "R", &nprow, &npcol ); blacs_gridinfo_( &ictxt, &nprow, &npcol, &myrow, &mycol ); mb = 1; nb = 1; k = std::min(m, n); mp = m; nq = n; kr = k; allocate(&S, k); allocate(&U, m * kr); allocate(&Vt, kr * n); descinit_( descA, &m, &n, &nb, &nb, &i_zero, &i_zero, &ictxt, &m, &info ); descinit_( descU, &m, &k, &nb, &nb, &i_zero, &i_zero, &ictxt, &m, &info ); descinit_( descVt, &k, &n, &nb, &nb, &i_zero, &i_zero, &ictxt, &k, &info ); psgesvd(&jobu, &jobvt, &m, &n, A, &i_one, &i_one, descA, S, U, &i_one, &i_one, descU, Vt, &i_one, &i_one, descVt, &workpt, &lwork, &info); lwork = (MKL_INT)workpt; work = (float*) calloc(lwork, sizeof( float )); psgesvd(&jobu, &jobvt, &m, &n, A, &i_one, &i_one, descA, S, U, &i_one, &i_one, descU, Vt, &i_one, &i_one, descVt, work, &lwork, &info); free(work); free(S); free(U); free(Vt); // Destroy temporary process grid blacs_gridexit_( &ictxt ); } int main(int argc, char** argv) { MPI_Init(&argc, &argv); float *A=NULL; int i,j; int m= 4760,n=4760 ; allocate(&A,m*n); for(i=0; i < m ; i++) { for(j=0 ; j<n ; j++) A[j*m + i]=rand() / RAND_MAX*10; } svd(A,m,n); MPI_Finalize(); return 0; }

This code breaks even with a single process.

Thanks in advance!

-Shailesh

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

yes, we sees the same behavior with the latest 2018 u2. We will investigate the cause of the problem and keep you in the loop with results.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi Shailesh,

Thank you for reporting the problem! Fortunately, we've already fixed it. The fix will be available into next MKL releases: MKL 2018 Update 4 and MKL 2019.

As you know, when workspace amount is requested in LAPACK and ScaLAPACK (via LWORK=-1), it's returned into WORK(1). For single precision it means "int -> float" conversion. Unfortunately, for numbers more than 2^24 this conversion can lead to losing some low digits (because the integer number will be represented by a closest float number). As a result, users may get an integer number less than it was initially calculated into MKL. To avoid this problem, we now adjust integer numbers appropriately so that they can be represented without loss of precision as float, and then we assign this number to WORK(1).

As a workaround, you can adjust your number into the program as follows:

(*((MKL_INT*)&workpt))++; // this line does the trick - it adds 1 to workpt represented as integer. In floating point representation it means you increase your number to the next representable fp number. It's enough to be on the safe side.

lwork = (MKL_INT)workpt;

Sorry for inconvinience!

Konstantin

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

It works perfectly fine!

Thank you for your help!

-Shailesh

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page