Problem with ZCGESV

Alexander_D_6 · 06-30-2015

The problem I am facing is that the ZCGESV function crashes when the matrix size N is 46497 or more. When the matrix size is, for example, 46202, everything works fine.

From what I can see in the LAPACK sources at http://www.netlib.org/lapack/explore-html/d5/d4a/zcgesv_8f_source.html, the variable ptsx can overflow for large n, because ptsx is declared as a default INTEGER and n*n exceeds the 32-bit limit 2**31 - 1 = 2147483647 once n reaches 46341:

*     .. Local Scalars ..
      INTEGER            i, iiter, ptsa, ptsx

*     Set the indices PTSA, PTSX for referencing SA and SX in SWORK.
      ptsa = 1
      ptsx = ptsa + n*n
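As a quick check of the arithmetic, here is a small Python sketch that evaluates ptsx = 1 + n*n exactly and then shows the value a 32-bit two's-complement INTEGER would hold after wrap-around. It assumes that MKL's zcgesv computes this offset the same way as the reference LAPACK source above (not confirmed), and that the overflow wraps silently, which is typical on x86 compilers but not guaranteed by the Fortran standard. The names ptsx_exact and as_int32 are hypothetical helpers.

```python
INT32_MAX = 2**31 - 1  # largest value of a default 32-bit INTEGER


def ptsx_exact(n: int) -> int:
    """PTSX = PTSA + N*N with PTSA = 1, computed without overflow."""
    ptsa = 1
    return ptsa + n * n


def as_int32(value: int) -> int:
    """Value a two's-complement 32-bit INTEGER holds after wrap-around.

    Assumption: overflow wraps silently (typical, not guaranteed).
    """
    return (value + 2**31) % 2**32 - 2**31


for n in (46202, 46340, 46341, 46497):
    exact = ptsx_exact(n)
    print(f"n={n:6d}  exact={exact:11d}  int32={as_int32(exact):12d}  "
          f"overflow={exact > INT32_MAX}")
```

With these formulas, n = 46202 gives ptsx = 2134624805, which still fits, while n = 46497 gives 2161971010, which a 32-bit INTEGER would store as -2132996286. The first n that overflows is 46341, which falls between the working and the crashing sizes.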

It looks like the zcgesv routine in MKL has a similar problem. I couldn't find anything related to this issue in the MKL documentation. Can anyone confirm that this is a bug in MKL?

Thanks,

Alexander

mecej4 · 06-30-2015

I don't think that every possible instance of integer overflow, division by zero, etc., in library routines should be considered a bug. If the 32-bit or LP64 versions of MKL are to be used, the user has to take steps to avoid having or generating arrays that are too large to be indexed with 32-bit integers.

Have you tried compiling for X64 using the ILP64 MKL libraries? Is your huge matrix dense as well?
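One way to take such steps before calling an LP64 (32-bit integer) build is to check whether the largest index the routine needs is representable in 32 bits. The Python sketch below assumes, following the reference LAPACK documentation, that ZCGESV's SWORK has length N*(N+NRHS). swork_fits_lp64 and max_lp64_n are hypothetical helpers, and passing this check does not prove that every internal index in MKL is safe.

```python
INT32_MAX = 2**31 - 1  # largest value of a 32-bit (LP64) LAPACK integer


def swork_fits_lp64(n: int, nrhs: int = 1) -> bool:
    """True if the SWORK length N*(N+NRHS) is representable in 32 bits."""
    return n * (n + nrhs) <= INT32_MAX


def max_lp64_n(nrhs: int = 1) -> int:
    """Largest N that passes swork_fits_lp64, found by binary search."""
    lo, hi = 0, 2**16  # 2**16 * (2**16 + nrhs) already exceeds INT32_MAX
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if swork_fits_lp64(mid, nrhs):
            lo = mid  # mid fits: the answer is at least mid
        else:
            hi = mid - 1  # mid is too large
    return lo


print(max_lp64_n(1))  # 46340
```

When the check fails, the way out is a build with 64-bit integers, i.e., the ILP64 MKL libraries suggested above.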

Alexander_D_6 · 06-30-2015

Thanks for your reply.

How can I take steps to ensure there is no integer overflow *inside* an MKL routine? I have two similar matrices, the first with N = 46202 and the second with N = 46497. zcgesv solves the first system and crashes on the second. The code is compiled for x64. Actually, I don't know exactly what happens inside zcgesv; integer overflow was only my first guess.

mecej4 · 06-30-2015

It would be useful if you posted the source code for a case where ZCGESV fails, along with information on how you built the program and what the crash messages were.

Alexander_D_6 · 07-01-2015

I have attached a Visual Studio project with the full source code of the solver that uses zcgesv. The number of unknowns is set in the input.dat file; it is currently 47000, which requires about 33 GB of RAM. A screenshot of the access violation is also attached.

One more question: does it make sense to declare the swork array as swork(:,:) or swork(n,n,nrhs)? If so, would it provide any advantage?

Thanks,

Alexander

mecej4 · 07-01-2015

I do not have access to any machine with that much RAM.

Alexander D. wrote:

One more question: does it make sense to declare the swork array as swork(:,:) or swork(n,n,nrhs)? If so, would it provide any advantage?

You are using the F77 interface to LAPACK/MKL, so the argument swork is passed by reference, i.e., only the address of the first element of the array is passed. As far as the call to zcgesv is concerned, the declared shape therefore makes absolutely no difference. Of course, the declaration (and the allocation, if any) will differ.
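To illustrate why the declared shape is invisible to an F77-style callee, the Python sketch below computes column-major (Fortran-order) linear offsets. It assumes a SWORK of length N*(N+NRHS) holding SA in its first N*N elements and SX from PTSX = 1 + N*N onward, as in the reference ZCGESV source quoted at the start of the thread; colmajor_offset is a hypothetical helper. Declaring the same buffer as SWORK(N, N+NRHS) yields exactly the same offsets as the flat layout the routine uses.

```python
def colmajor_offset(index, shape):
    """0-based linear offset of a 1-based Fortran index in a column-major array."""
    offset, stride = 0, 1
    for i, extent in zip(index, shape):
        if not 1 <= i <= extent:
            raise IndexError(f"index {i} outside 1..{extent}")
        offset += (i - 1) * stride
        stride *= extent
    return offset


n, nrhs = 4, 2
ptsx = 1 + n * n  # 1-based start of SX inside a flat SWORK(N*(N+NRHS))

for k in range(1, nrhs + 1):
    for i in range(1, n + 1):
        # SX(i, k) as the routine sees it: a flat buffer starting at SWORK(PTSX)
        flat = (ptsx - 1) + colmajor_offset((i, k), (n, nrhs))
        # The same element if the caller declared SWORK(N, N+NRHS)
        as_2d = colmajor_offset((i, n + k), (n, n + nrhs))
        assert flat == as_2d
print("flat and 2-D declarations address the same elements")
```

Only the base address crosses the call boundary; the rank and extents the caller chooses change how the caller indexes the buffer, not what the routine reads.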

Alexander_D_6 · 07-02-2015

The problem was solved by migrating to 64-bit integers and linking with the ILP64 version of MKL.

Thanks for your help.