I have implemented an orthogonalization method using the DGEQRF subroutine to compute a QR factorization. The compilation and subsequently execution works well using gfortran/MKL but when I compile and run it with Ifort/MKL I get the following error:
forrtl: error (65): floating invalid
I have checked for uninitialized variables or NaN values, and the problem seems to be an arithmetic operation inside DGEQRF.
Ifort version: ifort (IFORT) 220.127.116.11 20181018
MKL version: parallel_studio_xe_2019_update1_cluster_edition
The source code can be found at: https://github.com/NLESC-JCER/Fortran_Davidson
Line that cause the error: https://github.com/NLESC-JCER/Fortran_Davidson/blob/master/src/davidson.f90#L241
I compile the code using cmake like:
cmake -H. -Bbuild -DCMAKE_Fortran_COMPILER=ifort DCMAKE_BUILD_TYPE=Debug
cmake --build build
I run the resulting binary like:
Finally, The orthogonalization subroutine is
subroutine lapack_qr(basis) !> Orthoghonalize the basis using the QR factorization. !> QR factorization of the M-by-N (M>N) matrx A=Q*R in the form where !> Q is square M-by-M matrix and R is an upper triangular M-by-N matrix. !> The equality A=Q*R can be re-written also as a product Q1*R1 where Q1 !> is a rectangular M-by-N submatrix of the matrix Q and R1 is M-by-M !> submatrix of the R. Let us note that columns of Q1 are orthonormal !> (they are orthogonal to each other and have norms equal to 1). !> The equality A=Q1*R1 can be treated as every column of A is a linear !> combination of Q1 columns, i.e. they span the same linear space. !> In other words, columns of Q1 is the result of ortogonalization of columns A. !> DGEQRF does not not compute Q directly, DORGQR must be call subsequently. !> \param basis !> \return orthogonal basis implicit none real(dp), dimension(:, :), intent(inout) :: basis real(dp), dimension(:), allocatable :: work ! workspace, see lapack documentation real(dp), dimension(size(basis, 2)) :: tau ! see DGEQRF documentation integer :: info, lwork, m, n ! Matrix shape m = size(basis, 1) n = size(basis, 2) ! 1. Call the QR decomposition ! 1.1 Query size of the workspace (Check lapack documentation) allocate(work(1)) call DGEQRF(m, n, basis, m, tau, work, -1, info) ! 1.2 Allocate memory for the workspace lwork = max(1, int(work(1))) deallocate(work) allocate(work(lwork)) ! 1.3 Call QR factorization call DGEQRF(m, n, basis, m, tau, work, lwork, info) deallocate(work) ! 2. Generates an orthonormal matrix ! 2.1 Query size of the workspace (Check lapack documentation) allocate(work(1)) call DORGQR(m, n, min(m, n), basis, m, tau, work, -1, info) ! 2.2 Allocate memory fo the workspace lwork = max(1, int(work(1))) deallocate(work) allocate(work(lwork)) ! 2.3 compute the matrix Q call DORGQR(m, n, min(m, n), basis, m, tau, work, lwork, info) ! release memory deallocate(work) end subroutine lapack_qr
I really appreciate any help you can provide.
I think that in situations such as this, it is largely up to the original poster to isolate one call (or a small number of calls) to MKL routines and provide a description of why it is thought that the MKL routine(s) are not working correctly. In particular, it should be checked whether the arguments to the MKL routine have reasonable and meaningful values.
I printed the profile of the 50 X 8 matrix BASIS being passed to DGEQRF in file davidson.f90. Here is the pattern of zero/non-zero values:
Row 1 X000XXXX Row 2 0X00XXXX Row 3 00X0XXXX Row 4 000XXXXX Rows 5 to 50 0000XXXX
Is this correct?
The following is the output of MKL_VERBOSE when calling the qr subroutine:
MKL_VERBOSE DPOSV(U,50,1,0x7ffce57b0290,50,0x7ffce57b50b0,50,1) 428ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
MKL_VERBOSE DGEQRF(50,32,0x2e18640,50,0x7ffce57c60f0,0x2de6e40,-1,0) 432ns CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
forrtl: error (65): floating invalid
For some reason that I need to investigate further, the linear system in the previous step calling the "DPOSV" subroutine failed. I need to figure up why it works when the program is compiled with gfortran.
Also, the pattern of zeros/non-zeros value that is passed to the DGEQRF should correspond roughly to a basis for a given subspace used to compute the eigenvalue/eigenvector pair of a given matrix. In the example, I am trying to compute the lowest 3 eigenvalues of a 50 x 50 matrix using an initial subspace of dimension 4, using the Davidson method (http://www.netlib.org/utk/people/JackDongarra/etemplates/node402.html)
Thank you for your time
The code in davidson.f90 in your Github repository ignores the values of the status variable info after calls to Lapack routines such as DGEQRF, DPOSV, etc. This is acceptable only if (based on experience or previous tests) you are confident that info = 0. For the test case that is first run, I find that after the first call to DPOSV we get info=4 and for the next few subsequent calls, info=1. This (i.e., info > 0) implies that the matrix is not positive definite, because of which the factorization/solution failed.
When such a situation is encountered, the calculation should be stopped or some appropriate correction made before continuing. To continue the execution of the program despite the failed solution may invite further errors of an unpredictable nature, and no conclusions should be drawn as to which compiler+library "works" and which does not.
+ our two cents : in the case if info == 0 and you continue to see the runtime failure, please give us the reproducer which will greatly reduce the investigation time from our side. thanks.