Re: NaNs in Matrix After Calling cgetrf in Intel oneAPI 2025.3

jmaha · ‎01-13-2026

I am constructing the LU factorization of a dense block matrix in C++. The block updates require an LU factorization of each diagonal block, and I'm currently using LAPACKE_cgetrf for that because my matrix is stored in single precision.

During this process, my second diagonal block enters cgetrf in good shape (i.e. the matrix is not singular, and the singular values are reasonable, with max/min ~ 50). When it comes out of LAPACKE_cgetrf, there are NaNs in the lower triangular part, starting about a third of the way through the matrix. I've checked 'info', and it comes back as 0, which would seem to indicate that MKL thinks the factorization was computed successfully.

Things that I've tried:

- Exporting the suspect "input" matrix block and importing it into MATLAB to see if I can construct the LU - This works.
- Exporting the suspect "input" matrix block and importing it into a stand-alone C++ code that just reads the single matrix, constructs the LU factorization (using oneAPI 2025.3), and writes the result to disk - This works.
- Reverting to Intel oneAPI 2024.0 in the original code and constructing the full block LU factorization - This works and the results seem correct.
- Running the original code through both Valgrind and ASan - Neither reports any memory issues.
- In the original code, copying the contents of my single-precision diagonal matrix into a temporary double-precision buffer, factorizing this temporary matrix with LAPACKE_zgetrf, and copying the contents of the buffer back into the original single-precision matrix - This works.
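The double-precision round trip from the last item can be sketched as follows. This is an illustration under stated assumptions, not the original code: `factorize_via_double` is a hypothetical helper, and the factorization is passed in as a callable (which in the real code would wrap the LAPACKE_zgetrf call and its pivot array) so that the sketch does not depend on MKL headers.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MKL or LAPACKE). `factorize` is any
// callable taking (std::complex<double>* data, std::size_t count) and
// returning an int status; in real code it would wrap LAPACKE_zgetrf.
template <class Factorize>
int factorize_via_double(std::complex<float>* a, std::size_t count,
                         Factorize&& factorize) {
    // Widen to double precision; float -> double conversion is exact.
    std::vector<std::complex<double>> tmp(count);
    for (std::size_t i = 0; i < count; ++i) {
        tmp[i] = std::complex<double>(a[i]);
    }

    // Run the factorization on the double-precision copy.
    const int info = factorize(tmp.data(), count);

    // Round the result back into the single-precision storage.
    for (std::size_t i = 0; i < count; ++i) {
        a[i] = std::complex<float>(tmp[i]);
    }
    return info;
}
```

The temporary buffer doubles the memory traffic for the block, so this is better suited to diagnosis than to a permanent fix.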

As the first two bullets indicate, I don't really have a small representative problem to share, and our block LU factorization contains some proprietary algorithms.

I'm just asking whether anyone has experienced anything similar - has anyone noticed single-precision LU factorization failing with NaNs in the matrix after calling LAPACKE_cgetrf in oneAPI 2025.3? If so, were you able to find a work-around, or did you have to revert to an older version?

Thanks!

Fengrui · ‎01-14-2026

Hi,

Thanks for reporting this issue.

We need more information as there is not a small reproducer.

Could you run both the original code and the stand-alone C++ code with MKL_VERBOSE=1, and share the output for the cgetrf calls?
Apart from the NaNs, are the other elements of the output correct?
Could you try calling mkl_free_buffers() before the call to cgetrf?

Thanks,

Fengrui

jmaha · ‎01-19-2026

Thanks for reaching out, Fengrui!

It turns out that the problem likely stems from subnormal numbers being generated during my global block LU factorization. Before making any calls to LAPACKE_cgetrf on a sub-block, I now issue the following command:

_mm_setcsr(_mm_getcsr() | 0x8040);

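This sets the flush-to-zero (FTZ, 0x8000) and denormals-are-zero (DAZ, 0x0040) bits in the MXCSR register, so underflowing results are flushed to zero and denormal inputs are treated as zero.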

This seems to have fixed my issue!

The biggest clue was that I could copy the contents of my matrix into a double-precision buffer, call LAPACKE_zgetrf on the double-precision data, copy the results back to my original single-precision buffer, and end up with a successful factorization, whereas calling LAPACKE_cgetrf on the original data always failed. Values that are subnormal in single precision are ordinary normal numbers in double precision, which is consistent with the double-precision path being unaffected.