- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello!

I am using Intel MKL 2021.3 for Linux. Server: Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz.

I have noticed, that result of function DGTSV differs depending on number of right-hand-sides. Only array DL is wrong. Looks like, it is a bug, since arrays DL, D, DU represent factor U from factorization LU and it does not depend on number of right-hand-side. Also, for a fixed right-hand-side and size of the matrix this array differs from

1. GTSV NETLIB return result of array DL

2. GTTRF MKL return result of array DU2. On exit:

DU2 is overwritten by the (n-2) elements of the second super-diagonal of U.

and it should be the same as DL from GTSV.

I have implemented reproducer for an issue, described at the top of this thread. It runs GTSV for n=10, nrhs = 1...50. Results become wrong since nrhs > 11. Other matrix sizes are also affected. Please check.

Best regards,

Dmitry

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi Dimitry,

Thanks for posting on Intel Communities.

We are able to reproduce your issue at our end and we are working on the same internally. We will get back to you soon with an update.

Best Regards,

Shanmukh.SS

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi Dmitry,

Please find more information regarding your issue mentioned below.

GETSV uses the LU factorization with partial pivoting. However, the exact pivoting scheme is not specified and not unique. Thus, in principle, this behavior is acceptable as long as the solutions of the system stay accurate. For very tiny problems vectorization does not make much sense. This changes as the number of right-hand sides increases. It seems that at nrhs = 12 a switch to a vectorized code with a different pivoting scheme occurs, resulting in different factorization.

Please find attached, the modified reproducer file and the corresponding output file and you could check the error of computed solutions.

*The solutions to the system do stay accurate,* thus it seems not a bug.

Best Regards,

Shanmukh.SS

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi, Shanmukh.SS!

Thank you for your investigation!

Indeed, your test shows, that solution of the system is correct. Moreover, I have double checked, that residual AX - B is also correct. But, there are 4 output parameters of the function: DL, D, DU, X. We only proved, that solution X is correct. I believe, that output of DL is still wrong. Partial pivoting depends only on input matrix A. No matter, how many right-hand-side system have. Also, if partial pivoting influence the factorization, output of all three arrays DL, D, DU should be changed, since ROW interchange was applied. In my test, only one upper diagonal is changed, which looks totally incorrect.

Also, I am very confused about the point, that some MKL optimizations can lead to result, different from reference NETLIB LAPACK implementation. I believe, that result |DL_NETLIB - DL_MKL| can be not the same in terms of floating point operations with some acceptable error, not bitwise. But in my test, these both results are completely different, norm = 1.44663. I mean, this norm |DL_NETLIB - DL_MKL| will be also equal to 1.44663 for a fixed RHS more than 11.

Moreover, there is another MKL function GTTRF, which also make PLU factorization. Why results of DL are also different from its output?

Please help me in further investigation of the issue according to my points or share your ideas.

Anyway, I will try to create one more test on my side to check PLU=A is correct.

Best regards,

Dmitry

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Please find new test attached. This test has the following steps:

1. Compute DGTSV.

2. Fill dense matrix U by output of arrays DL, D, DU (after factorization) and zeros.

3. Compute reciprocal of condition number by DTRCON.

4. Fill dense matrix A with initial input DL, D, DU and zeros.

5. Compute PL = A * U^(-1) with DTRSM

6. Check difference |A - PL * U| with DGEMM.

Run test for cases n=1..100, nrhs = 1..100.

One can see, that depending on different RHS condition number changes... For some sizes this change is huge, and it influences on inversion of U^(-1) by TRSM. In result, final difference |A - PL * U| is also huge. From log:

------

func = GTSV n = 098 nrhs = 003 rcondU = 1.846317e-04 |PLU - A| = 5.105739e-16 |PLU - A| / eps = 4.598841

func = GTSV n = 098 nrhs = 004 rcondU = 1.410461e-09 |PLU - A| = 1.535155e-09 |PLU - A| / eps = 13827448.549124

------

I have run the same test for NETLIB LAPACK, please see the results attached. All residuals are good.

Moreover, I have replaced MKL DGTSV by MKL DGTTRF (second mode of the test), and all works fine here too. There are no huge errors.

Please try to reproduce results on your side.

Best regards,

Dmitry

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Dear Dmitry,

thank you for your post. There are several aspects you mention, which I will try to address separately.

Also, if partial pivoting influence the factorization, output of all three arrays DL, D, DU should be changed, since ROW interchange was applied.

Indeed, you are right, this *should* be the case but apparently is not. We will further investigate this issue and come back to you; it might be a bug.

Partial pivoting depends only on input matrix A. No matter, how many right-hand-side system have.

Mathematically speaking, this is of course correct. But depending on the size and number of right-hand sides of your system, the library may internally choose different code paths and implementations to achieve optimal performance.

Also, I am very confused about the point, that some MKL optimizations can lead to result, different from reference NETLIB LAPACK implementation. I believe, that result |DL_NETLIB - DL_MKL| can be not the same in terms of floating point operations with some acceptable error, not bitwise. But in my test, these both results are completely different, norm = 1.44663. I mean, this norm |DL_NETLIB - DL_MKL| will be also equal to 1.44663 for a fixed RHS more than 11.

Please note that the PA=LU factorisation is *not unique,* because of the pivotisation P. Consider for example a matrix, whose first column is [0; 1; 1; 1; 1; 1; ...; 1]. It is clear that pivotisation is necessary, because of the 0 entry, but all others are identical. Partial pivotisation only considers the current column, and it is thus arbitrary which row is chosen as pivot. Netlib's implementation may chose a different row than MKL.

MKL provides a highly optimised implementation of LAPACK's interface, but this interface does not specify the pivotisation scheme. MKL does *not* aim to follow all of Netlib's internal design decisions, and may therefore provide different results for non-unique factorisations. For comparing Netlib's and MKL's results, please consider using unique factorisations. For example, the A=QR factorisation *is* unique for invertible, square matrices A.

1. Compute DGTSV.

2. Fill dense matrix U by output of arrays DL, D, DU (after factorization) and zeros.

3. Compute reciprocal of condition number by DTRCON.

4. Fill dense matrix A with initial input DL, D, DU and zeros.

5. Compute PL = A * U^(-1) with DTRSM

6. Check difference |A - PL * U| with DGEMM.

Run test for cases n=1..100, nrhs = 1..100.

One can see, that depending on different RHS condition number changes... For some sizes this change is huge, and it influences on inversion of U^(-1) by TRSM. In result, final difference |A - PL * U| is also huge. From log:

------

func = GTSV n = 098 nrhs = 003 rcondU = 1.846317e-04 |PLU - A| = 5.105739e-16 |PLU - A| / eps = 4.598841

func = GTSV n = 098 nrhs = 004 rcondU = 1.410461e-09 |PLU - A| = 1.535155e-09 |PLU - A| / eps = 13827448.549124------

GTSV does not output P or L; and I assume that for this reason you aim to obtain the product PL by computing AU^{-1}. However, you are numerically computing |A - AUU^(-1)|, where cond(U)=cond(U^(-1)) is approximately 1e9. When forming two subsequent products with matrices of such a large condition number, it is not surprising that you lose several digits of accuracy. As discussed before, in the GTSV case the factor U might indeed be incorrect.

I understand your post such that MKL's implementation of DGTTRF does not have this issue, so I assume that this function is working correctly.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi, Matthias!

Thank you a lot for such comprehensive reply!

I have understood a few things. Let me summarize some of them and reformulate from my side.

1. Results of MKL and NETLIB can be different due to some optimization, applied by MKL. It has place to be.

2. On example of function DGTSV these optimizations lead to fast and correct solution of the system AX=B, but also to strange format of factor U, which is suspicious and not consistent with theory of row partial pivoting (only one diagonal in factor U changes).

3. Also optimizations lead to significant increase of condition number of factor U from PLU factorization. For n = 98 - difference in rcond for nrhs = 3 and nrhs = 4 : five floating point digits (larger sizes can be under the influence more, but I did not check it yet).

4. Consequently, bad-conditioned factor U can't be normally inverted and leads to more loss of digits after inversion.

Finally, in my application, it forces me to use sequence GTTRF + GTTRS to solve the system, and further usage of well-conditioned factor U since result of GTSV differs from standard factorization and solving scheme, provided by GTTRF and GTTRS. I believe, this should be investigated.

Please correct me if I missed something or wrote wrong point.

Looking forward for your reply!

Best regards,

Dmitry

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Dear Dmitry,

sounds mostly right to me.

3. Also optimizations lead to significant increase of condition number of factor U from PLU factorization. For n = 98 - difference in rcond for nrhs = 3 and nrhs = 4 : five floating point digits (larger sizes can be under the influence more, but I did not check it yet).

Yes. But it *might* be that the huge difference in the condition number stems from GTSV outputting incorrect data for U.

Finally, in my application, it forces me to use sequence GTTRF + GTTRS to solve the system

If you are just interested in the *solution* of the system AX=B, GTSV *did* produce correct results for X. If you additionally desire the upper factor U from the factorisation PA=LU, this is what I would recommend at the moment.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi Dmitry,

We are discussing this issue internally. Will get back to you soon with an update.

Apologies for the delay in response.

Best Regards,

Shanmukh.SS

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hello, Shanmukh.SS!

Thank you for your respond!

Ok! I feel, that issue is very strange and needs investigation. Please, take your time.

I'm looking forward for your reply.

Best regards,

Dmitry

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Hi Dmitry,

Thanks for reporting this issue. We were able to reproduce it and we have informed the development team about it.

Best Regards,

Shanmukh.SS

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page