Intel® oneAPI Math Kernel Library

Frobenius norm LANSY returns wrong result

kdv
New Contributor I

Hello!

 

I have just run into an issue with LANSY for norm = "F" (the Frobenius norm).

 

1) The result depends on the number of threads (it should not). The correct result is returned only when the number of threads is 1.

2) Correct results are returned for any number of threads only for matrix sizes <= 127.

 

Under "correct" one can take value of norm, returned by reference NETLIB algorithm. But difference "correct" vs "wrong" is in first floating point digit for double precision. I have attached few logs.

 

The other norms (M, 1, I) are fine. The result is also correct when using LANGE.

 

The issue reproduces with all MKL versions from 2021 up to 2025. All precisions and both uplo values (L, U) are affected.

Server: Intel(R) Xeon(R) Gold 6248 CPU
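
For reference, below is a minimal sketch of the kind of check I mean (illustrative only, not the attached reproducer; the symmetric matrix fill is arbitrary, and dlange on the full matrix is used here as the reference value):

#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>
#include <mkl_lapacke.h>

/* Illustrative sketch only: compare the dlansy Frobenius norm across
 * thread counts against dlange on the same (full, symmetric) matrix. */
int main(void) {
    const MKL_INT n = 128;  /* first size where the results start to diverge */
    double *a = (double *)malloc((size_t)n * n * sizeof(double));
    for (MKL_INT i = 0; i < n; ++i)
        for (MKL_INT j = 0; j < n; ++j)
            a[i * n + j] = 1.0 / (double)(i + j + 1);  /* arbitrary symmetric fill */

    /* Reference: Frobenius norm of the full matrix via dlange. */
    double ref = LAPACKE_dlange(LAPACK_ROW_MAJOR, 'F', n, n, a, n);
    printf("dlange reference  : %.6f\n", ref);

    int threads[] = {1, 2, 4, 10, 20, 40};
    for (int t = 0; t < 6; ++t) {
        mkl_set_num_threads(threads[t]);
        double nrm = LAPACKE_dlansy(LAPACK_ROW_MAJOR, 'F', 'U', n, a, n);
        printf("dlansy, %2d threads: %.6f\n", threads[t], nrm);
    }

    free(a);
    return 0;
}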

 

Please try to reproduce on your side.

 

Best regards,

Dmitry

 

 

Ruqiu_C_Intel
Moderator

Hi Dmitry,


Thank you for posting your issue.


It looks like I get the same result from oneMKL with the default number of threads and from Netlib, as shown below:


# MKL_VERBOSE=1 ./test_netlib

Frobenius norm using NETLIB: 24.083189


# MKL_VERBOSE=1 ./test_mkl


MKL_VERBOSE oneMKL 2024.0 Update 1 Product build 20240215 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Intel(R) Deep Learning Boost (Intel(R) DL Boost), Lnx 2.10GHz lp64 intel_thread
MKL_VERBOSE DLANSY(F,U,4,0x557f08771d80,4,(nil)) 1.59ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:48

Frobenius norm: 24.083189


I have attached my reproducers. Please upload your simple reproducer if possible.


Regards,

Ruqiu


kdv2
Beginner

Hi Ruqiu!

 

Sorry, I can't access my main account for some reason, so I am replying from another one.

 

Thank you for investigating the issue!

 

It looks like you forgot to attach the reproducers. Please attach them and I will check them on my side.

 

By the way, from the MKL_VERBOSE output I see that you ran LANSY for size = 4. As I posted before, results are correct for sizes < 128; the inconsistent behavior starts at size n = 128 and above. Please increase the size and check again.

 

If the issue is still not reproducible, I will also create a reproducer, but that will take a bit more time.

 

Best regards,

Dmitry

 

Ruqiu_C_Intel
Moderator

Hi Dmitry,


The Netlib implementation is typically single-threaded and does not perform parallel computations, while oneMKL performs parallel computations by default.


When using the LAPACKE_dlansy function in oneMKL, you might observe differences in results between multi-threaded and single-threaded executions. This discrepancy can be attributed to several factors. In multi-threaded execution the order of operations can vary, so rounding errors accumulate differently than in single-threaded execution. Parallel computation also introduces non-determinism, because threads may execute in different orders and access memory at different times, which can lead to slight variations in floating-point results. In addition, oneMKL may use different algorithms or optimizations to improve performance, and these algorithmic differences can also lead to variations in the results, especially for large matrices.
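
As a generic illustration (not oneMKL code, just a toy C example): reordering the same floating-point additions can already change the result, because floating-point addition is not associative:

#include <stdio.h>

/* Toy illustration: summing the same four numbers in a different order
 * gives a different double-precision result, because floating-point
 * addition is not associative. */
int main(void) {
    double x[4] = {1.0e16, 1.0, -1.0e16, 1.0};

    double forward   = ((x[0] + x[1]) + x[2]) + x[3];  /* 1.0 is absorbed by 1.0e16 */
    double regrouped = (x[0] + x[2]) + (x[1] + x[3]);  /* large terms cancel first  */

    printf("forward   = %.1f\n", forward);    /* prints 1.0 */
    printf("regrouped = %.1f\n", regrouped);  /* prints 2.0 */
    return 0;
}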


Regards,

Ruqiu


kdv
New Contributor I

Hello, Ruqiu!

 

Thanks for your explanation, but I am afraid the situation is different. When the correct and wrong results differ in the first significant digit in double precision, it is not about rounding errors or the non-deterministic order of operations in the multi-threaded version. It is about broken, non-thread-safe functionality.

 

Please check my reproducer. I have also included a table with the results below. The algorithm works correctly for sizes < 128 (the result does not depend on the number of threads) and fails for sizes >= 128.

 

size \ threads          1          2          4         10         20         40
           126  73.310420  73.310420  73.310420  73.310420  73.310420  73.310420
           127  72.676652  72.676652  72.676652  72.676652  72.676652  72.676652
           128  73.082814  60.456824  67.657853  70.937348  71.922632  72.549838
           129  74.455685  61.876471  69.190645  72.331313  73.234445  73.878931

 

For size n = 128 the norm varies from 60 to 73. I believe this is not a slight variation in results. I am asking for help to reproduce these results, because there are only two possible explanations:

1) An environment issue: incorrect server settings, wrong library linkage, etc.

2) A bug in the code, i.e. broken functionality.

 

Please try to reproduce on your side. 

 

Best regards,

Dmitry

Ruqiu_C_Intel
Moderator

Hi Dmitry,

Thank you for the reproducer.

We will investigate and update here once we have progress.


Regards,

Ruqiu


Ruqiu_C_Intel
Moderator

We have reproduced the issue and will fix it in a future release. Thank you for your patience.

