Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Frobenius norm LANSY returns wrong result

kdv
New Contributor I
1,744 Views

Hello!

 

I have just run into an issue with LANSY when norm = "F" (Frobenius norm).

1) The result depends on the number of threads, which it should not. The correct result is returned only when the number of threads is 1.

2) For any number of threads, correct results are returned only for sizes <= 127.

By "correct" I mean the value of the norm returned by the reference NETLIB algorithm. The difference between the "correct" and "wrong" values shows up in the first significant digit in double precision. I have attached a few logs.

 

The other norms (M, 1, I) are fine. The result is also correct when using LANGE.
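
For reference, the Frobenius norm that LANSY with norm = "F" is expected to return can be computed from the stored triangle alone: each off-diagonal entry counts twice and each diagonal entry once. The following is a minimal pure-Python sketch of that definition, not the MKL or NETLIB implementation; the helper names `frobenius_sym_upper` and `frobenius_full` are hypothetical and used only for illustration. The full-matrix version corresponds to what LANGE computes, so the two values should agree.

```python
import math

def frobenius_sym_upper(a):
    """Frobenius norm of a symmetric matrix, reading only the upper triangle."""
    n = len(a)
    total = 0.0
    for i in range(n):
        total += a[i][i] ** 2              # diagonal entries count once
        for j in range(i + 1, n):
            total += 2.0 * a[i][j] ** 2    # a[i][j] == a[j][i], so count twice
    return math.sqrt(total)

def frobenius_full(a):
    """Frobenius norm over every entry (full-matrix definition)."""
    return math.sqrt(sum(x * x for row in a for x in row))

# Small symmetric example; frobenius_sym_upper never reads the
# entries below the diagonal.
a = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 5.0],
     [3.0, 5.0, 6.0]]

sym_norm = frobenius_sym_upper(a)
full_norm = frobenius_full(a)
print(sym_norm, full_norm)
```

The reference implementation accumulates a scaled sum of squares to guard against overflow and underflow, which this sketch omits for brevity.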

 

The issue reproduces in all MKL versions from 2021 through 2025. All precisions and both uplo values (L, U) are affected.

Server: Intel(R) Xeon(R) Gold 6248 CPU

 

Please try to reproduce on your side.

 

Best regards,

Dmitry

 

 

0 Kudos
1 Solution
Ruqiu_C_Intel
Moderator
1,008 Views

The fix will be available in the upcoming release.

8 Replies
Ruqiu_C_Intel
Moderator
1,609 Views

Hi Dmitry,


Thank you for posting your issue.


It looks like I get the same result from oneMKL with the default number of threads as from Netlib, as shown below:


# MKL_VERBOSE=1 ./test_netlib

Frobenius norm using NETLIB: 24.083189


# MKL_VERBOSE=1 ./test_mkl

MKL_VERBOSE oneMKL 2024.0 Update 1 Product build 20240215 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Intel(R) Deep Learning Boost (Intel(R) DL Boost), Lnx 2.10GHz lp64 intel_thread
MKL_VERBOSE DLANSY(F,U,4,0x557f08771d80,4,(nil)) 1.59ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:48

Frobenius norm: 24.083189


I have attached my reproducers. Please upload a simple reproducer of your own if possible.


Regards,

Ruqiu


kdv2
Beginner
1,586 Views

Hi Ruqiu!

 

Sorry, I can't access my main account for some reason, so I am replying from another one.

Thank you for investigating the issue!

It looks like you forgot to attach the reproducers. Please attach them and I will check on my side.

BTW, from the MKL_VERBOSE output I see that you ran LANSY with size = 4. As I wrote earlier, results are correct for sizes < 128. The inconsistent behavior starts at size n = 128 and above. Please try increasing the size and check again.

If the issue still does not reproduce, I will create a reproducer as well, but that will take a bit more time.

 

Best regards,

Dmitry

 

Ruqiu_C_Intel
Moderator
1,542 Views

Hi Dmitry,


The Netlib implementation is typically single-threaded and does not perform parallel computations, while oneMKL performs parallel computations by default.


When using the LAPACKE_dlansy function in oneMKL, you might observe differences in results between multi-threaded and single-threaded executions. This discrepancy can be attributed to several factors. In multi-threaded environments, the order of operations can vary due to parallel execution, so rounding errors accumulate differently than in single-threaded execution. Parallel computation also introduces non-determinism, because different threads may execute in different orders and access memory at different times. This can lead to slight variations in the results, especially in floating-point arithmetic. In addition, oneMKL uses different algorithms and optimizations to improve performance, and these algorithmic differences can also lead to variations in the results, especially for large matrices.


Regards,

Ruqiu


kdv
New Contributor I
1,504 Views

Hello, Ruqiu!

 

Thanks for your explanation, but I am afraid the situation is different. When the correct and wrong results differ in the first digit in double precision, this is not about rounding errors or a non-deterministic order of calculation in the multi-threaded version. It points to broken, non-thread-safe functionality.
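
To put the scale of legitimate rounding effects in perspective, the sketch below splits a sum of squares into a varying number of chunks, as a threaded reduction might, and compares the resulting norms against the single-chunk value. It is a pure-Python illustration under assumed conditions (random values in [0, 1), a hypothetical helper `chunked_norm`, chunk counts chosen to mirror typical thread counts) and does not model how MKL actually threads LANSY.

```python
import math
import random

def chunked_norm(values, chunks):
    """sqrt of a sum of squares, accumulated per chunk and then combined."""
    size = (len(values) + chunks - 1) // chunks      # ceiling division
    partials = [
        sum(x * x for x in values[start:start + size])
        for start in range(0, len(values), size)
    ]
    return math.sqrt(sum(partials))

random.seed(0)                                        # fixed seed for repeatability
values = [random.random() for _ in range(128 * 128)]  # as many entries as a 128x128 matrix

reference = chunked_norm(values, 1)                   # single "thread"
rel_diffs = {
    chunks: abs(chunked_norm(values, chunks) - reference) / reference
    for chunks in (2, 4, 10, 20, 40)
}
for chunks, diff in rel_diffs.items():
    print(f"{chunks:>2} chunks: relative difference {diff:.2e}")
```

Reordering of this kind is expected to change the result only near machine precision for well-scaled data, which is nowhere near a difference in the first digit.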

 

Please check my reproducer. I have also included a table of the results I obtained below. The algorithm works correctly for sizes < 128 (the result does not depend on the number of threads) and fails for sizes >= 128.

 

size/threads  1          2          4          10         20         40
126           73.310420  73.310420  73.310420  73.310420  73.310420  73.310420
127           72.676652  72.676652  72.676652  72.676652  72.676652  72.676652
128           73.082814  60.456824  67.657853  70.937348  71.922632  72.549838
129           74.455685  61.876471  69.190645  72.331313  73.234445  73.878931

 

For size n = 128 the norm varies from 60 to 73. I believe that is not a slight variation in the results. I am asking for help reproducing these results, because there are only two possibilities:

1) An environment issue: incorrect server settings, wrong library linkage, etc.

2) A bug in the code, i.e. broken functionality.
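
The size of the discrepancy can be quantified directly from the table above. The short sketch below takes the reported values for n = 128 and n = 129 and prints each result's relative deviation from the 1-thread value, which is the one treated as correct in this thread; the variable names are illustrative only.

```python
# Values copied from the table above: {size: {threads: norm}}.
reported = {
    128: {1: 73.082814, 2: 60.456824, 4: 67.657853,
          10: 70.937348, 20: 71.922632, 40: 72.549838},
    129: {1: 74.455685, 2: 61.876471, 4: 69.190645,
          10: 72.331313, 20: 73.234445, 40: 73.878931},
}

deviation = {}
for size, by_threads in reported.items():
    correct = by_threads[1]                  # 1-thread result, taken as correct
    for threads, norm in by_threads.items():
        deviation[(size, threads)] = abs(norm - correct) / correct
        print(f"n={size} threads={threads:>2}: "
              f"{100 * deviation[(size, threads)]:.2f}% off")
```

With 2 threads the n = 128 result is roughly 17% below the 1-thread value, which is many orders of magnitude beyond double-precision rounding.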

 

Please try to reproduce on your side. 

 

Best regards,

Dmitry

Ruqiu_C_Intel
Moderator
1,377 Views

Hi Dmitry,

Thank you for the reproducer.

We will investigate and post an update here once we have made progress.


Regards,

Ruqiu


Ruqiu_C_Intel
Moderator
1,304 Views

We have reproduced the issue and will fix it in a future release. Thank you for your patience.


Ruqiu_C_Intel
Moderator
1,009 Views

The fix will be available in the upcoming release.