Hello!
I have just encountered an issue with LANSY for norm = "F".
1) The result depends on the number of threads (it should not). The correct result is returned only when the number of threads equals 1.
2) Correct results are returned for any number of threads only for sizes <= 127.
By "correct" I mean the norm value returned by the reference NETLIB algorithm. The difference between the "correct" and "wrong" values is in the first floating-point digit for double precision. I have attached a few logs.
The other norms (M, 1, I) are fine. The result is also correct when using LANGE.
The issue reproduces in all MKL versions from 2021 through 2025. All precisions and uplo values (L, U) are affected.
Server: Intel(R) Xeon(R) Gold 6248 CPU
Please try to reproduce on your side.
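For reference, a minimal sketch of the kind of comparison involved (the size and the symmetric fill pattern here are illustrative assumptions, not my actual test data; build against oneMKL, or swap the headers for <lapacke.h> to build against NETLIB):

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>
#include <mkl_lapacke.h>

int main(void) {
    const int n = 128;  /* first size where the mismatch appears */
    double *a = malloc((size_t)n * n * sizeof(double));
    /* Symmetric, deterministic fill (Hilbert-like). */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            a[i + (size_t)j * n] = 1.0 / (i + j + 1.0);

    /* Reference Frobenius norm computed directly from the upper triangle:
       off-diagonal entries count twice because the matrix is symmetric. */
    double s = 0.0;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < j; ++i)
            s += 2.0 * a[i + (size_t)j * n] * a[i + (size_t)j * n];
        s += a[j + (size_t)j * n] * a[j + (size_t)j * n];
    }
    printf("direct Frobenius norm: %.6f\n", sqrt(s));

    /* 'F' = Frobenius norm, 'U' = upper triangle stored, column-major. */
    printf("LAPACKE_dlansy       : %.6f\n",
           LAPACKE_dlansy(LAPACK_COL_MAJOR, 'F', 'U', n, a, n));
    free(a);
    return 0;
}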
Best regards,
Dmitry
Hi Dmitry,
Thank you for posting your issue.
It looks like I get the same result for oneMKL with the default number of threads and for Netlib, as shown below:
# MKL_VERBOSE=1 ./test_netlib
Frobenius norm using NETLIB: 24.083189
# MKL_VERBOSE=1 ./test_mkl
MKL_VERBOSE oneMKL 2024.0 Update 1 Product build 20240215 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Intel(R) Deep Learning Boost (Intel(R) DL Boost), Lnx 2.10GHz lp64 intel_thread MKL_VERBOSE DLANSY(F,U,4,0x557f08771d80,4,(nil)) 1.59ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:48
Frobenius norm: 24.083189
I have attached my reproducers. Please upload your simple reproducer if possible.
Regards,
Ruqiu
Hi Ruqiu!
Sorry, I can't access my main account for some reason, so I am responding from another one.
Thank you for investigating the issue!
It looks like you forgot to attach the reproducers. Please attach them and I will check on my side.
BTW, from MKL_VERBOSE I see that you ran LANSY for size = 4. As I posted before, results are correct for sizes < 128; the inconsistent behavior starts at size n = 128 and above. Please try increasing the size and check again.
If the issue is still not reproducible, I will also create a reproducer, but that will take a bit more time.
Best regards,
Dmitry
Hi Dmitry,
The Netlib implementation is typically single-threaded and does not perform parallel computations, while oneMKL performs parallel computations by default.
When using the LAPACKE_dlansy function in oneMKL, you might observe differences in results between multi-threaded and single-threaded executions. This discrepancy can be attributed to several factors. In multi-threaded environments, the order of operations can vary due to parallel execution, which leads to rounding errors that accumulate differently than in single-threaded execution. Parallel computation also introduces non-determinism, because different threads may execute in different orders and access memory at different times; this can lead to slight variations in the results, especially in floating-point arithmetic. Finally, oneMKL may use different algorithms or optimizations to improve performance, and these algorithmic differences can also lead to variations in the results, especially for large matrices.
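Not oneMKL-specific, but a tiny illustration of the rounding argument: floating-point addition is not associative, so a parallel reduction that combines per-thread partial sums can round differently from a sequential sum, although such discrepancies normally show up only in the trailing digits:

#include <stdio.h>

int main(void) {
    double x = 0.1, y = 0.2, z = 0.3;
    /* Same operands, different association, different rounding. */
    printf("(x + y) + z = %.17f\n", (x + y) + z);  /* 0.60000000000000009 */
    printf("x + (y + z) = %.17f\n", x + (y + z));  /* 0.59999999999999998 */
    return 0;
}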
Regards,
Ruqiu
Hello, Ruqiu!
Thanks for your explanation, but I am afraid the situation is different. When the correct and wrong results differ in the first digit in double precision, it is not a matter of rounding errors or non-deterministic ordering of calculations in the multi-threaded version; it is a matter of broken, non-thread-safe functionality.
Please check my reproducer. I have also attached the table of results below. The algorithm works correctly for sizes < 128 (the result does not depend on the number of threads) and fails for sizes >= 128.
size \ threads |         1 |         2 |         4 |        10 |        20 |        40 |
           126 | 73.310420 | 73.310420 | 73.310420 | 73.310420 | 73.310420 | 73.310420 |
           127 | 72.676652 | 72.676652 | 72.676652 | 72.676652 | 72.676652 | 72.676652 |
           128 | 73.082814 | 60.456824 | 67.657853 | 70.937348 | 71.922632 | 72.549838 |
           129 | 74.455685 | 61.876471 | 69.190645 | 72.331313 | 73.234445 | 73.878931 |
For size n = 128 the norm varies from 60 to 73; I do not think that qualifies as a slight variation in results. I am asking for help to reproduce these results, because there are only two possibilities:
1) An environment issue: incorrect server settings, wrong library linkage, etc.
2) A bug in the code, i.e. broken functionality.
Please try to reproduce on your side.
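A sketch of a driver along these lines (assuming the standard oneMKL threading control mkl_set_num_threads; the fill pattern is illustrative, so the printed values will not match the table above exactly):

#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>
#include <mkl_lapacke.h>

int main(void) {
    const int sizes[]   = {126, 127, 128, 129};
    const int threads[] = {1, 2, 4, 10, 20, 40};

    for (size_t s = 0; s < sizeof sizes / sizeof *sizes; ++s) {
        const int n = sizes[s];
        double *a = malloc((size_t)n * n * sizeof(double));
        for (int j = 0; j < n; ++j)  /* symmetric, deterministic fill */
            for (int i = 0; i < n; ++i)
                a[i + (size_t)j * n] = 1.0 / (i + j + 1.0);

        printf("n = %d:", n);
        for (size_t t = 0; t < sizeof threads / sizeof *threads; ++t) {
            mkl_set_num_threads(threads[t]);
            /* A correct implementation must return the same value
               regardless of the thread count. */
            printf(" %.6f", LAPACKE_dlansy(LAPACK_COL_MAJOR, 'F', 'U', n, a, n));
        }
        printf("\n");
        free(a);
    }
    return 0;
}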
Best regards,
Dmitry
Hi Dmitry,
Thank you for the reproducer.
We will investigate and update here once we have progress.
Regards,
Ruqiu
We have reproduced the issue and will fix it in a future release. Thank you for your patience.