I have a simple MKL program that performs an in-place 3-D real-to-complex FFT followed by the inverse transform. The results differ slightly between a run with 1 OpenMP thread and a run with multiple threads. Runs with 2, 3, or any other number of threads greater than 1 agree with each other.
I am attaching a simple reproducer that writes the outputs to files and then compares the 1-thread output against the multi-thread output. For the 1 vs 2 threads comparison, numpy.allclose() fails and the RMS of the difference is non-zero.
$ icpx -v
Intel(R) oneAPI DPC++/C++ Compiler 2024.1.0 (2024.1.0.20240308)
$ uname -a
Linux hostname 4.18.0-553.5.1.el8_10.x86_64 #1 SMP
$ head /proc/cpuinfo | grep "model name"
model name : Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
On a non-Intel machine the same 1 vs 2 threads test passes, with no numerical difference between the outputs. The only difference between the two setups is the CPU:
$ head /proc/cpuinfo | grep "model name"
model name : AMD EPYC 7302P 16-Core Processor
I would appreciate any guidance on eliminating this difference on the Intel processor, if that is possible.
Thank you.
Hi,
What numerical difference did you get there?
I see only a very small difference in the test results here, and the RMS looks fine:
RMS of difference: 1.5889534e-10
This computation uses single-precision floating point, which has a 23-bit fraction (machine epsilon 2^-23, about 1.2e-7), so differences of about 1e-7 to 1e-6 are within the expected precision of the computation.
I made a minor change to the np.allclose() code, and it passes the test:
print("Allclose: ", np.allclose(array1, array2, atol=1e-07))
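As a rough illustration of why the default tolerance can fail while atol=1e-07 passes, here is a minimal sketch. The arrays are made-up stand-ins, not the reproducer's actual output, and `rms_diff` is a hypothetical helper. np.allclose() checks |a - b| <= atol + rtol * |b| element by element, with defaults rtol=1e-05 and atol=1e-08, so for values near zero the default absolute tolerance is tighter than single-precision rounding noise.

```python
import numpy as np

def rms_diff(a, b):
    """RMS of the element-wise difference, accumulated in float64 (hypothetical helper)."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sqrt(np.mean(d * d)))

# Machine epsilon for float32: 2**-23, about 1.19e-7.
eps32 = float(np.finfo(np.float32).eps)

# Made-up stand-ins for the 1-thread and 2-thread outputs (assumption:
# the real files hold float32 data; these values are illustrative only).
array1 = np.array([1.0, 0.5, 1e-4, 0.0], dtype=np.float32)
noise = np.array([0.0, 0.0, 5e-8, 5e-8], dtype=np.float32)
array2 = array1 + noise

# Default check: |a - b| <= 1e-8 + 1e-5 * |b| fails for the near-zero entries.
default_ok = bool(np.allclose(array1, array2))

# Relaxed absolute tolerance, on the order of float32 epsilon.
relaxed_ok = bool(np.allclose(array1, array2, atol=1e-07))

print("float32 eps:       ", eps32)
print("RMS of difference: ", rms_diff(array1, array2))
print("Allclose (default):", default_ok)
print("Allclose (1e-07):  ", relaxed_ok)
```

Choosing atol near the float32 epsilon, scaled by the typical magnitude of the data, is a judgment call. For data whose magnitude is far from 1, a different atol may be more appropriate.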
thanks,
Chao
