Intel® oneAPI Math Kernel Library

[BUG] dsymv corrupts the xmm7 register on AMD processors

michaelkonecny


Hello,

I found a register corruption bug in the dsymv function that manifests itself on AMD processors.
dsymv returns with a changed value in the xmm7 register, even though this is not allowed: under the Windows x64 calling convention, xmm6 through xmm15 are nonvolatile and must be preserved by the callee (see x64 ABI conventions - register volatility).

I'm using Visual Studio Community 17.14.14 and oneAPI 2024.1 on Windows.

I've created a minimalistic reproducer of the bug, the source code of which can be found here:
https://bitbucket.org/k2fem/bug_intel_mkl_symv/src/master/bug_mkl.f90

In brief:
- I set the value of xmm7 to a known value
- I call dsymv (using a function wrapper solve_sym_ldl)
- I read the value of xmm7
- I print out the original and the resulting value of xmm7. The values should be the same, but on AMD processors, they differ.

I was able to reproduce this on three different AMD CPUs:
- AMD EPYC 7502P
- AMD Ryzen AI 9 HX PRO 370
- AMD Ryzen 7 Pro 6850
which leads me to the conclusion that this might be AMD-specific behaviour.

The behaviour is intermittent: sometimes the bug doesn't occur, but most of the time it does.

The whole solution for the reproducer can be found here: https://bitbucket.org/k2fem/bug_intel_mkl_symv/src/master/
To reproduce the bug, build and run the bug_mkl project. The problem manifests itself in both Debug and Release configurations.


This, of course, introduces unexpected behaviour in my application.
Specifically, I had roughly this code:
```
call dsymv(A, B)
ind = [1:6]
```

When I looked into the assembly of the Release build (with -O2), I saw that the compiler had decided to load part of the contents of ind, namely [1:4], into the xmm7 register before calling dsymv.
After that, it would do:
ind(1:4) <- xmm7
ind(5) = 5
ind(6) = 6

Of course, at this point, xmm7 was already corrupted, which made my indices corrupted, which made my application crash.
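
For readers who want to see the shape of that pattern outside of a disassembly, here is a minimal C sketch using SSE2 intrinsics. It is only an illustration, not the code the compiler actually generated: `library_call` is a hypothetical stand-in for the dsymv call, and a C compiler is free to choose a register other than xmm7 for the vector. The point is that the four packed indices are held in a vector register across the call, so they are only correct afterwards if the callee preserves that register.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_setr_epi32, _mm_storeu_si128 */

/* Hypothetical stand-in for the dsymv call. On Windows x64, a conforming
 * callee must leave xmm6-xmm15 unchanged when it returns. */
static void library_call(void) {
}

/* Fills ind[0..5] with 1..6 in the order described in the post:
 * pack the first four indices into one 128-bit value before the call,
 * store them after the call, then write the last two as scalars. */
void fill_indices(int ind[6]) {
    /* Four 32-bit integers fit exactly into one xmm register. */
    __m128i first_four = _mm_setr_epi32(1, 2, 3, 4);

    /* The vector is live across this call. If the compiler keeps it in a
     * nonvolatile register and the callee clobbers that register, the
     * store below writes garbage. */
    library_call();

    _mm_storeu_si128((__m128i *)ind, first_four);  /* ind(1:4) <- vector */
    ind[4] = 5;
    ind[5] = 6;
}
```

With a well-behaved callee, `fill_indices` leaves the array holding 1 through 6. With a callee that overwrites the register, only the last two entries are reliable, which matches the corrupted indices described above.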


Can you please look into this?

All of this also raises the question: does Intel test its libraries on AMD processors? They already make up roughly 1/3 of the market, so they can no longer be brushed off as irrelevant.
Our applications need to work reliably on both vendors.


Also, I had to retype this whole bug report, because after clicking the Post button for the first time, I received an error and lost the whole thing!
This is not the first time this happened (the first was years ago)!

I would have been wiser and saved the message elsewhere, but since the last time I reported a bug, you added the "autosave" functionality here, which I expected to work.
However, that doesn't work either! It only offered me an autosaved version from 30 minutes earlier, when I had barely started typing the message.

Intel, if even your bug reporting tool is buggy, I'm not sure there's any helping you...

2 Replies
Chao_Y_Intel
Moderator

Hi, can you please check the new oneMKL 2025.2 release? We fixed a known issue related to this: "Some BLAS and LAPACK functions may encounter runtime errors on AMD hardware in Windows*." The fix is available starting with oneMKL version 2025.0.1.

Thanks,
Chao

michaelkonecny

Hi,

I tested with oneAPI 2025.2 on the AMD EPYC 7502P and wasn't able to reproduce the error.
So it looks like the issue has been fixed.

However, the other bug I reported ("[BUG] access violation after dsymv call when debugging on AMD CPU") still persists.

Thank you,
Michael
