When running the pardiso() numerical factorization phase on Windows (64-bit), some XMM registers that should be preserved are not. Specifically, with iparm[23] (parallel factorization control) set to:
0: XMM9-15 were not preserved,
1: XMM14-15 were not preserved,
10: all the XMM registers seem to be preserved.
This is only what I observed while debugging one dataset; I am not sure whether other registers are also not preserved when pardiso is called with other input or settings.
According to Microsoft's default x64 calling convention, the registers RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15 and XMM6–XMM15 must be preserved by the callee. It seems Microsoft's C++ compiler relies on this for optimizations and vectorization.
(Environment: Windows 11 24H2, Visual Studio 2022 (17.11.2), oneAPI Base Toolkit 2024.2)
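One way to check which callee-saved XMM registers changed across a call is to compare two FXSAVE images taken right before and right after it. The following is a minimal sketch, not part of the original report; the names FxsaveImage and changedCalleeSavedXmm are hypothetical. It assumes the 64-bit FXSAVE layout, in which XMM0 starts at byte offset 160 and each register occupies 16 bytes.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed layout of the 512-byte FXSAVE image in 64-bit mode:
// XMM0 starts at byte offset 160, each XMM register takes 16 bytes.
constexpr std::size_t kFxsaveSize = 512;
constexpr std::size_t kXmm0Offset = 160;
constexpr std::size_t kXmmBytes = 16;

// Hypothetical alias for one saved register image.
using FxsaveImage = std::array<unsigned char, kFxsaveSize>;

// Returns the indices of the callee-saved XMM registers (XMM6..XMM15)
// whose contents differ between the two images.
std::vector<int> changedCalleeSavedXmm(const FxsaveImage& before,
                                       const FxsaveImage& after)
{
    std::vector<int> changed;
    for (int reg = 6; reg <= 15; ++reg) {
        const std::size_t offset =
            kXmm0Offset + kXmmBytes * static_cast<std::size_t>(reg);
        if (std::memcmp(before.data() + offset, after.data() + offset,
                        kXmmBytes) != 0) {
            changed.push_back(reg);
        }
    }
    return changed;
}
```

The images would be filled with _fxsave from <immintrin.h> into 16-byte-aligned buffers. Since the compiler itself may touch XMM registers between the two snapshots, inspecting the registers in a debugger remains the more reliable check.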
Hi Robert,
Thank you for raising the issue. We will investigate and analyze it thoroughly. Could you please provide a reproducible code sample and the steps you used to observe the issue?
Regards,
Ruqiu
Hi,
Please find the reproducible sample code in the attachments. The zip file contains a Visual Studio 2022 solution, project and main code in PardisoBug.cpp (sorry about the dumped number arrays). If oneAPI is installed in the usual place, it should compile; otherwise you might need to adapt the include and link locations. You'll need to put a copy of the 64-bit version of libiomp5md.dll into the target folders to run it, though.
The issue arises in PardisoBug.cpp line 286:
pardiso(pt, &maxfct, &mnum, &mtype, &FACTORIZATION, &numParm, normalMatrix, normalRowIndex, normalColumns, perm, &one, iparm, &msglvl, rhs, solution, &error);
After this call, the registers xmm6-15 contain different values than before. If the code on the caller side used those registers for vectorization, that would probably cause problems.
The issue can be worked around by using the intrinsics _fxsave(...) and _fxrstor(...) to store the register values before the pardiso call and restore them after it:
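#include <immintrin.h>
int main()
{
    ...
    // FXSAVE needs a 512-byte buffer aligned to 16 bytes.
    alignas(16) unsigned char xmmRegisterStorage[512];
    _fxsave(xmmRegisterStorage);   // save the x87/MMX/SSE state, including the XMM registers
    pardiso(pt, &maxfct, &mnum, &mtype, &FACTORIZATION, &numParm, normalMatrix, normalRowIndex, normalColumns, perm, &one, iparm, &msglvl, rhs, solution, &error);
    _fxrstor(xmmRegisterStorage);  // restore the state saved before the call
    ...
}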
But this is not an ideal solution; the pardiso call should adhere to the x64 calling convention on Windows.
Thank you, Robert, for sharing the reproducer. It appears to work correctly on my desktop with the latest oneMKL version, 2024.2.1, whether compiled with the Intel DPC++/C++ compiler or MSVC.
Can you verify it again on your side with the latest oneMKL version? You can download the standalone oneMKL package here: https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html. If the issue still exists, please tell us more about your system, for example the CPU hardware, compiler version, etc.
Regards,
Ruqiu
Hi, thanks for the reply.
I'm using:
* Visual Studio 2022 (17.11.3) with Microsoft (R) C/C++ Optimizing Compiler Version 19.41.34120 for x64,
* Windows 11 24H2, and
* oneAPI Base Toolkit 2024.2.1; I also tried the standalone oneMKL installer, which didn't change anything.
I tested a build (of a slightly extended version of the sample, to get some output) on different machines of friends and colleagues. It looks like the issue is related to AMD processors; all Intel processors were doing the right thing.
My system has an 'AMD Ryzen Threadripper PRO 7955WX 16-Cores' processor. Machines that also show the same effect (actually the exact same values in the xmm6-15 registers) were 'AMD EPYC 73F3 16-Cores' and 'AMD Ryzen Threadripper PRO 3945WX 12-Cores'.
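Since the behaviour seems to depend on the CPU vendor, a reproducer could print the vendor string next to its results. The following is a minimal sketch, assuming the three register values come from CPUID leaf 0; the helper names vendorFromCpuidLeaf0 and isAmdVendor are hypothetical and not part of the attached sample.

```cpp
#include <cstdint>
#include <string>

// Assembles the 12-character vendor string from the CPUID leaf 0 registers.
// The characters are stored in the order EBX, EDX, ECX, lowest byte first.
std::string vendorFromCpuidLeaf0(std::uint32_t ebx, std::uint32_t edx,
                                 std::uint32_t ecx)
{
    const std::uint32_t regs[3] = {ebx, edx, ecx};
    std::string vendor;
    for (std::uint32_t reg : regs) {
        for (int byte = 0; byte < 4; ++byte) {
            vendor.push_back(static_cast<char>((reg >> (8 * byte)) & 0xFFu));
        }
    }
    return vendor;
}

// True if the vendor string is the one reported by AMD processors.
bool isAmdVendor(const std::string& vendor)
{
    return vendor == "AuthenticAMD";
}
```

With MSVC, the register values can be obtained via __cpuid(regs, 0) from <intrin.h>, where regs[1], regs[3] and regs[2] hold EBX, EDX and ECX respectively.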
Hi Robert,
Thank you for the update that the issue only happens on AMD processors. We will investigate on AMD processors internally and will update this post as we make progress.
Regards,
Ruqiu
Hi Robert,
There are no assembly kernels in PARDISO, and we noted that the issue happens with the Microsoft compiler. Have you tried the Intel compiler in oneAPI Base Toolkit 2024.2?
Hi,
Not sure what you're trying to say by mentioning "assembly kernels". What are those and how do they relate to this issue?
Yes, this happens with Microsoft's compilers, so what? There is a library function in a DLL that does not adhere to the calling convention for x64 on Windows (for certain CPUs). The compiler is allowed to optimize the code around the call to the function, storing values in registers that, based on the calling convention, should be preserved by the called function. If the compiler used to compile the calling code is changed, a function that breaks the calling convention would still break it, unless the called function is doing some dynamic shenanigans to deal with different caller compilers and/or OSs.
And no, I didn't try the Intel compiler yet, as the issue is showing up in a fairly large piece of software that is supposed to be compiled with Microsoft's compiler. Nothing I can change on a whim. But I'll try it on the small sample, to see if it really does make a difference.
Okay, I just checked the reproducer with the Intel compiler. Same effect: XMM9-15 are not preserved.
Hi Robert,
We reproduced the issue on the AMD CPU path. An internal fix for AMD CPUs is in progress.
Thank you again for posting your concern in the forum. We will fix it in a future release. We are closing this thread and will no longer respond to it. If you require additional assistance from Intel, please start a new thread.
Regards,
Ruqiu
