I am trying to call __svml_sin4 from inline assembly, the way the compiler does. The code snippet is as follows:
"vmovupd (%1), %%ymm0\n\t"
"call __svml_sin4\n\t"
"vmovupd %%ymm0, (%0)\n\t"
"sub $1, %%rax\n\t"
"jnz 3b\n\t"
The program builds, but the output values are wrong.
I then used GDB to locate the problem. It seems that the SVML function __svml_sin4 uses the general-purpose registers rax, rbx, rcx, rdx and so on without saving and restoring them. So I want to save the registers modified by SVML myself. The problem is that I do not know exactly which registers are modified; different SVML functions may use different registers.
So, does anybody know how to use SVML in inline assembly correctly?
Thanks in advance for any answer.
- Tags:
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
According to the x86-64 ABI (http://www.x86-64.org/documentation/abi.pdf, section 3.2.1), only the rbp, rbx and r12-r15 general-purpose registers need to be preserved by the called function. All other general-purpose registers can be clobbered. I believe this applies to all UNIX-like systems.
The convention on Windows is summarized here: http://msdn.microsoft.com/en-us/library/ms235286.aspx
If you need to preserve the values of clobbered registers, you should save and restore them around the function call.
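With GCC-style extended asm, one way to act on this is to declare every caller-saved register as clobbered, so that the compiler does the saving and restoring. The following is only a sketch under several assumptions: `standin_kernel` and `call_via_asm` are hypothetical names, the stand-in is a plain scalar function rather than a real SVML routine, and the code targets the System V x86-64 ABI (not Windows) with GCC or Clang. The clobber list covers the general-purpose and SSE registers only.

```cpp
// Hypothetical stand-in for an SVML routine: under the System V x86-64 ABI
// it receives its double argument in xmm0 and returns the result in xmm0.
extern "C" double standin_kernel(double x) { return x * x + 1.0; }

// Calls fn from inline assembly and declares everything a System V callee
// may overwrite, so the compiler keeps no live value in those registers.
static inline double call_via_asm(double (*fn)(double), double x) {
#if defined(__x86_64__) && defined(__GNUC__) && !defined(_WIN32)
    double result;
    asm volatile(
        "movsd %[in], %%xmm0\n\t"   // load the argument before touching rsp
        "mov   %%rsp, %%rbx\n\t"    // rbx is callee-saved, so it survives the call
        "sub   $128, %%rsp\n\t"     // step over the red zone
        "and   $-16, %%rsp\n\t"     // the ABI wants rsp 16-byte aligned at a call
        "call  *%[fn]\n\t"
        "mov   %%rbx, %%rsp\n\t"    // restore the original stack pointer
        "movsd %%xmm0, %[out]\n\t"  // store the result after rsp is restored
        : [out] "=m"(result)
        : [in] "m"(x), [fn] "r"(fn)
        : "rbx",                                            // used to hold rsp
          "rax", "rcx", "rdx", "rsi", "rdi",                // caller-saved GPRs
          "r8", "r9", "r10", "r11",
          "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5",   // all SSE registers
          "xmm6", "xmm7", "xmm8", "xmm9", "xmm10", "xmm11", // are caller-saved
          "xmm12", "xmm13", "xmm14", "xmm15",
          "memory", "cc");
    return result;
#else
    return fn(x);  // other targets: let the compiler emit the call
#endif
}
```

Calling `call_via_asm(standin_kernel, 3.0)` should behave like a direct call to `standin_kernel(3.0)`. The stack adjustment is there because the compiler cannot see the `call` inside the asm statement and may otherwise keep data in the red zone or leave rsp unaligned.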
Hi, andysem!
Thank you for your answer. It is very helpful.
Now the problem is that I do not know which registers are clobbered. So, if I need to preserve the state, I must save all the registers except rbp, rbx and r12-r15. That seems too expensive! Do you have any ideas about that?
Thanks again!
You have to assume that any register that is not required to be preserved can be clobbered. You don't have to save all registers, only those holding data that your program (i.e. the calling function) still needs after the call. Compilers usually keep a shadow copy of variables on the stack so that the values can be saved and restored when needed. Minimizing and scheduling these moves is one of the optimizations compilers perform, and one that you'll have to do manually in assembler code.
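Applied to the loop from the question, the cheapest fix may be to keep the loop counter out of rax and in a register the callee has to preserve, so that nothing needs saving per iteration. Below is a rough sketch of that idea, not a drop-in SVML wrapper: `standin_add_half` and `apply_loop` are hypothetical names, the stand-in is a scalar function instead of __svml_sin4, and the code assumes the System V x86-64 ABI with GCC or Clang. Because every caller-saved register is listed as clobbered, the compiler can only place the asm operands in callee-saved registers.

```cpp
// Hypothetical stand-in for the SVML routine: argument and result in xmm0.
extern "C" double standin_add_half(double x) { return x + 0.5; }

// Applies fn to n doubles with the loop written in assembly, as in the
// question. The counter and pointers are compiler-chosen operands; with all
// caller-saved registers clobbered they land in registers fn must preserve.
static inline void apply_loop(double (*fn)(double), const double* src,
                              double* dst, long n) {
    if (n <= 0) return;
#if defined(__x86_64__) && defined(__GNUC__) && !defined(_WIN32)
    asm volatile(
        "mov   %%rsp, %%rbx\n\t"                // keep the old rsp in callee-saved rbx
        "sub   $128, %%rsp\n\t"                 // step over the red zone
        "and   $-16, %%rsp\n\t"                 // align the stack for the calls
        "1:\n\t"
        "movsd -8(%[src],%[n],8), %%xmm0\n\t"   // load element n-1
        "call  *%[fn]\n\t"
        "movsd %%xmm0, -8(%[dst],%[n],8)\n\t"   // store result n-1
        "sub   $1, %[n]\n\t"                    // counter survives the call
        "jnz   1b\n\t"
        "mov   %%rbx, %%rsp\n\t"                // restore the original stack pointer
        : [n] "+&r"(n)
        : [fn] "r"(fn), [src] "r"(src), [dst] "r"(dst)
        : "rbx", "rax", "rcx", "rdx", "rsi", "rdi",
          "r8", "r9", "r10", "r11",
          "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5",
          "xmm6", "xmm7", "xmm8", "xmm9", "xmm10", "xmm11",
          "xmm12", "xmm13", "xmm14", "xmm15",
          "memory", "cc");
#else
    for (long i = 0; i < n; ++i) dst[i] = fn(src[i]);  // portable fallback
#endif
}
```

The loop walks the array from the last element to the first so that the counter doubles as the index. That keeps the asm statement at four register operands, which is as many callee-saved registers as remain once rbx holds the stack pointer and rbp is in use as a frame pointer.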
zhang y.,
Try this. It works for me with MinGW64 on Windows.
extern "C" __m256d __svml_sin4(const __m256d &a);
__inline __m256d sin(const __m256d &a)
{
    __m256d ret;
    __asm volatile
    (
        "vmovaps %1, %%ymm0\n"
        // "push %%rax\n"
        // "push %%rax\n"
        "call __svml_sin4\n"
        // "pop %%rax\n"
        // "pop %%rax\n"
        "vmovaps %%ymm0, %0\n"
        : "=m"(ret) : "m"(a) : "%xmm0"
    );
    return ret;
}
__m256d src, ret;
ret = sin(src);
If something goes wrong, try uncommenting the push/pop lines.