Intel® C++ Compiler

Can I avoid usage of movdqa and movaps on stack variables with -O2?

djunglas
New Contributor I
Hi,

we use icc 12.1 on x86-32 Linux. When compiling our code with -O2, many instructions like these are generated:
[bash]movdqa XMMWORD PTR [esp+0xe0],xmm0
movaps XMMWORD PTR [esp+48],xmm7[/bash]
The memory operands of movdqa and movaps must be aligned to 16 bytes, so this obviously requires that the stack is aligned to 16 bytes.
When compiling our code into a binary everything works fine. It seems like the compiler takes care of proper alignment of the stack?
However, we also compile our code into a shared object which is loaded into a Java virtual machine. Our code is then called through JNI and frequently crashes because the stack is not aligned to 16 bytes when the instruction is executed. The misaligned access results in a SIGSEGV.
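To make the failing condition concrete, here is a minimal sketch of the alignment test that movdqa/movaps effectively impose on their memory operand. The helper name `is_aligned` is hypothetical and not part of any Intel or JNI API; it assumes the alignment is a power of two.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the thread): returns true if the address p
// is a multiple of `alignment`. Assumes `alignment` is a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

Calling something like `is_aligned(&some_local, 16)` at the top of a JNI entry point and logging the result could hint at whether the caller delivered a 16-byte aligned stack. It is only a hint, because the compiler decides where locals are placed relative to the stack pointer.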
The problem seems to go away when using -O1 instead of -O2, and in fact, the crashing function no longer contains movdqa/movaps in that case.
We also link object files generated with 'icc -O2' with code compiled by g++, which then calls into our code (an old version of g++ that does not have -mstackrealign). The same problem could potentially arise there.
Is there any way to compile with -O2 but force icc not to assume that the stack is aligned, so that we don't get the instructions that require stack alignment but can still benefit from -O2?
If not, is there a way to force the compiler to generate a sort of prologue for functions that would ensure proper stack alignment?
Or is there another way to make sure the stack is aligned properly when the offending instructions are executed?

Thanks a lot

Daniel
djunglas
New Contributor I
It seems like adding '-falign-stack=assume-4-byte' to the compiler options cures the problem.
Is this expected? Is this the right thing to do?

Thanks,

Daniel
Georg_Z_Intel
Employee
Hello,

yes, I think that's the way to go here. I'd suggest testing "-falign-stack=maintain-16-byte", though. The reason is that if the stack happens, for whatever reason, to be 16-byte aligned already, the compiler can take advantage of that instead of falling back to enforced unaligned accesses.

I don't have a JNI example at hand right now but I'd like to hear whether "-falign-stack=maintain-16-byte" works for you as well.

Best regards,

Georg Zitzlsberger
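For readers wondering what realigning the stack involves: on x86 the stack grows downward, so a function prologue that realigns typically rounds the stack pointer down to the next multiple of 16. The sketch below shows only that arithmetic. The name `align_down` is hypothetical, and this is an illustration under that assumption, not the code icc actually emits for any `-falign-stack` setting.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (illustration only): round an address down to a
// multiple of `alignment`, which must be a power of two. This mirrors the
// effect of masking the low bits of a stack pointer.
constexpr std::uintptr_t align_down(std::uintptr_t addr, std::size_t alignment) {
    return addr & ~(static_cast<std::uintptr_t>(alignment) - 1);
}
```

An address that is only 4-byte aligned, such as 0xffffd0ec, would be rounded down to 0xffffd0e0, and an address that is already 16-byte aligned stays unchanged.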
djunglas
New Contributor I
Thanks a lot for the tip!
-falign-stack=maintain-16-byte seems to work as well. All the test cases that crashed before now pass (just as they did with assume-4-byte). We will go with maintain-16-byte then.

Thank you again,

Daniel