Compiler: Intel icc v11.1.056 (flags -xO -O2 -fasm-blocks)
Platform: Linux Intel64 (Debian 5.0, gcc 4.3.2)
[cpp]// Example #1 (very simple function in pure C):
double test_pure_c (double u)
{
    register double r;
    r = u * u;
    r += u;
    r *= r;
    return r;
}
[/cpp]
As expected, the icc compiler generates the optimal code:
[plain]
        movaps    %xmm0, %xmm1
        mulsd     %xmm0, %xmm1
        addsd     %xmm1, %xmm0
        mulsd     %xmm0, %xmm0
        ret
[/plain]
But now we replace the first and last calculation with GNU-style inline assembly:
[cpp]// Example #2:
double test_asm_sse2 (double u)
{
    register double r;
    // r = u * u;
    __asm__ ( " movaps %1, %0 \n"
              " mulsd %0, %0 "
              : "=&x" (r)
              : "x" (u) );
    r += u;
    // r *= r;
    __asm__ ( " mulsd %0, %0 "
              : "+x" (r) );
    return r;
}
[/cpp]
The icc compiler generates this code:
[plain]
        movsd     %xmm0, -40(%rsp)
        fldl      -40(%rsp)
        fstl      -24(%rsp)
        movsd     -24(%rsp), %xmm1
        movaps    %xmm1, %xmm2
        mulsd     %xmm2, %xmm2
        movsd     %xmm2, -32(%rsp)
        fldl      -32(%rsp)
        faddp     %st, %st(1)
        fstpl     -8(%rsp)
        movsd     -8(%rsp), %xmm3
        mulsd     %xmm3, %xmm3
        movsd     %xmm3, -16(%rsp)
        movsd     -16(%rsp), %xmm0
        ret
[/plain]
This is a really poor mixture of SSE and x87 floating-point code, with unnecessary stores of intermediates to the stack. GNU gcc has no problem fitting the inline assembly in with its own code and generates the optimal code for examples #1 and #2.
Defining the variable r as 'register double r asm ("xmm1");' results in pure SIMD code, but with some unnecessary 'register shuffling':
[plain]
        movsd     %xmm0, %xmm1
        mulsd     %xmm1, %xmm1
# why the register shuffling below ?
        movaps    %xmm1, %xmm2
        addsd     %xmm0, %xmm2
        movaps    %xmm2, %xmm1
# the three instructions above can be replaced by
# a single 'addsd %xmm0, %xmm1'
        mulsd     %xmm1, %xmm1
        movaps    %xmm1, %xmm0
        ret
[/plain]
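For anyone who wants to reproduce the comparison, here is a minimal sketch that puts examples #1 and #2 into one file and checks that they compute the same value. The `#if` guard, the plain-C fallback for non-x86-64 targets, and the helper `count_mismatches` are my own additions for illustration and are not part of the code discussed above. The `register` keyword is dropped here because it does not matter for the check.

```c
#include <assert.h>
#include <stddef.h>

/* Example #1: pure C reference. */
static double test_pure_c(double u)
{
    double r;
    r = u * u;
    r += u;
    r *= r;
    return r;
}

/* Example #2: first and last step done in GNU-style inline assembly.
   The guard and the plain-C fallback are additions so that the file
   still builds on targets without SSE2 registers. */
static double test_asm_sse2(double u)
{
    double r;
#if defined(__x86_64__) && defined(__GNUC__)
    /* r = u * u; */
    __asm__ ("movaps %1, %0\n\t"
             "mulsd  %0, %0"
             : "=&x" (r)
             : "x" (u));
    r += u;
    /* r *= r; */
    __asm__ ("mulsd %0, %0" : "+x" (r));
#else
    r = u * u;
    r += u;
    r *= r;
#endif
    return r;
}

/* Hypothetical helper: counts the inputs for which both versions differ.
   The inputs are chosen so that every intermediate result is exactly
   representable, which makes a comparison with == safe. */
static int count_mismatches(void)
{
    static const double inputs[] = { 0.0, 1.0, -1.5, 2.0, 3.25 };
    int bad = 0;
    for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; ++i) {
        if (test_pure_c(inputs[i]) != test_asm_sse2(inputs[i]))
            ++bad;
    }
    return bad;
}
```

Call `count_mismatches()` from a small `main` to run the check. Compiling the file with `-O2 -S` under each compiler and comparing the assembly for the two functions should show the kind of difference described above; the exact output will depend on the compiler version.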
The compiler has a similar problem for variables declared with asm register constraints:
[cpp]// Example #3:
long test_asm_reg_constraint (long c)
{
    register long r asm ("rax");
    r = c * c;
    r += 2;
    r *= r;
    r += c;
    return r;
}
[/cpp]
The icc compiler generates this silly code:
[plain]
        movq      %rdi, %rdx
        imulq     %rdi, %rdx
        movq      %rdx, %rax
        lea       2(%rax), %rcx
        movq      %rcx, %rax
        movq      %rax, %rsi
        imulq     %rax, %rsi
        movq      %rsi, %rax
        lea       (%rax,%rdi), %r8
        movq      %r8, %rax
        ret
[/plain]
For example #3, GNU gcc again generates the optimal code with or without the register constraint; icc does so only without it.
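To make the "with or without" comparison concrete, here is a small sketch that places example #3 next to the same function without the register binding. The name `test_no_reg_constraint` and the `#if` guard are my own additions for illustration. As far as I know, the GCC manual documents local register variables as supported only for pinning inline assembly operands, so placement of `r` in plain C code like this is not guaranteed by the language extension itself.

```c
#include <assert.h>

/* Example #3 as posted: r is bound to rax.
   The guard is an addition so the file builds on other targets. */
static long test_asm_reg_constraint(long c)
{
#if defined(__x86_64__) && defined(__GNUC__)
    register long r __asm__("rax");
#else
    long r;
#endif
    r = c * c;
    r += 2;
    r *= r;
    r += c;
    return r;
}

/* Same computation without the register binding (hypothetical name). */
static long test_no_reg_constraint(long c)
{
    long r;
    r = c * c;
    r += 2;
    r *= r;
    r += c;
    return r;
}
```

For `c = 3` both versions compute `((3*3 + 2)^2) + 3 = 124`. Compiling with `-O2 -S` and comparing the two function bodies shows whether a given compiler emits the extra `movq` instructions for the bound variant.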
Conclusion:
In these examples, the Intel C++ compiler does not handle register output from GNU-style inline assembly efficiently. It routes such output through intermediate storage (the stack or extra registers) to save and restore it, instead of passing the registers directly to the surrounding code.
GNU gcc can handle inline assembly with register output much better, resulting in faster and smaller code.
2 Replies
Thanks for the problem report and the examples. We will look into this issue and give you an update.
Regards,
--mark
The issue in example #2 has been resolved in the Intel C++ Composer XE Update 7 Build 11 Oct 2011. For example #3, could you please explain the reason you are using the register variable feature and its importance to you?
Thanks,
--mark