Hey all,
I just noticed a missed performance opportunity:
#include <cmath>

int main(int argc, char** argv)
{
  const double C[2] = { 1., 0. };
  double v = C[1] * std::cos(.42 * argc);
  return v;
}
This calls cos even though the result is always 0. Now compare it with this code:
#include <cmath>

int main(int argc, char** argv)
{
  double v = 0. * std::cos(.42 * argc);
  return v;
}
Here, the call to cos is properly optimized away. This is what I'd expect from the first code as well.
Hi Milian,
Yes, I checked, and it does call cos. I'll file this issue with our developers and keep you updated. Much appreciated.
Regards,
Kittur
It seems the reason the cos call was not eliminated is probably the array definition. I suppose the compiler should have been able to calculate the value of v at compile time.
That's what I think could have happened, and I'll update you as soon as I have information from the developer who'll be looking into the issue I filed. Thanks.
_Kittur
@milian
Can you post the disassembly of both code versions?
This is a strange issue, because the array is allocated and initialized with constants, which means its contents are known at compile time. So this could be a bug in the compiler's optimization logic. As I cannot see the disassembled version, I suppose that a push offset instruction (probably the machine-code representation of C[1]), if present, could confuse the compiler.
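To make the "known at compile time" property explicit, one option is to declare the coefficients constexpr (C++11) and check them with static_assert. This is a minimal sketch: scaled_cos is a hypothetical stand-in for main, and whether icpc 14.0 then drops the cos call is not something this thread confirms.

```cpp
#include <cmath>

// Same coefficients as the original test, but constexpr: C[1] is now
// usable in constant expressions, which a plain local `const double`
// array is not (only const integral/enum objects get that treatment).
constexpr double C[2] = { 1., 0. };
static_assert(C[1] == 0.0, "C[1] must be a compile-time zero");

// Hypothetical stand-in for main(): same computation as the original test.
int scaled_cos(int argc)
{
    double v = C[1] * std::cos(.42 * argc);
    return static_cast<int>(v);
}
```

The static_assert fails to compile if C[1] ever stops being a constant zero, so the assumption the optimizer would need is checked by the compiler itself.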
Sure, here you go:
/tmp$ cat test.cpp
#include <cmath>

int main(int argc, char** argv)
{
  double v = 0. * cos(.42 * argc);
  return v;
}
/tmp$ icpc -S -O3 -std=c++11 ./test.cpp
/tmp$ cat test.s
# mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.0.080 Build 20130";
# mark_description "728";
# mark_description "-S -O3 -std=c++11";
        .file "test.cpp"
        .text
..TXTST0:
# -- Begin main
# mark_begin;
        .align 16,0x90
        .globl main
main:
# parameter 1: %edi
# parameter 2: %rsi
..B1.1:                         # Preds ..B1.0
..___tag_value_main.1:                                  #4.1
        pushq     %rbp                                  #4.1
..___tag_value_main.3:                                  #
        movq      %rsp, %rbp                            #4.1
..___tag_value_main.4:                                  #
        andq      $-128, %rsp                           #4.1
        subq      $128, %rsp                            #4.1
        movq      $0x000000000, %rsi                    #4.1
        movl      $3, %edi                              #4.1
        call      __intel_new_feature_proc_init         #4.1
                                # LOE rbx r12 r13 r14 r15
..B1.4:                         # Preds ..B1.1
        stmxcsr   (%rsp)                                #4.1
        xorl      %eax, %eax                            #6.10
        orl       $32832, (%rsp)                        #4.1
        ldmxcsr   (%rsp)                                #4.1
        movq      %rbp, %rsp                            #6.10
        popq      %rbp                                  #6.10
..___tag_value_main.6:                                  #
        ret                                             #6.10
        .align    16,0x90
..___tag_value_main.8:                                  #
                                # LOE
# mark_end;
        .type main,@function
        .size main,.-main
        .data
# -- End main
        .data
        .section .note.GNU-stack, ""
// -- Begin DWARF2 SEGMENT .eh_frame
        .section .eh_frame,"a",@progbits
.eh_frame_seg:
        .align 8
        .4byte 0x0000001c
        .8byte 0x00507a0100000000
        .4byte 0x09107801
        .byte 0x00
        .8byte __gxx_personality_v0
        .4byte 0x9008070c
        .2byte 0x0001
        .byte 0x00
        .4byte 0x00000034
        .4byte 0x00000024
        .8byte ..___tag_value_main.1
        .8byte ..___tag_value_main.8-..___tag_value_main.1
        .2byte 0x0400
        .4byte ..___tag_value_main.3-..___tag_value_main.1
        .2byte 0x100e
        .byte 0x04
        .4byte ..___tag_value_main.4-..___tag_value_main.3
        .4byte 0x8610060c
        .2byte 0x0402
        .4byte ..___tag_value_main.6-..___tag_value_main.4
        .8byte 0x00000000c608070c
        .byte 0x00
# End
And for the other version:
/tmp$ cat test.cpp
#include <cmath>

int main(int argc, char** argv)
{
  const double C[2] = { 1., 0. };
  double v = C[1] * cos(.42 * argc);
  return v;
}
/tmp$ icpc -S -O3 -std=c++11 ./test.cpp
/tmp$ cat test.s
# mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.0.080 Build 20130";
# mark_description "728";
# mark_description "-S -O3 -std=c++11";
        .file "test.cpp"
        .text
..TXTST0:
# -- Begin main
# mark_begin;
        .align 16,0x90
        .globl main
main:
# parameter 1: %edi
# parameter 2: %rsi
..B1.1:                         # Preds ..B1.0
..___tag_value_main.1:                                  #4.1
        pushq     %rbp                                  #4.1
..___tag_value_main.3:                                  #
        movq      %rsp, %rbp                            #4.1
..___tag_value_main.4:                                  #
        andq      $-128, %rsp                           #4.1
        pushq     %r12                                  #4.1
        subq      $120, %rsp                            #4.1
..___tag_value_main.6:                                  #
        movl      %edi, %r12d                           #4.1
        movq      $0x000000000, %rsi                    #4.1
        movl      $3, %edi                              #4.1
        call      __intel_new_feature_proc_init         #4.1
                                # LOE rbx r13 r14 r15 r12d
..B1.5:                         # Preds ..B1.1
        stmxcsr   16(%rsp)                              #4.1
        movq      $0x3ff0000000000000, %rax             #5.21
        orl       $32832, 16(%rsp)                      #4.1
        ldmxcsr   16(%rsp)                              #4.1
        cvtsi2sd  %r12d, %xmm0                          #6.21
        mulsd     .L_2il0floatpacket.2(%rip), %xmm0     #6.21
        movq      %rax, (%rsp)                          #5.21
        movq      $0, 8(%rsp)                           #5.21
        call      cos                                   #6.21
                                # LOE rbx r13 r14 r15 xmm0
..B1.4:                         # Preds ..B1.5
        mulsd     8(%rsp), %xmm0                        #6.21
        cvttsd2si %xmm0, %eax                           #7.10
        addq      $120, %rsp                            #7.10
..___tag_value_main.7:                                  #7.10
        popq      %r12                                  #7.10
        movq      %rbp, %rsp                            #7.10
        popq      %rbp                                  #7.10
..___tag_value_main.8:                                  #
        ret                                             #7.10
        .align    16,0x90
..___tag_value_main.10:                                 #
                                # LOE
# mark_end;
        .type main,@function
        .size main,.-main
        .data
# -- End main
        .section .rodata, "a"
        .align 8
        .align 8
.L_2il0floatpacket.2:
        .long 0xae147ae1,0x3fdae147
        .type .L_2il0floatpacket.2,@object
        .size .L_2il0floatpacket.2,8
        .align 8
.L_2il0floatpacket.3:
        .long 0x00000000,0x3ff00000
        .type .L_2il0floatpacket.3,@object
        .size .L_2il0floatpacket.3,8
        .data
        .section .note.GNU-stack, ""
// -- Begin DWARF2 SEGMENT .eh_frame
        .section .eh_frame,"a",@progbits
.eh_frame_seg:
        .align 8
        .4byte 0x0000001c
        .8byte 0x00507a0100000000
        .4byte 0x09107801
        .byte 0x00
        .8byte __gxx_personality_v0
        .4byte 0x9008070c
        .2byte 0x0001
        .byte 0x00
        .4byte 0x0000004c
        .4byte 0x00000024
        .8byte ..___tag_value_main.1
        .8byte ..___tag_value_main.10-..___tag_value_main.1
        .2byte 0x0400
        .4byte ..___tag_value_main.3-..___tag_value_main.1
        .2byte 0x100e
        .byte 0x04
        .4byte ..___tag_value_main.4-..___tag_value_main.3
        .4byte 0x8610060c
        .2byte 0x0402
        .4byte ..___tag_value_main.6-..___tag_value_main.4
        .8byte 0xff800d1c380e0c10
        .8byte 0xfffffff80d1affff
        .2byte 0x0422
        .4byte ..___tag_value_main.7-..___tag_value_main.6
        .2byte 0x04cc
        .4byte ..___tag_value_main.8-..___tag_value_main.7
        .4byte 0xc608070c
        .byte 0x00
# End
In the first listing, the compiler optimized the computation away completely by evaluating everything at compile time. xorl %eax, %eax sets %eax, the register that holds main's int return value, to 0, so main simply returns 0.
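In the second case, the compiler has everything it needs at compile time, but it emits code that calculates the result at run time. I suppose the array is responsible: the listing materializes C on the stack (movq %rax, (%rsp) stores 1.0 and movq $0, 8(%rsp) stores 0.0), calls cos, and then multiplies the result by C[1] reloaded from memory with mulsd 8(%rsp), %xmm0 instead of folding it. (.L_2il0floatpacket.2 is not an array member; it holds the constant 0.42 from .42 * argc.)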
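As a reading aid, the second listing can be transcribed back into C++ statement by statement. This is a hand-written interpretation for illustration, not compiler output; listing2_equivalent is a hypothetical name, and the comments name the instruction each line corresponds to.

```cpp
#include <cmath>

// Hand-written C++ reading of the second icpc 14.0 listing.
int listing2_equivalent(int argc)
{
    double C[2];
    C[0] = 1.0;                                  // movq $0x3ff0000000000000, %rax; movq %rax, (%rsp)
    C[1] = 0.0;                                  // movq $0, 8(%rsp)
    double x = static_cast<double>(argc) * 0.42; // cvtsi2sd %r12d, %xmm0; mulsd .L_2il0floatpacket.2(%rip), %xmm0
    double r = std::cos(x);                      // call cos
    r *= C[1];                                   // mulsd 8(%rsp), %xmm0: C[1] is reloaded, not folded
    return static_cast<int>(r);                  // cvttsd2si %xmm0, %eax (truncating conversion)
}
```

Read this way, the missed optimization is visible as the unfolded multiply by C[1] after the call.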
Hi folks,
I've passed your feedback on in the issue I filed with the developers and will keep you updated when I get more info on this. Thanks.
_Kittur
@Kittur
Thank you.
Though the result in v would not differ from 0.0 if cos were removed, the function call in the statement could potentially have a side effect (in this case it does not).
When the compiler cannot tell whether a function call has side effects, the call must be made. The compiler should know that cos has no side effects, so in this case the statement could be eliminated.
For example, suppose cos were foo, and foo accumulated its inputs (e.g. a running sum). Omitting the v = statement would then also omit adding the argument to the sum.
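A minimal sketch of that scenario, assuming a hypothetical foo that returns its argument and also adds it to a global running sum (foo, foo_sum and compute are made-up names, not from the thread). Even though 0. * foo(...) is always 0 for finite input, the call cannot be dropped without losing the update to foo_sum.

```cpp
// Hypothetical running sum updated by foo as a side effect.
double foo_sum = 0.0;

// Hypothetical foo: returns x and accumulates it into foo_sum.
double foo(double x)
{
    foo_sum += x;  // side effect invisible in the return value
    return x;
}

int compute(int argc)
{
    // The product is always 0 for finite input, but foo must still run
    // so that foo_sum is updated.
    double v = 0. * foo(.42 * argc);
    return static_cast<int>(v);
}
```

Calling compute(1) and then compute(2) leaves foo_sum at about 1.26, which would stay 0.0 if the compiler had deleted the calls.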
Jim Dempsey
I know about that, but note that cos and similar math functions should be marked as const/pure (cf. [1]), so icpc could apply this optimization, since by default it uses behavior similar to the -ffast-math mode of GCC and Clang. Without the latter, by the way, you couldn't apply this optimization at all, as the result could be any of -0, +0, NaN, ...
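To illustrate why strict floating-point rules block the fold, here is a small sketch assuming IEEE 754 semantics (e.g. GCC or Clang without -ffast-math). zero_times and is_plus_zero are hypothetical helpers. In the original test .42 * argc is always finite, so cos cannot return NaN there, but it can return a negative value, which makes the product -0.0.

```cpp
#include <cmath>
#include <limits>

// 0.0 * x under IEEE 754: +0.0 for finite positive x, -0.0 for finite
// negative x, NaN for infinite or NaN x.
double zero_times(double x)
{
    return 0.0 * x;
}

// True only for +0.0 (false for -0.0 and for NaN).
bool is_plus_zero(double r)
{
    return r == 0.0 && !std::signbit(r);
}
```

For example, zero_times(std::cos(2.0)) is -0.0 because cos(2.0) is negative; the int conversion in return v still yields 0. Replacing the product with a plain 0.0 is only valid if the compiler may ignore signed zeros and NaNs, which is the kind of relaxation the -ffast-math family of flags grants.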
icc doesn't go as far as gcc -ffast-math (-ffinite-math-only) in performing optimizations that require neglecting the possibility of NaN. Some optimizations enabled by gcc -ffast-math are enabled in icc by #pragma vector always and the like (so icc performs them only when vectorizing).
That's correct, Tim; that's my understanding too.
