Missed optimization opportunity

milian · ‎05-28-2014

Hey all,

I just noticed a missed performance opportunity:

#include <cmath>
int main(int argc, char** argv)
{
  const double C[2] = { 1., 0. };
  double v = C[1] * cos(.42 * argc);
  return v;
}

This will call cos even though the result will always be 0. Now compare this to this code:

#include <cmath>
int main(int argc, char** argv)
{
  double v = 0. * cos(.42 * argc);
  return v;
}

Here, the call to cos is properly optimized away. This is what I'd expect from the first code as well.

KitturGanesh · ‎05-28-2014

Hi Milian,

Yes, I checked and it does call cos. I'll file this issue with our developers and keep you updated. Much appreciated.

Regards,
Kittur

Bernard · ‎05-30-2014

It seems the reason the cos call was not eliminated is probably the array definition. I would expect the compiler to be able to calculate the value of v at compile time.

KitturGanesh · ‎06-02-2014

That's what I think could have happened. I'll update as soon as I have info from the developer who will be looking into the issue I've filed. Thanks.

_Kittur

Bernard · ‎06-03-2014

@milian

Can you post the disassembly of both code versions?

Bernard · ‎06-03-2014

This is a strange issue because the array is allocated and initialized, which means its contents are known at compile time. So this could be a bug in the compiler's optimization logic. As I cannot see the disassembly, I suppose that a push offset instruction (probably the machine-code representation of C[1]), if present, could confuse the compiler.

milian · ‎06-03-2014

Sure, here you go:

/tmp$ cat test.cpp 
#include <cmath>

int main(int argc, char** argv)
{
  double v = 0. * cos(.42 * argc);
  return v;
}
/tmp$ icpc -S -O3 -std=c++11 ./test.cpp 
/tmp$ cat test.s
# mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.0.080 Build 20130";
# mark_description "728";
# mark_description "-S -O3 -std=c++11";
        .file "test.cpp"
        .text
..TXTST0:
# -- Begin  main
# mark_begin;
       .align    16,0x90
        .globl main
main:
# parameter 1: %edi
# parameter 2: %rsi
..B1.1:                         # Preds ..B1.0
..___tag_value_main.1:                                          #4.1
        pushq     %rbp                                          #4.1
..___tag_value_main.3:                                          #
        movq      %rsp, %rbp                                    #4.1
..___tag_value_main.4:                                          #
        andq      $-128, %rsp                                   #4.1
        subq      $128, %rsp                                    #4.1
        movq      $0x000000000, %rsi                            #4.1
        movl      $3, %edi                                      #4.1
        call      __intel_new_feature_proc_init                 #4.1
                                # LOE rbx r12 r13 r14 r15
..B1.4:                         # Preds ..B1.1
        stmxcsr   (%rsp)                                        #4.1
        xorl      %eax, %eax                                    #6.10
        orl       $32832, (%rsp)                                #4.1
        ldmxcsr   (%rsp)                                        #4.1
        movq      %rbp, %rsp                                    #6.10
        popq      %rbp                                          #6.10
..___tag_value_main.6:                                          #
        ret                                                     #6.10
        .align    16,0x90
..___tag_value_main.8:                                          #
                                # LOE
# mark_end;
        .type   main,@function
        .size   main,.-main
        .data
# -- End  main
        .data
        .section .note.GNU-stack, ""
// -- Begin DWARF2 SEGMENT .eh_frame
        .section .eh_frame,"a",@progbits
.eh_frame_seg:
        .align 8
        .4byte 0x0000001c
        .8byte 0x00507a0100000000
        .4byte 0x09107801
        .byte 0x00
        .8byte __gxx_personality_v0
        .4byte 0x9008070c
        .2byte 0x0001
        .byte 0x00
        .4byte 0x00000034
        .4byte 0x00000024
        .8byte ..___tag_value_main.1
        .8byte ..___tag_value_main.8-..___tag_value_main.1
        .2byte 0x0400
        .4byte ..___tag_value_main.3-..___tag_value_main.1
        .2byte 0x100e
        .byte 0x04
        .4byte ..___tag_value_main.4-..___tag_value_main.3
        .4byte 0x8610060c
        .2byte 0x0402
        .4byte ..___tag_value_main.6-..___tag_value_main.4
        .8byte 0x00000000c608070c
        .byte 0x00
# End

And for the other version:

/tmp$ cat test.cpp
#include <cmath>

int main(int argc, char** argv)
{
  const double C[2] = { 1., 0. };
  double v = C[1] * cos(.42 * argc);
  return v;
}
/tmp$ icpc -S -O3 -std=c++11 ./test.cpp    
/tmp$ cat test.s
# mark_description "Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.0.080 Build 20130";
# mark_description "728";
# mark_description "-S -O3 -std=c++11";
        .file "test.cpp"
        .text
..TXTST0:
# -- Begin  main
# mark_begin;
       .align    16,0x90
        .globl main
main:
# parameter 1: %edi
# parameter 2: %rsi
..B1.1:                         # Preds ..B1.0
..___tag_value_main.1:                                          #4.1
        pushq     %rbp                                          #4.1
..___tag_value_main.3:                                          #
        movq      %rsp, %rbp                                    #4.1
..___tag_value_main.4:                                          #
        andq      $-128, %rsp                                   #4.1
        pushq     %r12                                          #4.1
        subq      $120, %rsp                                    #4.1
..___tag_value_main.6:                                          #
        movl      %edi, %r12d                                   #4.1
        movq      $0x000000000, %rsi                            #4.1
        movl      $3, %edi                                      #4.1
        call      __intel_new_feature_proc_init                 #4.1
                                # LOE rbx r13 r14 r15 r12d
..B1.5:                         # Preds ..B1.1
        stmxcsr   16(%rsp)                                      #4.1
        movq      $0x3ff0000000000000, %rax                     #5.21
        orl       $32832, 16(%rsp)                              #4.1
        ldmxcsr   16(%rsp)                                      #4.1
        cvtsi2sd  %r12d, %xmm0                                  #6.21
        mulsd     .L_2il0floatpacket.2(%rip), %xmm0             #6.21
        movq      %rax, (%rsp)                                  #5.21
        movq      $0, 8(%rsp)                                   #5.21
        call      cos                                           #6.21
                                # LOE rbx r13 r14 r15 xmm0
..B1.4:                         # Preds ..B1.5
        mulsd     8(%rsp), %xmm0                                #6.21
        cvttsd2si %xmm0, %eax                                   #7.10
        addq      $120, %rsp                                    #7.10
..___tag_value_main.7:                                          #7.10
        popq      %r12                                          #7.10
        movq      %rbp, %rsp                                    #7.10
        popq      %rbp                                          #7.10
..___tag_value_main.8:                                          #
        ret                                                     #7.10
        .align    16,0x90
..___tag_value_main.10:                                         #
                                # LOE
# mark_end;
        .type   main,@function
        .size   main,.-main
        .data
# -- End  main
        .section .rodata, "a"
        .align 8
        .align 8
.L_2il0floatpacket.2:
        .long   0xae147ae1,0x3fdae147
        .type   .L_2il0floatpacket.2,@object
        .size   .L_2il0floatpacket.2,8
        .align 8
.L_2il0floatpacket.3:
        .long   0x00000000,0x3ff00000
        .type   .L_2il0floatpacket.3,@object
        .size   .L_2il0floatpacket.3,8
        .data
        .section .note.GNU-stack, ""
// -- Begin DWARF2 SEGMENT .eh_frame
        .section .eh_frame,"a",@progbits
.eh_frame_seg:
        .align 8
        .4byte 0x0000001c
        .8byte 0x00507a0100000000
        .4byte 0x09107801
        .byte 0x00
        .8byte __gxx_personality_v0
        .4byte 0x9008070c
        .2byte 0x0001
        .byte 0x00
        .4byte 0x0000004c
        .4byte 0x00000024
        .8byte ..___tag_value_main.1
        .8byte ..___tag_value_main.10-..___tag_value_main.1
        .2byte 0x0400
        .4byte ..___tag_value_main.3-..___tag_value_main.1
        .2byte 0x100e
        .byte 0x04
        .4byte ..___tag_value_main.4-..___tag_value_main.3
        .4byte 0x8610060c
        .2byte 0x0402
        .4byte ..___tag_value_main.6-..___tag_value_main.4
        .8byte 0xff800d1c380e0c10
        .8byte 0xfffffff80d1affff
        .2byte 0x0422
        .4byte ..___tag_value_main.7-..___tag_value_main.6
        .2byte 0x04cc
        .4byte ..___tag_value_main.8-..___tag_value_main.7
        .4byte 0xc608070c
        .byte 0x00
# End

Bernard · ‎06-04-2014

In the first case the compiler optimized the code completely: it evaluated everything at compile time, so nothing is left to compute at run time. I suppose that xorl %eax,%eax sets the return value.

Bernard · ‎06-04-2014

In the second case the compiler has all the needed values available at compile time, but decided to emit code that calculates the result at run time. I suppose that the array member .L_2il0floatpacket.2 was responsible for the optimization not being performed.

KitturGanesh · ‎06-04-2014

Hi folks,

I've passed your feedback on in the issue I filed with the developers and will keep you updated when I get more info. Thanks.

_Kittur

Bernard · ‎06-05-2014

@Kittur

Thank you.

jimdempseyatthecove · ‎06-05-2014

milian wrote:

Hey all,

I just noticed a missed performance opportunity:
#include <cmath>
int main(int argc, char** argv)
{
  const double C[2] = { 1., 0. };
  double v = C[1] * cos(.42 * argc);
  return v;
}
This will call cos even though the result will always be 0. Now compare this to this code:
#include <cmath>
int main(int argc, char** argv)
{
  double v = 0. * cos(.42 * argc);
  return v;
}
Here, the call to cos is properly optimized away. This is what I'd expect from the first code as well.

Though the result in v won't differ from 0.0 when cos is removed, the function call in that statement could potentially have a side effect (in this case it does not).

When the side effects of a function call are unknown, the call must be made. The compiler should know that cos has no side effects, so in this case the call could be eliminated.

Assume, for example, that cos were foo, and that foo accumulated its inputs (e.g. a summation). By omitting the statement v= you would also omit the accumulation of the argument into the sum.

Jim Dempsey

milian · ‎06-05-2014

I know about that, but do note that cos and similar math functions should be marked as const/pure (cf. [1]), and thus the optimization can be done by icpc, as it uses behavior like the -ffast-math mode of GCC and Clang by default. Without the latter, by the way, you couldn't even apply this optimization, as the result could be any of -0, +0, NaN, ...

TimP · ‎06-05-2014

icc doesn't go as far as gcc -ffast-math (-ffinite-math-only) in performing optimizations that require neglecting the possibility of NaN. Some optimizations enabled by gcc -ffast-math are done by icc only under #pragma vector always and the like (that is, only when vectorizing).

KitturGanesh · ‎06-05-2014

That's correct, Tim. That's my understanding too.