Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.
7954 Discussions

Optimization reorders instructions incorrectly

Pavel_J_1
Beginner
738 Views
// Example code in external library:
struct Vec
{
    double x, y, z;

    inline const Vec& operator=(const Vec& that)
    {
        x = that.x;
        y = that.y;
        z = that.z;
        return *this;
    }

    inline double& operator[](int index)
    {
        return (&x)[index];
    }
}

// Example usage in my binary:

int index;
Vec vec;

...

vec = some_other_vec;
double result = vec[index];

If the value of index is not known at compile time, compiler reorders instruction incorrectly, for instance like this (assuming index == 2):

vec.x = some_other_vec.x;
result = (&vec.x)[2];    // this is loaded before vec.z is stored
vec.y = some_other_vec.y;
vec.z = some_other_vec.z;

Implementation of operator[] is probably violates strict aliasing rule. But it is in external library and I cannot modify the source code. I decided to build my binary with -fno-strict-aliasing. Operator [] is inlined to my binary, so it should fix it. But it seems, that -fno-strict-aliasing does not work. What flag could I use to disable incorrect instructions reordering? When I compile my code with -O0, everything works correctly.

I use this compiler: icpc version 15.0.1 (gcc version 4.6.0 compatibility), previous version of compiler works ok

Thank you, Pavel Jurkas.

 

0 Kudos
11 Replies
jimdempseyatthecove
Honored Contributor III
738 Views

Can you insert a compiler fence in between the instructions?

If the header is available, then you could add an operator= which includes the fence.

This assumes the error does not appear within the external library.

Jim Dempsey

0 Kudos
Anoop_M_Intel
Employee
738 Views

I am not able to reproduce the error using the following program (used 15.0,1 + GCC 4.6):

#include<iostream>
#include<stdlib.h>
struct Vec
{
    double x, y, z;
    inline Vec()
    {
        x = -1.0;
        y = -2.0;
        z = -3.0;
    }
    inline Vec(double x1, double y1, double z1)
    {
        x = x1;
        y = y1;
        z = z1;
    }

    inline const Vec& operator=(const Vec& that)
    {
        x = that.x;
        y = that.y;
        z = that.z;
        return *this;
    }

    inline double& operator[](int index)
    {
        return (&x)[index];
    }
};
int main(int argc, char *argv[]){
        if(argc != 2)
        {
                std::cout<<"Need the index as a command line argument [0, 1 or 2]\n";
                return 0;
        }
        int index = atoi(argv[1]);
        Vec v1(1.0, 2.0, 3.0), v2;
        v2 = v1;
        double result = (&(v2.x))[index];
        std::cout<<"result = "<<result<<"\n";
        return 0;
}

Please correct me if I am not capturing your problem right.

Thanks and Regards
Anoop

0 Kudos
Pavel_J_1
Beginner
738 Views

Could you please try this example code (test.cpp):

#include <iostream>

struct Vec
{
    double x, y, z;

    const Vec& operator=(const Vec& that)
    {
        x = that.x;
        y = that.y;
        z = that.z;
        return *this;
    }

    double& operator[](int index)
    {
        return (&x)[index];
    }
};

int main()
{
    const int size = 1000;

    // create 1000 vectors: (0,0,0), (1,1,1), ..., (999,999,999)
    Vec vec[size];
    for (int i = 0; i < size; ++i)
        vec.x = vec.y = vec.z = i;

    // get index from user
    int index;
    std::cout << "enter index [0, 1, or 2]: ";
    std::cin >> index;

    // compute sum of the specified index from all vectors
    Vec v;
    double sum = 0.0;
    for (int i = 0; i < size; ++i)
    {
        v = vec;
        sum += v[index];
    }

    // output result -- it should return always value 499500
    std::cout << "result: " << sum << std::endl;
}

Some results:

$ g++ -O3 test.cpp && ./a.out
enter index [0, 1, or 2]: 2
result: 499500

$ icpc -O0 test.cpp && ./a.out
enter index [0, 1, or 2]: 2
result: 499500

$ icpc-2013_sp1.1.106 -O1 test.cpp && ./a.out
enter index [0, 1, or 2]: 2
result: 499500

$ icpc-2013_sp1.2.144 -O1 test.cpp && ./a.out
enter index [0, 1, or 2]: 2
result: 499500

$ icpc-2015.1.133 -O1 test.cpp && ./a.out
enter index [0, 1, or 2]: 2
result: 498501

$ icpc-2015.3.187 -O1 test.cpp && ./a.out
enter index [0, 1, or 2]: 2
result: 498501

$ icpc-2016.0.109 -O1 test.cpp && ./a.out
enter index [0, 1, or 2]: 2
result: 498501

$ icpc-2016.3.210 -O1 test.cpp && ./a.out
enter index [0, 1, or 2]: 2
result: 0

I tried all versions of icpc, which I have. Versions >= 2015 returned incorrect values.

Memory fences fixes this issue, but i do not want to modify that external code, i would rather disable some optimization, which reoders it incorrectly.

Thank you, Pavel Jurkas

 

0 Kudos
Pavel_J_1
Beginner
738 Views

One more question: Is that code incorrect or is it a bug in compiler?

Thanks

0 Kudos
Anoop_M_Intel
Employee
738 Views

I have escalated this problem to compiler engineering for expert advise. In the meantime, as Jim mentioned you can use MFENCE as shown below:

#include <iostream>

struct Vec
{
    double x, y, z;

    const Vec& operator=(const Vec& that)
    {
        x = that.x;
        y = that.y;
        z = that.z;
        return *this;
    }

    double& operator[](int index)
    {
        return (&x)[index];
    }
};

int main()
{
    const int size = 1000;

    // create 1000 vectors: (0,0,0), (1,1,1), ..., (999,999,999)
    Vec vec[size];
    for (int i = 0; i < size; ++i)
        vec.x = vec.y = vec.z = i;

    // get index from user
    int index;
    std::cout << "enter index [0, 1, or 2]: ";
    std::cin >> index;

    // compute sum of the specified index from all vectors
    Vec v;
    double sum = 0.0;
    for (int i = 0; i < size; ++i)
    {
        v = vec;
        asm volatile ("mfence" ::: "memory");
        sum += v[index];
    }

    // output result -- it should return always value 499500
    std::cout << "result: " << sum << std::endl;
}
$ icpc test.cc -g -O1
$ ./a.out
enter index [0, 1, or 2]: 2
result: 499500

$ g++ test.cc -O3
$ ./a.out
enter index [0, 1, or 2]: 2
result: 499500

Thanks and Regards
Anoop

0 Kudos
jimdempseyatthecove
Honored Contributor III
738 Views

Anoop,

I think

     asm volatile("" ::: "memory"); // compiler fence only (no CPU fence required)

would be sufficient. Could you please verify this?

Jim Dempsey

0 Kudos
Pavel_J_1
Beginner
738 Views

Thank you, asm volatile("" ::: "memory") works for me.

Is this problem caused by a bug in Intel compiler?

0 Kudos
jimdempseyatthecove
Honored Contributor III
738 Views

>>Is this problem caused by a bug in Intel compiler?

I think it is a case of a bug in the programmer. The programmer (you) told the compiler "I want speed". An almost universal means (one of) of accomplishing this is by instruction re-order *** provided it does not bypass the visible intent of the programmer expressed in source code. The aliasing used in your code makes the intent NOT visible.

For additional information see: http://preshing.com/20120625/memory-ordering-at-compile-time/

Jim Dempsey

0 Kudos
Pavel_J_1
Beginner
738 Views

Thank you for a nice explanation. I suspected more that external library code of being incorrect than the compiler. But some of my colleagues were convinced that it is a compiler bug. 

0 Kudos
jimdempseyatthecove
Honored Contributor III
738 Views

I assume you are aware that the double& on the operator[] is producing the alias. If you change the return type to double (no &, and return the value not the reference), the compiler may generate the code you expect (I have not tested this).

Jim Dempsey

0 Kudos
Pavel_J_1
Beginner
738 Views

Non-alias version produces the same incorrect result

0 Kudos
Reply