Re: Optimization compiler bug in Intel DPC++ compiler 2024.* (and older)

Peter_Zajac · ‎07-22-2024

Hello,

I have come across a compiler bug related to optimization in the current icpx compiler. I can reproduce the error on the newest version 2024.2.0 as well as several older versions (2023.2.1). The error appeared pretty deep in our finite element simulation software, but I have been able to boil the bug down to a somewhat simple plain standard C++ example.

Consider the following piece of code, which effectively creates a 2x2 identity matrix and computes its determinant, which should be equal to 1, and returns the exit code 0 if the determinant is corrent and 1 otherwise. There's some mild template magic as well as a reinterpret_cast going on, but this seems to be necessary, because I failed to simplify the code so that the compiler still produces incorrect results:

#include <iostream>
#include <cmath>

template<typename T_, int n_, int s_>
class FooVector
{
public:
  T_ v[s_];

  T_& operator[](int i)
  {
    return v[i];
  }
}; // class FooVector

template<typename T_, int m_, int n_, int sm_ = m_, int sn_ = n_>
class FooMatrix
{
public:
  FooVector<T_, m_, sm_> v[sm_];

  void set_identity()
  {
    for(int i(0); i < m_; ++i)
      for(int j(0); j < n_; ++j)
        v[i][j] = (i == j ? T_(1) : T_(0));
  }
}; // class FooMatrix

static double determinant(const double (&a)[2][2])
{
  return a[0][0]*a[1][1] - a[0][1]*a[1][0];
}

int main()
{
  typedef double Tv[2][2];

  FooMatrix<double, 2, 2> jac_mat;
  double det = 0.0;

  for(int i(0); i < 1; ++i) // <-- comment out to make this work
  {
    jac_mat.set_identity();
    det = determinant(*reinterpret_cast<const Tv*>(&jac_mat));
  }

  printf("DET: %f\n", det);
  return int(std::abs(det - 1.0) > 1e-3);
}

The results produced by the compiled binary depent on the compiler flags chosen and on whether the 'for' statement in line 42 is commented out or not. I have set up this example along with four compiler configurations here: https://godbolt.org/z/qE4vd1jqP

If you don't want to use the website above, I used the following call to compile and run the code:

icpx -O3 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?

The summary is:

If compiled with -O0 or -O3, then we always get the correct results.

If compiled with -O3 -qopenmp, then the exit code is 0 (correct), but the determinant printed to cout is 0, although it should be 1.

If compiled with -O1 and the for statement in line 42 is not commented out, the exit code is 1 (wrong) and the determinant is 0 (wrong).

If compiled with -O1 and the for statement in line 42 is commented out, we get correct results.

All other compilers that I have tested, including the Intel C++ classic compilers (2021.10.0), always produced the correct results.

Alex_Y_Intel · ‎07-22-2024

I've escalated your issue for further investigation.

Alex_Y_Intel · ‎07-23-2024

I want to confirm with you after some testing; this is what I found--

ayu1@sdp4450:~/opt$ icpx -O0 -fiopenmp icpx-heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 1.000000
0
ayu1@sdp4450:~/opt$ icpx -O3 -fiopenmp icpx-heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 0.000000
0

Here I don't see that both -O3 and -O0 produced the same results

ayu1@sdp4450:~/opt$ icpx -O3 -qopenmp icpx-heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 0.000000
0
ayu1@sdp4450:~/opt$ icpx -O1 -fiopenmp icpx-heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 16662665897685812963617139039402195225075356588939957430612114731476344794801546603548301662579828488938878176837977696023349297152.000000
1

Here I commented out the for loop on line 42 as you mentioned, and this is what I got:

ayu1@sdp4450:~/opt$ icpx -O1 -fiopenmp icpx-heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 0.000000
1

My compiler version:

ayu1@sdp4450:~/opt$ icx -V
Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

So it looks like only -O0 gave the correct result, can you please confirm this?

Peter_Zajac · ‎07-24-2024

Ok, firstly, here's my compiler version:

~/nobackup/test > icx -V
Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2024.0.2 Build 20231213

When I compile the original code (including the 'for' statement), I can confirm that -O1 and -O3 do not produce the same return code when compiled along with -fiopenmp, however, the determinant seems to be 0 instead of that seemingly random large number (uninitialized memory?) that you got:

~/nobackup/test > icpx -O0 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 1.000000
0
~/nobackup/test > icpx -O1 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 0.000000
1
~/nobackup/test > icpx -O3 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 0.000000
0

When I comment out the 'for' statement, then I presumably get correct results for -O1 -fiopenmp, but not for -O3 -fiopenmp:

~/nobackup/test > icpx -O0 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 1.000000
0
~/nobackup/test > icpx -O1 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 1.000000
0
~/nobackup/test > icpx -O3 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 0.000000
0

I'm not sure whether that matters, but I have performed theses test on a small university compute server running RedHat EL 9.4.

Completely unrelated to the bug above, but made me scratch my head:

I had to remove the "Copyright (C) XXXX-YYYY Intel Corporation. All rights reserved." from the compiler version output above, because I got the error message "The message body contains XX-YYYY, which is not permitted in this community. Please remove this content before sending your post."... The only thing that I could find when searching for the presumably offending numbers were some motorcycles...

Alex_Y_Intel · ‎07-24-2024

Please try the 2024.2 version and let me know what you got.

Peter_Zajac · ‎07-24-2024

I'm sorry, but I do not have access to any actual machine with version 2024.2 installed; I have only tested this version on the compiler explorer website that I have referenced in my first post. Version 2024.0.2 is the newest version that I have at my disposal.

Alex_Y_Intel · ‎08-14-2024

In C++, the compiler is allowed to assume that data type X can only be accessed with pointer types X*, char*, and similar types such as unsigned X*. For example:

int *ip;

float *fp = reinterpret_cast<float *>(ip);

In the above example, the compiler may assume that ip and fp point to different data, because the pointers are different types. C++ has a strong typing system with inheritance, which the compiler relies on for optimizations.

The code in the "heisenbug" program contains a similar problem:

det = determinant(*reinterpret_cast<const Tv*>(&jac_mat));

The function determinant() is declared with double[2][2] type. The compiler will assume that the determinant() function will access different data than "jac_mat", which is FooMatrix type.

The solution is to change determinant() so it can accept a FooMatrix. FooMatrix can be improved to support subscripts.

#include <iostream>

#include <cmath>

template<typename T_, int n_, int s_>

class FooVector

{

public:

T_ v[s_];

T_& operator[](int i)

{

return v[i];

}

}; // class FooVector

template<typename T_, int m_, int n_, int sm_ = m_, int sn_ = n_>

class FooMatrix

{

public:

FooVector<T_, m_, sm_> v[sm_];

FooVector<T_, m_, sm_>& operator[](int i)

{

return v[i];

}

void set_identity()

{

for(int i(0); i < m_; ++i)

for(int j(0); j < n_; ++j)

v[i][j] = (i == j ? T_(1) : T_(0));

}

}; // class FooMatrix

template<typename T_, int m_, int n_>

static T_ determinant(FooMatrix<T_,m_,n_> &a)

{

return a[0][0]*a[1][1] - a[0][1]*a[1][0];

}

int main()

{

typedef double Tv[2][2];

FooMatrix<double, 2, 2> jac_mat;

double det = 0.0;

for(int i(0); i < 1; ++i) // <-- comment out to make this work

{

jac_mat.set_identity();

det = determinant(jac_mat);

}

printf("DET: %f\n", det);

return int(std::abs(det - 1.0) > 1e-3);

}

Peter_Zajac · ‎08-16-2024

Thank you for your reply. I am aware of the strict-aliasing and type punning rules in C/C++, however, I am not 100% sure that you are correct here, although I have to admit that I am not 100% sure that I am correct with my point of view either.

The cruicial point is that in constrast to the classical "int foo = 0; reinterpret_cast<float*>(&foo)" example, we are not reinterpreting the individual objects of type double, which are stored inside the FooVector/FooMatrix classes, as a different type but we are merely using different aggregates in the C++ sense to access those individual objects of type double.

In section 6.10 Lvalues and rvalues [basic.lval] the C++17 standard (that's the lastest standard I have at my disposal) states:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
(8.1) — the dynamic type of the object,
(8.2) — a cv-qualified version of the dynamic type of the object,
(8.3) — a type similar (as defined in 7.5) to the dynamic type of the object,
(8.4) — a type that is the signed or unsigned type corresponding to the dynamic type of the object,
(8.5) — a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
(8.6) — an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union) ,
(8.7) — a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
(8.8) — a char, unsigned char, or std: : byte type.

The interesting point here is (8.6), because according to the definition of aggregates in 11.6.1 Aggregates [dcl.init.aggr]

An aggregate is an array or a class (Clause 12) with
(1.1) — no user-provided, explicit, or inherited constructors (15.1) ,
(1.2) — no private or protected non-static data members (Clause 14) ,
(1.3) — no virtual functions (13.3) , and
(1.4) — no virtual, private, or protected base classes (13.1) .

both "FooMatrix" and the typedef "Tv" are aggregates of "double" objects, so according to my understanding of the rules, it should be at least possible to reinterpret-cast "double" pointers to both "FooMatrix" and "Tv" pointers. I admit that this is not exactly what my code example does, but modifying the reinterpret-cast from

det = determinant(*reinterpret_cast<const Tv*>(&jac_mat));

to

det = determinant(*reinterpret_cast<const Tv*>(&jac_mat.v[0].v[0]));

should result in a standard-conforming C++ program that does not violate the strict-aliasing/type-punning rules -- at least to my understanding. However, even with this modified cast, the results produced by the icpx compiler are identical to the original code.