- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I have come across a compiler bug related to optimization in the current icpx compiler. I can reproduce the error on the newest version 2024.2.0 as well as several older versions (2023.2.1). The error appeared pretty deep in our finite element simulation software, but I have been able to boil the bug down to a somewhat simple plain standard C++ example.
Consider the following piece of code, which effectively creates a 2x2 identity matrix and computes its determinant, which should be equal to 1, and returns the exit code 0 if the determinant is corrent and 1 otherwise. There's some mild template magic as well as a reinterpret_cast going on, but this seems to be necessary, because I failed to simplify the code so that the compiler still produces incorrect results:
#include <iostream>
#include <cmath>
template<typename T_, int n_, int s_>
class FooVector
{
public:
T_ v[s_];
T_& operator[](int i)
{
return v[i];
}
}; // class FooVector
template<typename T_, int m_, int n_, int sm_ = m_, int sn_ = n_>
class FooMatrix
{
public:
FooVector<T_, m_, sm_> v[sm_];
void set_identity()
{
for(int i(0); i < m_; ++i)
for(int j(0); j < n_; ++j)
v[i][j] = (i == j ? T_(1) : T_(0));
}
}; // class FooMatrix
static double determinant(const double (&a)[2][2])
{
return a[0][0]*a[1][1] - a[0][1]*a[1][0];
}
int main()
{
typedef double Tv[2][2];
FooMatrix<double, 2, 2> jac_mat;
double det = 0.0;
for(int i(0); i < 1; ++i) // <-- comment out to make this work
{
jac_mat.set_identity();
det = determinant(*reinterpret_cast<const Tv*>(&jac_mat));
}
printf("DET: %f\n", det);
return int(std::abs(det - 1.0) > 1e-3);
}
The results produced by the compiled binary depent on the compiler flags chosen and on whether the 'for' statement in line 42 is commented out or not. I have set up this example along with four compiler configurations here: https://godbolt.org/z/qE4vd1jqP
If you don't want to use the website above, I used the following call to compile and run the code:
icpx -O3 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
The summary is:
If compiled with -O0 or -O3, then we always get the correct results.
If compiled with -O3 -qopenmp, then the exit code is 0 (correct), but the determinant printed to cout is 0, although it should be 1.
If compiled with -O1 and the for statement in line 42 is not commented out, the exit code is 1 (wrong) and the determinant is 0 (wrong).
If compiled with -O1 and the for statement in line 42 is commented out, we get correct results.
All other compilers that I have tested, including the Intel C++ classic compilers (2021.10.0), always produced the correct results.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I've escalated your issue for further investigation.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I want to confirm with you after some testing; this is what I found--
ayu1@sdp4450:~/opt$ icpx -O0 -fiopenmp icpx-heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 1.000000
0
ayu1@sdp4450:~/opt$ icpx -O3 -fiopenmp icpx-heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 0.000000
0
Here I don't see that both -O3 and -O0 produced the same results
ayu1@sdp4450:~/opt$ icpx -O3 -qopenmp icpx-heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 0.000000
0
ayu1@sdp4450:~/opt$ icpx -O1 -fiopenmp icpx-heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 16662665897685812963617139039402195225075356588939957430612114731476344794801546603548301662579828488938878176837977696023349297152.000000
1
Here I commented out the for loop on line 42 as you mentioned, and this is what I got:
ayu1@sdp4450:~/opt$ icpx -O1 -fiopenmp icpx-heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 0.000000
1
My compiler version:
ayu1@sdp4450:~/opt$ icx -V
Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
So it looks like only -O0 gave the correct result, can you please confirm this?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ok, firstly, here's my compiler version:
~/nobackup/test > icx -V
Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2024.0.2 Build 20231213
When I compile the original code (including the 'for' statement), I can confirm that -O1 and -O3 do not produce the same return code when compiled along with -fiopenmp, however, the determinant seems to be 0 instead of that seemingly random large number (uninitialized memory?) that you got:
~/nobackup/test > icpx -O0 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 1.000000
0
~/nobackup/test > icpx -O1 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 0.000000
1
~/nobackup/test > icpx -O3 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 0.000000
0
When I comment out the 'for' statement, then I presumably get correct results for -O1 -fiopenmp, but not for -O3 -fiopenmp:
~/nobackup/test > icpx -O0 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 1.000000
0
~/nobackup/test > icpx -O1 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 1.000000
0
~/nobackup/test > icpx -O3 -fiopenmp icpx_heisenbug.cpp -o icpx-heisenbug && ./icpx-heisenbug ; echo $?
DET: 0.000000
0
I'm not sure whether that matters, but I have performed theses test on a small university compute server running RedHat EL 9.4.
Completely unrelated to the bug above, but made me scratch my head:
I had to remove the "Copyright (C) XXXX-YYYY Intel Corporation. All rights reserved." from the compiler version output above, because I got the error message "The message body contains XX-YYYY, which is not permitted in this community. Please remove this content before sending your post."... The only thing that I could find when searching for the presumably offending numbers were some motorcycles...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Please try the 2024.2 version and let me know what you got.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I'm sorry, but I do not have access to any actual machine with version 2024.2 installed; I have only tested this version on the compiler explorer website that I have referenced in my first post. Version 2024.0.2 is the newest version that I have at my disposal.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In C++, the compiler is allowed to assume that data type X can only be accessed with pointer types X*, char*, and similar types such as unsigned X*. For example:
int *ip;
float *fp = reinterpret_cast<float *>(ip);
In the above example, the compiler may assume that ip and fp point to different data, because the pointers are different types. C++ has a strong typing system with inheritance, which the compiler relies on for optimizations.
The code in the "heisenbug" program contains a similar problem:
det = determinant(*reinterpret_cast<const Tv*>(&jac_mat));
The function determinant() is declared with double[2][2] type. The compiler will assume that the determinant() function will access different data than "jac_mat", which is FooMatrix type.
The solution is to change determinant() so it can accept a FooMatrix. FooMatrix can be improved to support subscripts.
#include <iostream>
#include <cmath>
template<typename T_, int n_, int s_>
class FooVector
{
public:
T_ v[s_];
T_& operator[](int i)
{
return v[i];
}
}; // class FooVector
template<typename T_, int m_, int n_, int sm_ = m_, int sn_ = n_>
class FooMatrix
{
public:
FooVector<T_, m_, sm_> v[sm_];
FooVector<T_, m_, sm_>& operator[](int i)
{
return v[i];
}
void set_identity()
{
for(int i(0); i < m_; ++i)
for(int j(0); j < n_; ++j)
v[i][j] = (i == j ? T_(1) : T_(0));
}
}; // class FooMatrix
template<typename T_, int m_, int n_>
static T_ determinant(FooMatrix<T_,m_,n_> &a)
{
return a[0][0]*a[1][1] - a[0][1]*a[1][0];
}
int main()
{
typedef double Tv[2][2];
FooMatrix<double, 2, 2> jac_mat;
double det = 0.0;
for(int i(0); i < 1; ++i) // <-- comment out to make this work
{
jac_mat.set_identity();
det = determinant(jac_mat);
}
printf("DET: %f\n", det);
return int(std::abs(det - 1.0) > 1e-3);
}
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your reply. I am aware of the strict-aliasing and type punning rules in C/C++, however, I am not 100% sure that you are correct here, although I have to admit that I am not 100% sure that I am correct with my point of view either.
The cruicial point is that in constrast to the classical "int foo = 0; reinterpret_cast<float*>(&foo)" example, we are not reinterpreting the individual objects of type double, which are stored inside the FooVector/FooMatrix classes, as a different type but we are merely using different aggregates in the C++ sense to access those individual objects of type double.
In section 6.10 Lvalues and rvalues [basic.lval] the C++17 standard (that's the lastest standard I have at my disposal) states:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
(8.1) — the dynamic type of the object,
(8.2) — a cv-qualified version of the dynamic type of the object,
(8.3) — a type similar (as defined in 7.5) to the dynamic type of the object,
(8.4) — a type that is the signed or unsigned type corresponding to the dynamic type of the object,
(8.5) — a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
(8.6) — an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union) ,
(8.7) — a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
(8.8) — a char, unsigned char, or std: : byte type.
The interesting point here is (8.6), because according to the definition of aggregates in 11.6.1 Aggregates [dcl.init.aggr]
An aggregate is an array or a class (Clause 12) with
(1.1) — no user-provided, explicit, or inherited constructors (15.1) ,
(1.2) — no private or protected non-static data members (Clause 14) ,
(1.3) — no virtual functions (13.3) , and
(1.4) — no virtual, private, or protected base classes (13.1) .
both "FooMatrix" and the typedef "Tv" are aggregates of "double" objects, so according to my understanding of the rules, it should be at least possible to reinterpret-cast "double" pointers to both "FooMatrix" and "Tv" pointers. I admit that this is not exactly what my code example does, but modifying the reinterpret-cast from
det = determinant(*reinterpret_cast<const Tv*>(&jac_mat));
to
det = determinant(*reinterpret_cast<const Tv*>(&jac_mat.v[0].v[0]));
should result in a standard-conforming C++ program that does not violate the strict-aliasing/type-punning rules -- at least to my understanding. However, even with this modified cast, the results produced by the icpx compiler are identical to the original code.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page