Hi all,
I would like to understand the behavior of this small piece of code, which I extracted from a larger application that uses vectorization and SIMD instructions.
Please ignore the design: it is inherited from my original code and I kept it as is to reproduce the anomaly, though I agree that it makes little sense in this small context. For the alignment, I'm following the guidelines described here.
I have the following Dummy class.
dummy.h
#ifdef __INTEL_COMPILER
typedef double * __restrict__ Real_ptr __attribute__((align_value(32)));
typedef const double * const __restrict__ ConstReal_ptr __attribute__((align_value(32)));
#else
typedef double * __restrict__ Real_ptr __attribute__((aligned(32)));
typedef const double * const __restrict__ ConstReal_ptr __attribute__((aligned(32)));
#endif

class Dummy
{
public:
    virtual void calculate( const unsigned int n, ConstReal_ptr x, ConstReal_ptr y, Real_ptr z ) const;
private:
    double computeSingleValue( const double x, const double y ) const;
};
dummy.cpp
#include "dummy.h"
#include <algorithm>

static const double K = 10.0;

void Dummy::calculate( const unsigned int n, ConstReal_ptr x, ConstReal_ptr y, Real_ptr z ) const
{
    for( unsigned int i = 0; i < n; ++i)
    {
        // Element-wise: each output depends only on the matching inputs.
        z[i] = computeSingleValue( x[i], y[i] );
    }
}

double Dummy::computeSingleValue( const double x, const double y ) const
{
    return std::max(K, (x >= y) ? x : y);
}
The main function tests the calculate method and prints a message whenever an output differs from the expected value. The main.cpp is the following:
#include "dummy.h"
#include <cassert>
#include <cmath>
#include <iostream>
#include <stdlib.h>

int main()
{
    const unsigned int N = 4;

    Real_ptr x;
    assert( 0 == posix_memalign ( (void **)&x, 32, sizeof ( double ) * N ) );
    x[0] = 0.0; x[1] = 10.0; x[2] = 100.0; x[3] = 1000.0;

    Real_ptr y;
    assert( 0 == posix_memalign ( (void **)&y, 32, sizeof ( double ) * N ) );
    y[0] = 0.0; y[1] = 10.0; y[2] = 100.0; y[3] = 1000.0;

    Real_ptr z;
    assert( 0 == posix_memalign ( (void **)&z, 32, sizeof ( double ) * N ) );
    z[0] = 0.0; z[1] = 0.0; z[2] = 0.0; z[3] = 0.0;

    Dummy obj;
    obj.calculate( N, x, y, z );

    if( std::abs(10.0 - z[0])> 1.0E-18 ) { std::cout << "FAIL 0: z = " << z[0] << std::endl; };
    if( std::abs(10.0 - z[1])> 1.0E-18 ) { std::cout << "FAIL 1: z = " << z[1] << std::endl; };
    if( std::abs(100.0 - z[2])> 1.0E-18 ) { std::cout << "FAIL 2: z = " << z[2] << std::endl; };
    if( std::abs(1000.0 - z[3])> 1.0E-18 ) { std::cout << "FAIL 3: z = " << z[3] << std::endl; };

    free(x);
    free(y);
    free(z);
}
Now, I'm trying to compile it with -O2 and the following compilers:
- g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
- icpc (ICC) 16.0.3 20160415
With GCC everything works fine and the result is as expected, while with the Intel compiler the last two elements of the z array are wrong and the program prints:
FAIL 2: z = 10
FAIL 3: z = 10
The thing that puzzles me, apart from the compiler dependency, is that if I do one of the following things I can get the correct output:
- decrease the optimization to -O1 or -O0
- move all the source code into a single translation unit
- replace z[i] = computeSingleValue( x[i], y[i] ); with z[i] = std::max(K, (x[i] >= y[i]) ? x[i] : y[i]); in dummy.cpp
- add a std::cout << std::endl; in the body of computeSingleValue in dummy.cpp
- remove the __restrict__ keyword from ConstReal_ptr typedef
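As an illustration of the third variant in the list above, here is a minimal sketch with the expression written directly in the loop. The free function name calculateInline is hypothetical (it is not part of the reproducer), and __restrict__ is the GCC/Clang/Intel spelling of the extension. With the inputs from main.cpp the expected results are 10, 10, 100 and 1000.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

static const double K = 10.0;

// Hypothetical free-function version of Dummy::calculate with the body of
// computeSingleValue written inline, as in the third bullet of the list.
void calculateInline(std::size_t n,
                     const double* __restrict__ x,
                     const double* __restrict__ y,
                     double* __restrict__ z)
{
    assert(x != nullptr && y != nullptr && z != nullptr);
    for (std::size_t i = 0; i < n; ++i)
    {
        // Larger of x[i] and y[i], clamped from below by K.
        z[i] = std::max(K, (x[i] >= y[i]) ? x[i] : y[i]);
    }
}
```

This only shows what the variant looks like; it does not explain why it changes the behavior seen with icpc.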
I'm probably doing something wrong, but I don't get it. Any help would be really appreciated.
Thanks in advance and regards,
Massi
Hi, Massimiliano
Thank you for raising the issue with a reproducer!
I have reproduced the issue you reported and entered it in our problem tracking system for a resolution.
Sorry for any inconvenience. I will let you know when I have an update on this issue.
Thanks.
Thank you for your replies,
As suggested, I have also tried other memory allocators:
mkl_malloc/mkl_free
In this case the result is different again, though still wrong:
FAIL 0: z = 1000
FAIL 1: z = 1000
FAIL 2: z = 1000
but again, if I get rid of the __restrict__ keyword in the ConstReal_ptr typedef, it goes back to normal.
_mm_malloc/_mm_free
In this case the behavior is the same as with posix_memalign
FAIL 2: z = 10
FAIL 3: z = 10
and again removing the __restrict__ apparently solves the issue.
Speaking of workarounds, I have tried to instruct the compiler with the alignment in the following two ways (for simplicity I have dropped the constness of x and y, but this doesn't affect the anomaly)
void Dummy::calculate( const unsigned int n, Real_ptr x, Real_ptr y, Real_ptr z ) const
{
    // Intel-specific alignment hint
    __assume_aligned(x, 32);
    __assume_aligned(y, 32);
    __assume_aligned(z, 32);
    for( unsigned int i = 0; i < n; ++i)
    {
        z[i] = computeSingleValue( x[i], y[i] );
    }
}

void Dummy::calculate( const unsigned int n, Real_ptr x, Real_ptr y, Real_ptr z ) const
{
    // GCC-style alignment hint
    x = (Real_ptr)__builtin_assume_aligned(x,32);
    y = (Real_ptr)__builtin_assume_aligned(y,32);
    z = (Real_ptr)__builtin_assume_aligned(z,32);
    for( unsigned int i = 0; i < n; ++i)
    {
        z[i] = computeSingleValue( x[i], y[i] );
    }
}
In both cases the code behaves as in my first post, so the issue is still there.
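To rule out a misaligned buffer independently of the compiler hints, the addresses can be checked at run time. This is a minimal sketch; the helper name isAligned is hypothetical and it assumes the alignment is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true when p sits on an `alignment`-byte
// boundary. `alignment` is assumed to be a non-zero power of two.
inline bool isAligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

In the reproducer it could be called as isAligned(x, 32) right after each allocation; a true result for all three buffers would suggest that the allocators are not the cause.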
Is it possible that the problem is related to the __restrict__ keyword in the typedef rather than the alignment? Maybe I'm doing some silly mistake trying to do something which is not allowed by the language...
Thank you all again,
Massi
Minor update,
I can reproduce the issue also with an older version of the Intel compiler: icpc (ICC) 14.0.1 20131008
I can also confirm that the alignment is not the problem. I simplified the code further by removing the alignment:
// dummy.cpp
#include <algorithm>
#include <iostream>

double computeSingleValue( const double x, const double y );

void calculate( const unsigned int n, const double * const __restrict__ x, const double * const __restrict__ y, double * __restrict__ z )
{
    for( unsigned int i = 0; i < n; ++i)
    {
        z[i] = computeSingleValue( x[i], y[i] );
    }
}

double computeSingleValue( const double x, const double y )
{
    return std::max(10.0, (x >= y) ? x : y);
}
and
// main.cpp
#include <cmath>
#include <iostream>

void calculate( const unsigned int n, const double * const __restrict__ x, const double * const __restrict__ y, double * __restrict__ z );

int main()
{
    const unsigned int N = 4;

    // Arrays of N doubles, released with delete [] below.
    double * x = new double[N];
    x[0] = 0.0; x[1] = 10.0; x[2] = 100.0; x[3] = 1000.0;

    double * y = new double[N];
    y[0] = 0.0; y[1] = 10.0; y[2] = 100.0; y[3] = 1000.0;

    double * z = new double[N];
    z[0] = 0.0; z[1] = 0.0; z[2] = 0.0; z[3] = 0.0;

    calculate( N, x, y, z );

    if( std::abs(10.0 - z[0])> 1.0E-18 ) std::cout << "FAIL 0: z = " << z[0] << std::endl;
    if( std::abs(10.0 - z[1])> 1.0E-18 ) std::cout << "FAIL 1: z = " << z[1] << std::endl;
    if( std::abs(100.0 - z[2])> 1.0E-18 ) std::cout << "FAIL 2: z = " << z[2] << std::endl;
    if( std::abs(1000.0 - z[3])> 1.0E-18 ) std::cout << "FAIL 3: z = " << z[3] << std::endl;

    delete [] x;
    delete [] y;
    delete [] z;
}
and the issue is still there.
Dear Yuan,
do you have any update or suggestions on this?
Kind regards
Hi, Massimiliano
According to the engineer, this is caused by a bad interaction between two different C++ language features: __restrict__ and std::max.
The pointers x and y are declared __restrict__, while the values they point to are passed indirectly to std::max.
Unfortunately, std::max takes reference parameters, so the compiler creates references that are aliases of x and y.
Because of the non-aliasing property of x and y, the compiler concludes that these references cannot point to x and y, which is definitely wrong.
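The reference semantics of std::max mentioned above can be checked with a small sketch. The alias MaxResult and the helper maxAliasesArgument are hypothetical names used only for illustration; the sketch shows standard library behavior and says nothing about the compiler's internal alias analysis.

```cpp
#include <algorithm>
#include <type_traits>
#include <utility>

// std::max(const T&, const T&) returns const T&, i.e. a reference to one
// of its arguments rather than a fresh copy.
using MaxResult = decltype(std::max(std::declval<const double&>(),
                                    std::declval<const double&>()));
static_assert(std::is_same<MaxResult, const double&>::value,
              "std::max yields a reference to one of its arguments");

// Hypothetical demonstration helper: true when the reference returned by
// std::max refers to the expected argument object (b if a < b, else a).
inline bool maxAliasesArgument(const double& a, const double& b)
{
    const double& m = std::max(a, b);
    return &m == ((a < b) ? &b : &a);
}
```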
We have given this issue high priority and will resolve it soon.
Another workaround is to avoid std::max and use a macro max function instead, like the following:
#define max(a,b) \
    ({ __typeof__ (a) _a = (a); \
       __typeof__ (b) _b = (b); \
       _a > _b ? _a : _b; })
Does this help?
Thanks.
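The macro above relies on GNU extensions (statement expressions and __typeof__). A possible alternative in the same spirit is a small function that takes its arguments by value, so that no reference parameters are involved. This is only a sketch: maxByValue and calculateByValueMax are hypothetical names, and whether this form avoids the problem on an affected icpc version has not been verified here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical replacement for std::max: arguments are copied, so the
// function never receives references to the caller's data.
inline double maxByValue(double a, double b)
{
    return (a > b) ? a : b;
}

// Same shape as computeSingleValue in the reproducer, with std::max swapped out.
inline double computeSingleValue(double x, double y)
{
    return maxByValue(10.0, (x >= y) ? x : y);
}

// Hypothetical free-function counterpart of calculate from the reproducer.
void calculateByValueMax(std::size_t n,
                         const double* __restrict__ x,
                         const double* __restrict__ y,
                         double* __restrict__ z)
{
    assert(x != nullptr && y != nullptr && z != nullptr);
    for (std::size_t i = 0; i < n; ++i)
    {
        z[i] = computeSingleValue(x[i], y[i]);
    }
}
```

Unlike the macro, this version does not shadow the name max, so it cannot collide with other uses of std::max in the same translation unit.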
Thank you for the reply and the proposed workaround,
According to the engineer, this relates to bad interaction between two different C++ language features: __restrict__ and std::max
The pointers x and y are declared __restrict__, while values x and y are passed indirectly to std::max.
Unfortunately, std::max takes reference parameters. The compiler creates references that are aliases of x and y.
Because of the non-aliasing property of x and y, the compiler thinks that the references cannot point to x and y, which is definitely wrong.
Just for my understanding: with the same code design as in this reproducer, should I expect this issue in general with every function that indirectly takes reference inputs from pointers declared __restrict__?
Kind regards,
Massi