Hi all,

I would like to understand the behavior of this small piece of code, which I have extracted from a larger application that uses vectorization and SIMD instructions.

Please don't look at the design: it is inherited from my original code and I want to keep it as it is to reproduce the anomaly, though I agree that it makes little sense in this small context. I'm following the guidelines described here about the alignment.

I have the following Dummy class.

**dummy.h**

    #ifdef __INTEL_COMPILER
    typedef double * __restrict__ Real_ptr __attribute__((align_value(32)));
    typedef const double * const __restrict__ ConstReal_ptr __attribute__((align_value(32)));
    #else
    typedef double * __restrict__ Real_ptr __attribute__((aligned(32)));
    typedef const double * const __restrict__ ConstReal_ptr __attribute__((aligned(32)));
    #endif

    class Dummy
    {
    public:
        virtual void calculate( const unsigned int n, ConstReal_ptr x, ConstReal_ptr y, Real_ptr z ) const;

    private:
        double computeSingleValue( const double x, const double y ) const;
    };

**dummy.cpp**

    #include "dummy.h"
    #include <algorithm>

    static const double K = 10.0;

    void Dummy::calculate( const unsigned int n, ConstReal_ptr x, ConstReal_ptr y, Real_ptr z ) const
    {
        for( unsigned int i = 0; i < n; ++i )
        {
            // Element-wise: each output depends only on the matching inputs.
            z[i] = computeSingleValue( x[i], y[i] );
        }
    }

    double Dummy::computeSingleValue( const double x, const double y ) const
    {
        return std::max(K, (x >= y) ? x : y);
    }

The main function tests the calculate method and prints a message whenever an output value differs from the expected one. The **main.cpp** is the following:

    #include "dummy.h"
    #include <cassert>
    #include <cmath>
    #include <iostream>
    #include <stdlib.h>

    int main()
    {
        const unsigned int N = 4;

        Real_ptr x;
        assert( 0 == posix_memalign ( (void **)&x, 32, sizeof ( double ) * N ) );
        x[0] = 0.0; x[1] = 10.0; x[2] = 100.0; x[3] = 1000.0;

        Real_ptr y;
        assert( 0 == posix_memalign ( (void **)&y, 32, sizeof ( double ) * N ) );
        y[0] = 0.0; y[1] = 10.0; y[2] = 100.0; y[3] = 1000.0;

        Real_ptr z;
        assert( 0 == posix_memalign ( (void **)&z, 32, sizeof ( double ) * N ) );
        z[0] = 0.0; z[1] = 0.0; z[2] = 0.0; z[3] = 0.0;

        Dummy obj;
        obj.calculate( N, x, y, z );

        if( std::abs(10.0 - z[0]) > 1.0E-18 ) { std::cout << "FAIL 0: z = " << z[0] << std::endl; }
        if( std::abs(10.0 - z[1]) > 1.0E-18 ) { std::cout << "FAIL 1: z = " << z[1] << std::endl; }
        if( std::abs(100.0 - z[2]) > 1.0E-18 ) { std::cout << "FAIL 2: z = " << z[2] << std::endl; }
        if( std::abs(1000.0 - z[3]) > 1.0E-18 ) { std::cout << "FAIL 3: z = " << z[3] << std::endl; }

        free(x);
        free(y);
        free(z);
    }

Now, I'm trying to compile it with -O2 and the following compilers:

- g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
- icpc (ICC) 16.0.3 20160415

With GCC everything works fine and the result is as expected, while with the Intel compiler the values in the last two elements of the z array are wrong and the output of the program is

FAIL 2: z = 10

FAIL 3: z = 10

The thing that puzzles me, apart from the compiler dependency, is that if I do one of the following things I can get the correct output:

- decrease the optimization to -O1 or -O0
- move all the source code in a single translation unit
- replace z[i] = computeSingleValue( x[i], y[i] ); with z[i] = std::max(K, (x[i] >= y[i]) ? x[i] : y[i]); in dummy.cpp
- add a std::cout << std::endl; in the body of computeSingleValue in dummy.cpp
- remove the __restrict__ keyword from ConstReal_ptr typedef

I'm probably doing something wrong, but I don't get it. Any help would be really appreciated.

Thanks in advance and regards,

Massi


Hi, Massimiliano

Thank you for raising the issue with a reproducer!

I have reproduced the issue you reported and entered it in our problem tracking system for a resolution.

Sorry for any inconvenience. I will let you know when I have an update on this issue.

Thanks.

**__builtin_assume_aligned**. Here is an example:

    template < class T > _RTINLINE RTvoid TestFuction( T ** _RTRESTRICT pptA )
    {
        ...
        _RTALIGNED T **pptA2 = ( T ** )__builtin_assume_aligned( pptA, _RTDEFAULT_ALIGNMENT );
        ...
    }

Also, try to use the intrinsic functions **_mm_malloc** / **_mm_free** (if possible) instead of the POSIX **posix_memalign** / **free** functions.

Thank you for your replies,

as suggested, I have tried also other memory allocators:

**mkl_malloc**/**mkl_free**

In this case the result is even different, though still wrong

FAIL 0: z = 1000

FAIL 1: z = 1000

FAIL 2: z = 1000

but again, if I get rid of the __restrict__ keyword in the ConstReal_ptr typedef, it goes back to normal.

**_mm_malloc**/**_mm_free**

In this case the behavior is the same as with posix_memalign

FAIL 2: z = 10

FAIL 3: z = 10

and again removing the __restrict__ apparently solves the issue.

Speaking of workarounds, I have tried to instruct the compiler with the alignment in the following two ways (for simplicity I have dropped the constness of x and y, but this doesn't affect the anomaly)

    void Dummy::calculate( const unsigned int n, Real_ptr x, Real_ptr y, Real_ptr z ) const
    {
        __assume_aligned(x, 32);
        __assume_aligned(y, 32);
        __assume_aligned(z, 32);
        for( unsigned int i = 0; i < n; ++i )
        {
            z[i] = computeSingleValue( x[i], y[i] );
        }
    }

    void Dummy::calculate( const unsigned int n, Real_ptr x, Real_ptr y, Real_ptr z ) const
    {
        x = (Real_ptr)__builtin_assume_aligned(x,32);
        y = (Real_ptr)__builtin_assume_aligned(y,32);
        z = (Real_ptr)__builtin_assume_aligned(z,32);
        for( unsigned int i = 0; i < n; ++i )
        {
            z[i] = computeSingleValue( x[i], y[i] );
        }
    }

In both cases the code behaves as in my first post, so the issue is still there.
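To rule out the allocations themselves, the alignment can also be checked at run time. The following is only a minimal sketch and not part of the original test: is_aligned and allocate_aligned_doubles are hypothetical helper names, and posix_memalign is assumed to be available (POSIX systems).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical helper: true if p sits on an 'alignment'-byte boundary.
bool is_aligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical helper: allocates 'count' doubles on an 'alignment'-byte
// boundary, or returns nullptr on failure. The caller releases it with free().
double* allocate_aligned_doubles(std::size_t count, std::size_t alignment)
{
    void* raw = nullptr;
    if (posix_memalign(&raw, alignment, sizeof(double) * count) != 0)
    {
        return nullptr;
    }
    return static_cast<double*>(raw);
}
```

Calling is_aligned(x, 32) on each of x, y and z after allocation should return true if the buffers are set up as intended. Note that this sketch calls posix_memalign outside of assert, so the allocation still happens when NDEBUG is defined.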

Is it possible that the problem is related to the **__restrict__** keyword in the typedef rather than the alignment? Maybe I'm doing some silly mistake trying to do something which is not allowed by the language...

Thank you all again,

Massi

**__restrict__** keyword.

>>Maybe I'm doing some silly mistake trying to do something which is not allowed by the language...

This is an **absolutely** legal application of the **__restrict__** keyword. The problem is even bigger because the **Indirect Indexing Technique** you are using is very common and is used in many well-known algorithms, such as **Histogram algorithms** in DSP / Image Processing and the high-performance **Pigeonhole Sorting** algorithm for integer data types.

Minor update,

I can reproduce the issue also with an older version of the Intel compiler: icpc (ICC) 14.0.1 20131008

And I confirm that the alignment is not the problem. I simplified the code a bit more by removing the alignment:

    // dummy.cpp
    #include <algorithm>
    #include <iostream>

    double computeSingleValue( const double x, const double y );

    void calculate( const unsigned int n, const double * const __restrict__ x, const double * const __restrict__ y, double * __restrict__ z )
    {
        for( unsigned int i = 0; i < n; ++i )
        {
            z[i] = computeSingleValue( x[i], y[i] );
        }
    }

    double computeSingleValue( const double x, const double y )
    {
        return std::max(10.0, (x >= y) ? x : y);
    }

and

    // main.cpp
    #include <cmath>
    #include <iostream>

    void calculate( const unsigned int n, const double * const __restrict__ x, const double * const __restrict__ y, double * __restrict__ z );

    int main()
    {
        const unsigned int N = 4;

        // Array allocations, matching the delete [] calls below.
        double * x = new double[N];
        x[0] = 0.0; x[1] = 10.0; x[2] = 100.0; x[3] = 1000.0;

        double * y = new double[N];
        y[0] = 0.0; y[1] = 10.0; y[2] = 100.0; y[3] = 1000.0;

        double * z = new double[N];
        z[0] = 0.0; z[1] = 0.0; z[2] = 0.0; z[3] = 0.0;

        calculate( N, x, y, z );

        if( std::abs(10.0 - z[0]) > 1.0E-18 ) std::cout << "FAIL 0: z = " << z[0] << std::endl;
        if( std::abs(10.0 - z[1]) > 1.0E-18 ) std::cout << "FAIL 1: z = " << z[1] << std::endl;
        if( std::abs(100.0 - z[2]) > 1.0E-18 ) std::cout << "FAIL 2: z = " << z[2] << std::endl;
        if( std::abs(1000.0 - z[3]) > 1.0E-18 ) std::cout << "FAIL 3: z = " << z[3] << std::endl;

        delete [] x;
        delete [] y;
        delete [] z;
    }

and the issue is still there.

**decrease the optimization to -O1 or -O0**

I think this is the best workaround. However, you do **Not** need to do that on a global scope by using -O0; optimizations need to be disabled just for your processing function void **calculate**( ... ):

    ...
    void calculate( ... );
    ...
    #pragma optimize ( "", off )
    void calculate( ... )
    {
        ...
    }
    ...

Since your reproducer demonstrates the problem in the main application, disabling optimizations on a global scope does Not look good.
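A fuller sketch of the per-function approach could look like the following. It assumes the Intel compiler honors #pragma optimize("", off) and #pragma optimize("", on) as a pair; the pragmas are guarded so that other compilers never see them. The function names are hypothetical stand-ins for the reproducer's computeSingleValue and calculate, and whether this actually avoids the wrong results was not tested here.

```cpp
#include <algorithm>
#include <cassert>

#if defined(__INTEL_COMPILER)
#pragma optimize("", off)  // assumption: disables optimizations from here on
#endif

// Stand-in for computeSingleValue from the reproducer.
static double compute_single_value(const double x, const double y)
{
    return std::max(10.0, (x >= y) ? x : y);
}

// Stand-in for calculate from the reproducer.
void calculate_unoptimized(const unsigned int n,
                           const double* const __restrict__ x,
                           const double* const __restrict__ y,
                           double* __restrict__ z)
{
    assert(x != nullptr && y != nullptr && z != nullptr);
    for (unsigned int i = 0; i < n; ++i)
    {
        z[i] = compute_single_value(x[i], y[i]);
    }
}

#if defined(__INTEL_COMPILER)
#pragma optimize("", on)   // assumption: restores optimizations for the rest of the file
#endif
```

Keeping both functions inside the guarded region means the rest of the translation unit is still compiled at the optimization level given on the command line.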

Dear Yuan,

do you have any update or suggestions on this?

Kind regards

Hi, Massimiliano

According to the engineer, this relates to a bad interaction between two different C++ language features: __restrict__ and std::max.

The pointers x and y are declared __restrict__, while the values x[i] and y[i] are passed indirectly to std::max.

Unfortunately, std::max takes reference parameters. The compiler creates references that are aliases of x[i] and y[i].

Because of the non-aliasing property of x and y, the compiler thinks that the references cannot point to x[i] and y[i], which is definitely wrong.
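To make the point about references concrete, here is a minimal sketch. It is not part of the original reproducer, and the function name and the constant are made up for illustration. It only shows that std::max hands back a reference to one of its arguments, so the result can refer directly to an array element such as x[i]. It does not reproduce the miscompilation itself.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: reports whether the reference returned by std::max
// refers to the array element x[i] itself rather than to a copy of it.
bool max_refers_to_element(const double* x, unsigned int i)
{
    assert(x != nullptr);
    const double floor_value = 10.0;  // plays the role of K in dummy.cpp
    // std::max takes both arguments by const reference and returns one of them.
    const double& result = std::max(floor_value, x[i]);
    return &result == &x[i];
}
```

With x = {0.0, 10.0, 100.0, 1000.0}, the function returns false for i = 0 and i = 1 (the constant is returned) and true for i = 2 and i = 3 (the array element is returned). Those last two are the indices that fail in the reproducer.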

We have given this issue high priority and will resolve it soon.

Another workaround might be to avoid std::max and use a max macro instead, like the following:

    #define max(a,b) \
        ({ __typeof__ (a) _a = (a); \
           __typeof__ (b) _b = (b); \
           _a > _b ? _a : _b; })
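Note that the macro above relies on GNU extensions (statement expressions and __typeof__), and a function-like macro named max will also clash with std::max wherever <algorithm> is used. A possible alternative, sketched here on the assumption that passing and returning by value avoids forming the problematic references, is a small inline function. The names max_by_value and calculate_by_value are hypothetical, and this has not been verified against the affected icpc versions.

```cpp
#include <cassert>

// Hypothetical helper: takes and returns by value, so no reference to
// x[i] or y[i] is formed at the call site.
inline double max_by_value(double a, double b)
{
    return a > b ? a : b;
}

// Same loop as in the reproducer, with std::max replaced by the helper.
// __restrict__ is a compiler extension (accepted by g++ and icpc).
void calculate_by_value(const unsigned int n,
                        const double* const __restrict__ x,
                        const double* const __restrict__ y,
                        double* __restrict__ z)
{
    assert(x != nullptr && y != nullptr && z != nullptr);
    const double K = 10.0;  // same lower bound as in dummy.cpp
    for (unsigned int i = 0; i < n; ++i)
    {
        z[i] = max_by_value(K, max_by_value(x[i], y[i]));
    }
}
```

For the inputs used in main.cpp (x and y both {0.0, 10.0, 100.0, 1000.0}), this is expected to fill z with {10.0, 10.0, 100.0, 1000.0}.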

Is it helpful?

Thanks.

Thank you for the reply and the proposed workaround,

>>According to the engineer, this relates to bad interaction between two different C++ language features: __restrict__ and std::max

>>The pointers x and y are declared __restrict__, while values x[i] and y[i] are passed indirectly to std::max.

>>Unfortunately, std::max takes reference parameters. The compiler creates references that are aliases of x[i] and y[i].

>>Because of the non-aliasing property of x and y, the compiler thinks that the references cannot point to x[i] and y[i], which is definitely wrong.

Just for my understanding: with the same code design as in this reproducer, should I in general expect this issue with every function that indirectly takes reference parameters bound to values read through pointers declared __restrict__?

Kind regards,

Massi
