I am using an 8th-order finite-difference time-stepping function (for the 2D acoustic wave equation), shown below.
I am observing a substantial (up to 25%) performance increase from placing Intel's __assume statement inside the inner loop rather than at the beginning of the function body. This happens regardless of the number of OpenMP threads.
The code is compiled with the Intel 2016 Update 1 compiler on Linux for an AVX-capable architecture (Xeon E5-2695 v2). Compiler options I use: -std=c++11 -march=native -O3 -openmp
Is it a compiler problem?
/*
  Finite difference, 8th-order scheme for the 2D acoustic equation.
  p       - current pressure
  q       - previous and next pressure
  c       - velocity
  n0 x n1 - problem size
  p1      - stride
*/
void fdtd_2d(
    float const* const __restrict__ p,
    float      * const __restrict__ q,
    float const* const __restrict__ c,
    int const n0,
    int const n1,
    int const p1 )
{
    // Stencil coefficients.
    static const float C[5] = { -5.6944444e+0f, 1.6000000e+0f, -2.0000000e-1f, 2.5396825e-2f, -1.7857143e-3f };

    // INTEL OPTIMIZER PROBLEM?
    // PLACING THE FOLLOWING LINE INSIDE THE LOOP BELOW
    // INSTEAD OF HERE SPEEDS UP THE CODE!
    // __assume( p1 % 16 == 0 );

    #pragma omp parallel for default(none)
    for ( int i1 = 0; i1 < n1; ++i1 )
    {
        float const* const __restrict__ ps = p + i1 * p1;
        float      * const __restrict__ qs = q + i1 * p1;
        float const* const __restrict__ cs = c + i1 * p1;

        #pragma omp simd aligned( ps, qs, cs : 64 )
        for ( int i0 = 0; i0 < n0; ++i0 )
        {
            // INTEL OPTIMIZER PROBLEM?
            // PLACING THE FOLLOWING LINE HERE
            // INSTEAD OF THE ABOVE SPEEDS UP THE CODE!
            __assume( p1 % 16 == 0 );

            auto lap = C[0] * ps[i0];
            for ( int r = 1; r <= 4; ++r )
                lap += C[r] * ( ps[i0 + r] + ps[i0 - r] + ps[i0 + r * p1] + ps[i0 - r * p1] );

            qs[i0] = 2.0f * ps[i0] - qs[i0] + cs[i0] * lap;
        }
    }
}
I can't find a description of the "__assume" statement in the documentation for the Intel 15 or Intel 16 compilers. The search feature seems to ignore the leading underscores. Searching for "__assume_aligned" brings up two results: "Function Annotations and the SIMD Directive for Vectorization" and "Programming Guidelines for Vectorization", neither of which mentions the "__assume" statement.
The discussion at https://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization says
Clauses such as __assume_aligned and __assume tell the compiler that the property holds at the particular point in the program where the clause appears.
The "const" property on p1 should enable the compiler to carry the assertion from the __assume() statement forward or backward through the whole routine, but there is no guarantee that the compiler will exploit this.
Did you try looking at the assembly code to see where the two versions differed?
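For an assembly comparison like this, it may help to reduce the kernel to the smallest loop that still uses the stride. The following is only a sketch under assumptions: the `ASSUME` macro, the two `column_sum_*` functions and the `variants_agree` helper are hypothetical names, not part of the original code, and the non-Intel branches of the macro exist only so the example builds with other compilers. Both variants compute the same values; the only difference is where the assumption about `p1` is stated, so any difference in the generated code comes from that placement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical portable wrapper. Only the first branch is the __assume
// statement discussed in this thread; the others are stand-ins.
#if defined(__INTEL_COMPILER) || defined(_MSC_VER)
#  define ASSUME(cond) __assume(cond)
#elif defined(__clang__)
#  define ASSUME(cond) __builtin_assume(cond)
#elif defined(__GNUC__)
#  define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(cond) ((void)0)
#endif

// Variant A: the assumption is stated once, at function entry.
void column_sum_hoisted(float const* p, float* q, int n0, int p1)
{
    ASSUME(p1 % 16 == 0);
    for (int i0 = 0; i0 < n0; ++i0)
        q[i0] = p[i0 + p1] + p[i0 - p1];
}

// Variant B: the assumption is restated at the point of use.
void column_sum_inner(float const* p, float* q, int n0, int p1)
{
    for (int i0 = 0; i0 < n0; ++i0) {
        ASSUME(p1 % 16 == 0);
        q[i0] = p[i0 + p1] + p[i0 - p1];
    }
}

// Runs both variants on the middle row of a 3-row grid with stride p1
// and reports whether they produce identical results.
bool variants_agree(int n0, int p1)
{
    // The assumption must really hold, otherwise behavior is undefined.
    assert(p1 % 16 == 0 && n0 >= 0 && n0 <= p1);

    std::vector<float> grid(3 * static_cast<std::size_t>(p1));
    for (std::size_t i = 0; i < grid.size(); ++i)
        grid[i] = 0.5f * static_cast<float>(i);

    std::vector<float> qa(static_cast<std::size_t>(n0));
    std::vector<float> qb(static_cast<std::size_t>(n0));
    float const* mid = grid.data() + p1;  // rows above and below exist

    column_sum_hoisted(mid, qa.data(), n0, p1);
    column_sum_inner(mid, qb.data(), n0, p1);
    return qa == qb;
}
```

Calling `variants_agree(16, 16)` checks that the two placements are functionally equivalent. Compiling the file to assembly with the same options as the original code and diffing the two function bodies should then show whether the compiler uses the stride information differently in each case.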
