Solved: OpenMP or optimization changed from 13.xxx to 14.0.2.176 - Intel Community


Dear experts in OpenMP and Intel C 14.x,

Up to version 13.x of Intel C I had the following code:

static int   fobj_offset(int m, double *a, double *fun)
{
      int     n = npivot;
      double *p = ppivot;
      int     i;
      double cen[3*n];
      double x=0.0,y=0.0,z=0.0;
      double err=0.0;

      Tool_Point_Offset[0] = a[0];
      Tool_Point_Offset[1] = a[1];
      Tool_Point_Offset[2] = a[2];

#pragma omp parallel for reduction(+ : x,y,z) num_threads (2)

      for (i=0; i<n; i++)
      {
          ComputeToolPoint(&p[8*i+0], &p[8*i+3], &cen[3*i]);
          x += cen[3*i + 0];
          y += cen[3*i + 1];
          z += cen[3*i + 2];
      }

      x = x/n;
      y = y/n;
      z = z/n;

--------------------------------

Everything was fine. Tool_Point_Offset[] is a static (local) double array with 4 elements, of which only 3 are used. Its values are invariant for the whole duration of the for loop: they are read inside ComputeToolPoint() but never modified.

Starting with version 14.0.2.176 of the compiler I had to change it to:

static int   fobj_offset(int m, double *a, double *fun)
{
      int     n = npivot;
      double *p = ppivot;
      int     i;
      double cen[3*n];
      double x=0.0,y=0.0,z=0.0;
      double err=0.0;

#pragma omp parallel for reduction(+ : x,y,z) num_threads (2)

      for (i=0; i<n; i++)
      {
          Tool_Point_Offset[0] = a[0];
          Tool_Point_Offset[1] = a[1];
          Tool_Point_Offset[2] = a[2];

          ComputeToolPoint(&p[8*i+0], &p[8*i+3], &cen[3*i]);
          x += cen[3*i + 0];
          y += cen[3*i + 1];
          z += cen[3*i + 2];
      }

      x = x/n;
      y = y/n;
      z = z/n;

-------

I do not understand why this is needed. I first suspected that Tool_Point_Offset was being modified inside ComputeToolPoint() by mistake, rather than only read. So I printed its values before and after the loop, but no, it is invariant.

The rearranged code works fine again, but as far as I can see there is no need to move the Tool_Point_Offset initialization into the loop.

Could anyone clarify this for me? Thanks.
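
The before/after check described above can be made mechanical instead of relying on printed values. What follows is only a minimal sketch: the array `tool_offset` is a stand-in for Tool_Point_Offset[], and the helpers `offset_unchanged_by`, `reads_only` and `writes_once` are hypothetical names that do not appear in the original code.

```c
#include <assert.h>
#include <string.h>

#define OFFSET_LEN 4

/* Stand-in for Tool_Point_Offset[] (assumption: file-scope static, 4 doubles). */
static double tool_offset[OFFSET_LEN];

/* Returns 1 if tool_offset[] holds the same bytes after work() as before it,
   0 otherwise. Only the state at the two endpoints is compared. */
static int offset_unchanged_by(void (*work)(void))
{
    double before[OFFSET_LEN];

    assert(work != NULL);
    memcpy(before, tool_offset, sizeof before);   /* snapshot */
    work();                                       /* code under suspicion */
    return memcmp(before, tool_offset, sizeof before) == 0;
}

/* Hypothetical workloads used to exercise the check. */
static void reads_only(void)
{
    volatile double t = tool_offset[0];           /* read, never write */
    (void)t;
}

static void writes_once(void)
{
    tool_offset[1] += 1.0;                        /* deliberate modification */
}
```

A check like this only compares the endpoints: it cannot detect a write that is undone before the work finishes, and it says nothing about how the compiler has transformed the loop.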


8 Replies


Hello Armando,

>>>Starting with version 14.0.2.176 of the compiler I had to change

So what was the error symptom without the change? Was it a compilation failure? A link-time failure? A runtime error?

I suspect a runtime error -- if that is true, please add more detail (incorrect outputs, no outputs, executable crash).

Thank you,

Patrick


Dear Patrick,

It was a run-time error with incorrect output. I will try to create a simple test case and see whether the compiler reproduces this behavior.

I checked that the local static remains invariant during the whole loop execution, so I think there is no need to assign the same values on every iteration of the loop. The problem only happens if the loop is parallelized, and only with version 14.x of the compiler.

Thanks for your attention.


Hello Armando,

Thanks, that would be great if you could attach a small test case.

If array Tool_Point_Offset[] is declared outside of any parallel region, then any usage within parallel regions should refer to the same shared instance. It seems as if the compiler needs a reference within the static extent of the OpenMP parallel region to get the right answer, i.e., it is not working correctly if the array is only accessed in the dynamic extent (in function ComputeToolPoint()).
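
The "same shared instance" expectation can be illustrated with a small check. This is a sketch under assumptions, not the test case from this thread: `tool_offset`, `offset_seen_by_callee` and `all_iterations_see_same_instance` are hypothetical names, and the callee stands in for a function such as ComputeToolPoint() that touches the array only in the dynamic extent of the parallel region.

```c
#include <assert.h>

/* Stand-in for Tool_Point_Offset[]: a file-scope static that is not
   threadprivate, so every thread should see one shared instance. */
static double tool_offset[4];

/* Called from inside the parallel loop (dynamic extent only). */
static const double *offset_seen_by_callee(void)
{
    return tool_offset;
}

/* Returns 1 if every iteration, on whichever thread it runs, observes the
   same array address as the code outside the parallel region. */
static int all_iterations_see_same_instance(int n)
{
    const double *expected = tool_offset;
    int mismatches = 0;
    int i;

    assert(n > 0);

#pragma omp parallel for reduction(+ : mismatches) num_threads(2)
    for (i = 0; i < n; i++) {
        if (offset_seen_by_callee() != expected)
            mismatches += 1;
    }

    return mismatches == 0;
}
```

On a conforming implementation the function should return 1 for any positive n. Without OpenMP enabled the pragma is ignored and the loop simply runs serially.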

Further, having to put the (identical) initialization of the array in a parallel region will hurt performance and probably create false positives for Inspector XE.
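
One way to keep the initialization lexically inside the parallel region without every thread rewriting the array on every iteration is an OpenMP `single` construct. The sketch below is an assumption-laden illustration only: it has not been tried against the affected 14.0.2.176 compiler, and `tool_offset`, `shift_point` and `mean_shifted_x` are hypothetical stand-ins for Tool_Point_Offset[], ComputeToolPoint() and the averaging loop.

```c
#include <assert.h>

/* Stand-in for Tool_Point_Offset[] (shared, file-scope static). */
static double tool_offset[4];

/* Stand-in for ComputeToolPoint(): reads tool_offset[], never writes it. */
static void shift_point(const double *in, double *out)
{
    out[0] = in[0] + tool_offset[0];
    out[1] = in[1] + tool_offset[1];
    out[2] = in[2] + tool_offset[2];
}

/* Mean x coordinate of n points (3 doubles each) after shifting by a[0..2]. */
static double mean_shifted_x(const double *pts, int n, const double *a)
{
    double x = 0.0;

    assert(n > 0);

#pragma omp parallel num_threads(2)
    {
        int i;

        /* Exactly one thread writes the shared array. The implicit barrier
           at the end of 'single' orders the writes before the loop starts. */
#pragma omp single
        {
            tool_offset[0] = a[0];
            tool_offset[1] = a[1];
            tool_offset[2] = a[2];
        }

#pragma omp for reduction(+ : x)
        for (i = 0; i < n; i++) {
            double c[3];
            shift_point(&pts[3 * i], c);
            x += c[0];
        }
    }

    return x / n;
}
```

Because only one thread performs the stores and a barrier follows them, this form avoids concurrent writes of identical values to the shared array. Whether it also sidesteps the incorrect results discussed in this thread is untested.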

Patrick


Hello Patrick,

I was able to create a shorter version; unfortunately it is still 453 lines of code. I had to include data to simulate the input from a stereo camera, plus some supporting functions. The C source code of the test case is attached.

It is a Windows project. The compiler and linker switches follow:

/c /O3 /Ob2 /Oi /Ot /Oy /Qip /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /MT /Zp1 /GS- /fp:fast /Fo"Release/" /Fd"Release/vc90.pdb" /W3 /nologo /Zi /TC /Qopenmp /QxSSE3 /Qstd=c99 /Qrestrict

If you remove the OpenMP parallelization of the for loop, there is no problem. If you copy the redundant initialization of Tool_Point_Offset into the for loop, the problem vanishes. If you use a previous version of Intel Composer, the problem is also gone.

Thank you.


Hello Armando,

Thanks for uploading the test case. I can reproduce the issue on any OS, but only with the IA-32 compiler; the x64 compiler works correctly on all platforms.

The defect is due to loop distribution of the OpenMP parallel 'for' at line 407:

#pragma omp parallel for reduction(+ : x,y,z) num_threads (2)
   for (i=0; i<n; i++)
   {
    ComputeToolPoint(&p[8*i+0], &p[8*i+3], &cen[3*i]);
    x += cen[3*i + 0];
    y += cen[3*i + 1];
    z += cen[3*i + 2];
   }

Besides the workaround you already noted (moving the initialisation of Tool_Point_Offset[] inside the parallel region), there are these:

1) Compile at -O1 (loop distribution only occurs at -O2 or higher)

2) Compile with an undocumented, unsupported internal switch which disables loop distribution: -mP2OPT_hlo_distribution=0

C:\ISN_Forums\U509232>icl /Qopenmp /Qstd=c99 -O3 pivot.C -mP2OPT_hlo_distribution=0
Intel(R) C++ Compiler XE for applications running on IA-32, Version 14.0.2.176 Build 20140130
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.

C:\ISN_Forums\U509232>pivot.exe
Std Deviation : 73.747 Number of points:299
Std Deviation : 96.575 Number of points:299
Std Deviation : 120.270 Number of points:299
Std Deviation : 109.656 Number of points:299

C:\ISN_Forums\U509232>

Reported to compiler engineering, tracking ID DPD200255521. I'll keep this thread updated with news.

Patrick


Dear Patrick,

Thank you for your fast answer and clarification. I will evaluate which workaround has lower impact on the code.

Armando


Dear Patrick,

Do you know if this issue is fixed in the latest update (update 3)?

Thanks.


No, the issue is not fixed in 14.0.3. It is still under investigation by the developers. I will let you know when a compiler with the fix is available.

Patrick
