Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

OpenMP or optimization changed from 13.xxx to 14.0.2.176

Armando_Lazaro_Alami

Dear experts in OpenMP and Intel C 14.x,

Up to version 13.x of the Intel C compiler I had the following code:

static  int   fobj_offset(int m, double *a, double *fun)
{
      int     n = npivot;
      double  *p = ppivot;
      int     i;
      double  cen[3*n];
      double  x=0.0,y=0.0,z=0.0;
      double  err=0.0;

      Tool_Point_Offset[0] = a[0];
      Tool_Point_Offset[1] = a[1];
      Tool_Point_Offset[2] = a[2];

#pragma omp parallel for reduction(+ : x,y,z) num_threads(2)
      for (i=0; i<n; i++)
      {
          ComputeToolPoint(&p[8*i+0],  &p[8*i+3],  &cen[3*i]);
          x += cen[3*i + 0];
          y += cen[3*i + 1];
          z += cen[3*i + 2];
      }

 

      x = x/n;
      y = y/n;
      z = z/n;

 

--------------------------------

Everything was fine. Tool_Point_Offset[] is a static (local) double array with 4 elements, of which only 3 are used. The values of Tool_Point_Offset[] are invariant for the whole duration of the for loop. They are used inside ComputeToolPoint(), but are not modified.

Starting with version 14.0.2.176 of the compiler I had to change it to:

static  int   fobj_offset(int m, double *a, double *fun)
{
      int     n = npivot;
      double  *p = ppivot;
      int     i;
      double  cen[3*n];
      double  x=0.0,y=0.0,z=0.0;
      double  err=0.0;

#pragma omp parallel for reduction(+ : x,y,z) num_threads(2)
      for (i=0; i<n; i++)
      {
          Tool_Point_Offset[0] = a[0];
          Tool_Point_Offset[1] = a[1];
          Tool_Point_Offset[2] = a[2];
      
          ComputeToolPoint(&p[8*i+0],  &p[8*i+3],  &cen[3*i]);
          x += cen[3*i + 0];
          y += cen[3*i + 1];
          z += cen[3*i + 2];
      }

      x = x/n;
      y = y/n;
      z = z/n;

-------

I do not understand why this is needed. At first I suspected that Tool_Point_Offset was being modified inside ComputeToolPoint() by mistake, rather than only read. So I printed it before and after the loop, but no, it is invariant.

The rearranged code works fine again, but as far as I can see there is no need to move the Tool_Point_Offset initialization into the loop.

Could anyone clarify this for me? Thanks.

 

pbkenned1
Employee

Hello Armando,

>>>Starting with version 14.0.2.176 of the compiler I had to change

So what was the error symptom without the change?  Was it a compilation failure?  A link-time failure?  A runtime error? 

I suspect a runtime error -- if that is true, please add more detail (incorrect outputs, no outputs, executable crash).

Thank you,

Patrick

 

Armando_Lazaro_Alami

Dear Patrick,

It was a run-time error with incorrect output. I will try to create a simple test case and see whether the compiler repeats this behavior.

I checked that the local static remains invariant during the whole loop execution, so I think there is no need to assign the same values in every iteration of the loop. The problem only happens if the loop is parallelized, and only with version 14.x of the compiler.

Thanks for your attention.

pbkenned1
Employee

Hello Armando,

Thanks, that would be great if you could attach a small test case. 

If array Tool_Point_Offset[] is declared outside of any parallel region, then any usage within parallel regions should refer to the same shared instance.  It seems as if the compiler needs a reference within the static extent of the OpenMP parallel region to get the right answer, i.e., it is not working correctly if the array is only accessed in the dynamic extent (in function ComputeToolPoint()).

Further, having to put the (identical) initialization of the array in a parallel region will hurt performance and probably create false positives for Inspector XE.

 

Patrick

Armando_Lazaro_Alami

Hello  Patrick,

I was able to create a shorter version, but unfortunately it is still 453 lines of code. I had to include data to simulate the input from a stereo camera, plus some supporting functions. The C source code of the test case is attached.

It is a Windows project. The compiler and linker switches follow:

/c /O3 /Ob2 /Oi /Ot /Oy /Qip /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /MT /Zp1 /GS- /fp:fast /Fo"Release/" /Fd"Release/vc90.pdb" /W3 /nologo /Zi /TC /Qopenmp /QxSSE3 /Qstd=c99 /Qrestrict

 

If you remove the OpenMP parallelization of the for loop, there is no problem. If you copy the redundant initialization of Tool_Point_Offset into the for loop, the problem vanishes. If you use a previous version of Intel Composer, the problem is also gone.

Thank you.

 

pbkenned1
Employee

Hello Armando,

Thanks for uploading the test case.  I can reproduce the issue on any OS, but only with the IA-32 compiler; the x64 compiler works correctly on all platforms.

The defect is due to loop distribution of the OpenMP parallel 'for' at line 407:

#pragma omp parallel for reduction(+ : x,y,z) num_threads (2)
   for (i=0; i<n; i++)
   {
    ComputeToolPoint(&p[8*i+0],  &p[8*i+3],  &cen[3*i]);
    x += cen[3*i + 0];
    y += cen[3*i + 1];
    z += cen[3*i + 2];
   }

Besides the workaround you already noted (moving the initialisation of Tool_Point_Offset[] inside the parallel region), there are these:

1) Compile at -O1 (loop distribution only occurs at -O2 or higher)

2) Compile with an undocumented, unsupported internal switch which disables loop distribution: -mP2OPT_hlo_distribution=0

C:\ISN_Forums\U509232>icl  /Qopenmp /Qstd=c99 -O3 pivot.C -mP2OPT_hlo_distribution=0
Intel(R) C++ Compiler XE for applications running on IA-32, Version 14.0.2.176 Build 20140130
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.


C:\ISN_Forums\U509232>pivot.exe
Std Deviation :  73.747 Number of points:299
Std Deviation :  96.575 Number of points:299
Std Deviation : 120.270 Number of points:299
Std Deviation : 109.656 Number of points:299

C:\ISN_Forums\U509232>

Reported to compiler engineering, tracking ID DPD200255521. I'll keep this thread updated with news.

Patrick

Armando_Lazaro_Alami

Dear Patrick,

Thank you for your fast answer and clarification.  I will evaluate which workaround has lower impact on the code.

Armando

Armando_Lazaro_Alami

Dear Patrick,

Do you know if this issue is solved in the latest update (3)?

Thanks.

pbkenned1
Employee

No, the issue is not fixed in 14.0.3.  It is still under investigation by the developers.  I will let you know when a compiler with the fix is available.

Patrick
