ICC 19.0.4.243 crashing on code using automatic parallelization on Lenovo Legion Y7000, 16 GB RAM, 8th-gen i7, Ubuntu 18.04

souza_diniz_mendonca · ‎10-01-2019

Hi there,

I am trying to execute a program using ICC with the following command line:

icc -w -qopenmp -par-threshold0 -no-vec -fno-inline -parallel -qopt-report-phase=all -qopt-report=3

These flags parallelize the program. The original version of the program, which executes sequentially, runs without crashing. However, the optimized program causes a segmentation fault:

https://drive.google.com/file/d/1mHHzgSYfeWE8BI1Gq9jaNIfFcYV2skYJ/view?usp=sharing

The source code follows below:

#include <stdlib.h>
#include <stdio.h>
int main(int argc, char* argv[])
{
  int i,j;
  int n=1000, m=1000;
  double b[1000][1000];   /* 1000*1000 doubles in automatic storage */

  for (i=0; i<n; i++)
    for (j=0; j<m; j++)
      b[i][j] = 0.5;      /* initialize every element */

  for (i=1;i<n;i++)
    for (j=1;j<m;j++)
      b[i][j]=b[i-1][j-1];   /* reads the previous row, written by iteration i-1 */

  for (i=0;i<n;i++)
    for (j=0;j<m;j++)
      printf("b[%d][%d]=%f\n", i, j, b[i][j]);
  return 0;
}

jimdempseyatthecove · ‎10-02-2019

While your code as posted should not crash, one should choose to parallelize the program using either OpenMP (-qopenmp) or auto-parallelization (-parallel). Do not use both.

Since your program has no #pragma omp directives, does removing -qopenmp have any effect?

While the option -no-vec should have disabled vectorization, the option -parallel would (should) have parallelized the loops. What does an optimization report show?

Note that line 15, if the loop was parallelized, is incorrect. The loops (14, 15) have loop-order dependencies (b[i-1][j-1] is written to b[i][j]). The optimization report should indicate that these loops are not auto-parallelized because of the dependencies. That said, the parallel version of these loops, while potentially producing incorrect results, should not have crashed.

Parallelizing loops 17 and 18 will cause the array to be printed out of order (and may interleave the output). Depending on the C runtime library (CRTL), printf may or may not contain a critical section around the statement.

Jim Dempsey

souza_diniz_mendonca · ‎10-08-2019

Thanks for the feedback. I am executing the program using ICC with the following command line:

icc -w -par-threshold0 -no-vec -fno-inline -parallel -qopt-report-phase=all -qopt-report=3

Unfortunately, I got the same result. Please take a look at my screenshot:

I am also including the report I got below:

Intel(R) Advisor can now assist with vectorization and show optimization
  report messages with your source code.
See "https://software.intel.com/en-us/intel-advisor-xe" for details.

Intel(R) C Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.0.4.243 Build 20190416

Compiler options: -w -par-threshold0 -no-vec -fno-inline -parallel -qopt-report-phase=all -qopt-report=3 -o test.out

    Report from: Interprocedural optimizations [ipo]

INLINING OPTION VALUES:
  -inline-factor: 100
  -inline-min-size: 30
  -inline-max-size: 230
  -inline-max-total-size: 2000
  -inline-max-per-routine: 10000
  -inline-max-per-compile: 500000


Begin optimization report for: main(int, char **)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (main(int, char **)) [1] truedepfirstdimension.c(4,1)
  -> EXTERN: (19,7) printf(const char *__restrict__, ...)


    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]


LOOP BEGIN at truedepfirstdimension.c(9,3)
   remark #25420: Collapsed with loop at line 10
   remark #17109: LOOP WAS AUTO-PARALLELIZED
   remark #17101: parallel loop shared={ b } private={ } firstprivate={ j i } lastprivate={ } firstlastprivate={ } reduction={ }
   remark #15540: loop was not vectorized: auto-vectorization is disabled with -no-vec flag
   remark #25438: unrolled without remainder by 2  
   remark #25015: Estimate of max trip count of loop=1000000

   LOOP BEGIN at truedepfirstdimension.c(10,5)
      remark #25421: Loop eliminated in Collapsing

   LOOP END
LOOP END

LOOP BEGIN at truedepfirstdimension.c(13,3)
   remark #17104: loop was not parallelized: existence of parallel dependence
   remark #15541: outer loop was not auto-vectorized: consider using SIMD directive

   LOOP BEGIN at truedepfirstdimension.c(14,5)
      remark #25401: memcopy(with guard) generated
      remark #17104: loop was not parallelized: existence of parallel dependence
      remark #15541: outer loop was not auto-vectorized: consider using SIMD directive

      LOOP BEGIN at truedepfirstdimension.c(14,5)
      <Multiversioned v2>
         remark #17109: LOOP WAS AUTO-PARALLELIZED
         remark #17101: parallel loop shared={ b } private={ } firstprivate={ j i } lastprivate={ } firstlastprivate={ } reduction={ }
         remark #15540: loop was not vectorized: auto-vectorization is disabled with -no-vec flag
         remark #25439: unrolled with remainder by 2  
         remark #25015: Estimate of max trip count of loop=999
      LOOP END

      LOOP BEGIN at truedepfirstdimension.c(14,5)
      <Remainder, Multiversioned v2>
         remark #25015: Estimate of max trip count of loop=999
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at truedepfirstdimension.c(17,3)
   remark #17104: loop was not parallelized: existence of parallel dependence
   remark #15541: outer loop was not auto-vectorized: consider using SIMD directive

   LOOP BEGIN at truedepfirstdimension.c(18,5)
      remark #17104: loop was not parallelized: existence of parallel dependence
      remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
   LOOP END
LOOP END

LOOP BEGIN at truedepfirstdimension.c(9,3)
   remark #15540: loop was not vectorized: auto-vectorization is disabled with -no-vec flag
   remark #25439: unrolled with remainder by 2  
   remark #25015: Estimate of max trip count of loop=1000000
LOOP END

LOOP BEGIN at truedepfirstdimension.c(9,3)
<Remainder>
   remark #25015: Estimate of max trip count of loop=1000000
LOOP END

LOOP BEGIN at truedepfirstdimension.c(14,5)
<Multiversioned v2>
   remark #15540: loop was not vectorized: auto-vectorization is disabled with -no-vec flag
   remark #25439: unrolled with remainder by 2  
   remark #25015: Estimate of max trip count of loop=999
LOOP END

LOOP BEGIN at truedepfirstdimension.c(14,5)
<Remainder, Multiversioned v2>
   remark #25015: Estimate of max trip count of loop=999
LOOP END

    Report from: Code generation optimizations [cg]

truedepfirstdimension.c(14,5):remark #34026: call to memcpy implemented as a call to optimized library version
truedepfirstdimension.c(4,1):remark #34051: REGISTER ALLOCATION : [main] truedepfirstdimension.c:4

    Hardware registers
        Reserved     :    2[ rsp rip]
        Available    :   39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save  :    6[ rbx rbp r12-r15]
        Assigned     :   15[ rax rdx rcx rbx rsi rdi r8-r15 zmm0]
        
    Routine temporaries
        Total         :     209
            Global    :      50
            Local     :     159
        Regenerable   :      77
        Spilled       :       0
        
    Routine stack
        Variables     :   8000076 bytes*
            Reads     :      11 [3.33e+06 ~ 11.0%]
            Writes    :      21 [9.01e+05 ~ 3.0%]
        Spills        :      40 bytes*
            Reads     :      15 [5.00e+00 ~ 0.0%]
            Writes    :      15 [0.00e+00 ~ 0.0%]
    
    Notes
    
        *Non-overlapping variables and spills may share stack space,
         so the total stack size might be less than this.
    

===========================================================================