Hi there,
I am compiling a program with ICC using the following command line:
icc -w -qopenmp -par-threshold0 -no-vec -fno-inline -parallel -qopt-report-phase=all -qopt-report=3
These flags parallelize the program. The original version of the program, which executes sequentially, runs without crashing. However, the optimized program fails with a segmentation fault:
https://drive.google.com/file/d/1mHHzgSYfeWE8BI1Gq9jaNIfFcYV2skYJ/view?usp=sharing
The source code follows below:
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
  int i,j;
  int n=1000, m=1000;
  double b[1000][1000];

  for (i=0; i<n; i++)
    for (j=0; j<m; j++)
      b[i][j] = 0.5;

  for (i=1;i<n;i++)
    for (j=1;j<m;j++)
      b[i][j] = b[i-1][j-1];

  for (i=0;i<n;i++)
    for (j=0;j<m;j++)
      printf("b[%d][%d]=%f\n", i, j, b[i][j]);

  return 0;
}
- Tags:
- CC++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization
While your code as posted should not crash, you should parallelize the program using either OpenMP (-qopenmp) or auto-parallelization (-parallel), not both.
Since your program has no #pragma omp directives, does removing -qopenmp have any effect?
While -no-vec should have disabled vectorization, -parallel should still have parallelized the loops. What does the optimization report show?
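Note that line 15 is incorrect if its loop is parallelized. The loops (14, 15) have a loop-order dependence: the value read from b[i-1][j-1] is written to b[i][j], so an iteration can depend on a result produced by an earlier one.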
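To make the dependence concrete, here is a minimal sketch, assuming the matrix is stored row-major in a flat array. The helper name `propagate_diagonal` is hypothetical and not from the thread. Row i only reads row i-1, so the outer loop has to run in order, while the iterations over j within one row do not depend on each other. The sketch uses an OpenMP pragma for the inner loop; without OpenMP enabled the pragma is ignored and the result is the same.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): applies b[i][j] = b[i-1][j-1]
 * to an n x m matrix stored row-major in a flat array. */
void propagate_diagonal(double *b, int n, int m)
{
    /* The outer loop stays sequential: row i reads row i-1, which must
     * already hold its final values. */
    for (int i = 1; i < n; i++) {
        const double *prev = b + (size_t)(i - 1) * m;
        double *cur = b + (size_t)i * m;

        /* Within one row, iteration j writes cur[j] and reads prev[j-1],
         * so the iterations are independent of each other. */
        #pragma omp parallel for
        for (int j = 1; j < m; j++)
            cur[j] = prev[j - 1];
    }
}
```

Running the outer loop in parallel instead would let some thread read a row of `b` before another thread has finished updating it.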
Parallelizing loops 17 and 18 would cause the array to be printed out of order (and may produce a mishmash of output). Depending on the C runtime library (CRTL), printf may or may not contain a critical section around the statement.
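One way to sidestep the ordering problem is to keep all output in a single thread. The following is only a sketch under that assumption; `format_cell` and `print_matrix` are hypothetical names, and the matrix is assumed to be stored row-major in a flat array.

```c
#include <stdio.h>

/* Hypothetical helper: formats one element the same way the original
 * printf does, without the trailing newline. Returns snprintf's result. */
int format_cell(char *buf, size_t len, int i, int j, double value)
{
    return snprintf(buf, len, "b[%d][%d]=%f", i, j, value);
}

/* Hypothetical helper: prints an n x m row-major matrix from one thread,
 * so lines appear in index order and cannot interleave. */
void print_matrix(FILE *out, const double *b, int n, int m)
{
    char line[64];

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            format_cell(line, sizeof line, i, j, b[(size_t)i * m + j]);
            fprintf(out, "%s\n", line);
        }
    }
}
```

Calling `print_matrix(stdout, b, n, m)` after the compute loops leaves the compiler free to parallelize the computation while the output order stays deterministic.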
Jim Dempsey
Thanks for the feedback. I am now compiling the program with ICC using the following command line (without -qopenmp):
icc -w -par-threshold0 -no-vec -fno-inline -parallel -qopt-report-phase=all -qopt-report=3
Unfortunately, I got the same result. Please take a look at my screenshot:
Here is the optimization report I got:
Intel(R) Advisor can now assist with vectorization and show optimization
  report messages with your source code.
See "https://software.intel.com/en-us/intel-advisor-xe" for details.

Intel(R) C Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.0.4.243 Build 20190416

Compiler options: -w -par-threshold0 -no-vec -fno-inline -parallel -qopt-report-phase=all -qopt-report=3 -o test.out

    Report from: Interprocedural optimizations [ipo]

INLINING OPTION VALUES:
  -inline-factor: 100
  -inline-min-size: 30
  -inline-max-size: 230
  -inline-max-total-size: 2000
  -inline-max-per-routine: 10000
  -inline-max-per-compile: 500000

Begin optimization report for: main(int, char **)

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (main(int, char **)) [1] truedepfirstdimension.c(4,1)
  -> EXTERN: (19,7) printf(const char *__restrict__, ...)

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at truedepfirstdimension.c(9,3)
   remark #25420: Collapsed with loop at line 10
   remark #17109: LOOP WAS AUTO-PARALLELIZED
   remark #17101: parallel loop shared={ b } private={ } firstprivate={ j i } lastprivate={ } firstlastprivate={ } reduction={ }
   remark #15540: loop was not vectorized: auto-vectorization is disabled with -no-vec flag
   remark #25438: unrolled without remainder by 2
   remark #25015: Estimate of max trip count of loop=1000000

   LOOP BEGIN at truedepfirstdimension.c(10,5)
      remark #25421: Loop eliminated in Collapsing
   LOOP END
LOOP END

LOOP BEGIN at truedepfirstdimension.c(13,3)
   remark #17104: loop was not parallelized: existence of parallel dependence
   remark #15541: outer loop was not auto-vectorized: consider using SIMD directive

   LOOP BEGIN at truedepfirstdimension.c(14,5)
      remark #25401: memcopy(with guard) generated
      remark #17104: loop was not parallelized: existence of parallel dependence
      remark #15541: outer loop was not auto-vectorized: consider using SIMD directive

      LOOP BEGIN at truedepfirstdimension.c(14,5)
      <Multiversioned v2>
         remark #17109: LOOP WAS AUTO-PARALLELIZED
         remark #17101: parallel loop shared={ b } private={ } firstprivate={ j i } lastprivate={ } firstlastprivate={ } reduction={ }
         remark #15540: loop was not vectorized: auto-vectorization is disabled with -no-vec flag
         remark #25439: unrolled with remainder by 2
         remark #25015: Estimate of max trip count of loop=999
      LOOP END

      LOOP BEGIN at truedepfirstdimension.c(14,5)
      <Remainder, Multiversioned v2>
         remark #25015: Estimate of max trip count of loop=999
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at truedepfirstdimension.c(17,3)
   remark #17104: loop was not parallelized: existence of parallel dependence
   remark #15541: outer loop was not auto-vectorized: consider using SIMD directive

   LOOP BEGIN at truedepfirstdimension.c(18,5)
      remark #17104: loop was not parallelized: existence of parallel dependence
      remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
   LOOP END
LOOP END

LOOP BEGIN at truedepfirstdimension.c(9,3)
   remark #15540: loop was not vectorized: auto-vectorization is disabled with -no-vec flag
   remark #25439: unrolled with remainder by 2
   remark #25015: Estimate of max trip count of loop=1000000
LOOP END

LOOP BEGIN at truedepfirstdimension.c(9,3)
<Remainder>
   remark #25015: Estimate of max trip count of loop=1000000
LOOP END

LOOP BEGIN at truedepfirstdimension.c(14,5)
<Multiversioned v2>
   remark #15540: loop was not vectorized: auto-vectorization is disabled with -no-vec flag
   remark #25439: unrolled with remainder by 2
   remark #25015: Estimate of max trip count of loop=999
LOOP END

LOOP BEGIN at truedepfirstdimension.c(14,5)
<Remainder, Multiversioned v2>
   remark #25015: Estimate of max trip count of loop=999
LOOP END

    Report from: Code generation optimizations [cg]

truedepfirstdimension.c(14,5):remark #34026: call to memcpy implemented as a call to optimized library version
truedepfirstdimension.c(4,1):remark #34051: REGISTER ALLOCATION : [main] truedepfirstdimension.c:4

    Hardware registers
        Reserved     :    2[ rsp rip]
        Available    :   39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save  :    6[ rbx rbp r12-r15]
        Assigned     :   15[ rax rdx rcx rbx rsi rdi r8-r15 zmm0]

    Routine temporaries
        Total         :     209
            Global    :      50
            Local     :     159
        Regenerable   :      77
        Spilled       :       0

    Routine stack
        Variables     :     8000076 bytes*
            Reads     :      11 [3.33e+06 ~ 11.0%]
            Writes    :      21 [9.01e+05 ~ 3.0%]
        Spills        :      40 bytes*
            Reads     :      15 [5.00e+00 ~ 0.0%]
            Writes    :      15 [0.00e+00 ~ 0.0%]

    Notes

        *Non-overlapping variables and spills may share stack space,
         so the total stack size might be less than this.

===========================================================================
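The "Routine stack" entry in the report lists 8000076 bytes of variables, which corresponds to the array b living on the stack of main. The thread does not establish that this is the cause of the segmentation fault, but it may be worth ruling out, since that size is close to the 8 MiB default stack limit commonly found on Linux. A minimal sketch, assuming it is acceptable to move the array to the heap (the helper name `make_matrix` is hypothetical):

```c
#include <stdlib.h>

/* Hypothetical helper: allocates an n x m matrix of doubles on the heap,
 * stored row-major, with every element set to init.
 * Returns NULL on failure; the caller releases it with free(). */
double *make_matrix(size_t n, size_t m, double init)
{
    /* Reject empty shapes and sizes whose byte count would overflow. */
    if (n == 0 || m == 0 || n > (size_t)-1 / sizeof(double) / m)
        return NULL;

    double *b = malloc(n * m * sizeof *b);
    if (b == NULL)
        return NULL;

    for (size_t k = 0; k < n * m; k++)
        b[k] = init;

    return b;
}
```

With this layout, element b[i][j] of the original program becomes `b[i * m + j]`, and the 8 MB array no longer counts against the stack of main or of any worker thread.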