Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Vectorization issue with ICC 12.1.3

christolb29
Hello,

I am using ICC to compile C functions that perform basic picture analysis on pixels. I want to compare the sequential and parallel (OpenMP) versions of my program to measure the improvement.
However, I have several problems with the vectorization of my inner loops that I do not understand.
For instance, I get this kind of report (with -vec-report3 enabled):
../src/OdPictureAnalysisP4A.c(648): (col. 4) remark: PARTIAL LOOP WAS VECTORIZED.
../src/OdPictureAnalysisP4A.c(674): (col. 7) remark: loop was not vectorized: unsupported loop structure.
../src/OdPictureAnalysisP4A.c(666): (col. 7) remark: loop was not vectorized: unsupported loop structure.
../src/OdPictureAnalysisP4A.c(679): (col. 54) remark: loop was not vectorized: dereference too complex.
../src/OdPictureAnalysisP4A.c(674): (col. 7) remark: LOOP WAS VECTORIZED.
../src/OdPictureAnalysisP4A.c(682): (col. 7) remark: LOOP WAS VECTORIZED.

The same report contradicts itself about the vectorization of one loop (line 674): it is reported both as not vectorized and as vectorized.
Now, with the same source code but with OpenMP enabled at compile time (-openmp), I get this:
../src/OdPictureAnalysisP4A.c(666): (col. 7) remark: LOOP WAS VECTORIZED.
../src/OdPictureAnalysisP4A.c(674): (col. 7) remark: LOOP WAS VECTORIZED.
../src/OdPictureAnalysisP4A.c(674): (col. 7) remark: LOOP WAS VECTORIZED.
../src/OdPictureAnalysisP4A.c(682): (col. 7) remark: LOOP WAS VECTORIZED.

No problem this time.
Second, I have another issue: OpenMP #pragma directives influence the vectorization. Here -openmp is enabled for both tests.
First case: two nested loops without an OpenMP #pragma; the inner one is vectorized.
Second case: two nested loops again, but with the #pragma; the inner one is no longer vectorized. The reported reason:
../src/OdPictureAnalysisRK_P4A.c(704): (col. 7) remark: loop was not vectorized: unsupported loop structure.

Is this issue specific to this version of ICC? What could I try to fix it?
In all these cases I compile with the -fast option. The system is Debian Linux, and the hardware is an x86_64 Intel Xeon X5670.
Kind regards
TimP
When I see such contradictory messages about vectorization of a loop, it typically means that the compiler has created two versions of the loop with run-time selection between them, only one of which is optimized. The opt-report should indicate whether there is versioning. This is most annoying in the case of nested loops, where my only remedy is to check whether the optimized code is actually executed when running my real case. To make that traceable (e.g. under VTune), it is preferable to compile without interprocedural analysis (-fno-inline-functions -no-ipo rather than -fast).
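To make the idea of versioning concrete, here is a hand-written sketch of what a two-version loop with run-time selection can look like. It assumes the versioning is triggered by possible overlap between the source and destination buffers, which is one common reason; the thread does not say which test the compiler actually generated for the loops in question. The function names and the brightening operation are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: add a constant to every pixel of a row. */

/* Version 1: correct even if dst and src overlap; may stay scalar. */
static void brighten_scalar(unsigned char *dst, const unsigned char *src,
                            size_t n, unsigned char delta)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (unsigned char)(src[i] + delta);
}

/* Version 2: restrict promises that dst and src do not overlap,
 * so the compiler is free to vectorize this loop. */
static void brighten_vector(unsigned char *restrict dst,
                            const unsigned char *restrict src,
                            size_t n, unsigned char delta)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (unsigned char)(src[i] + delta);
}

/* Run-time selection between the two versions.
 * Returns 1 if the vectorizable version was used, 0 otherwise. */
int brighten(unsigned char *dst, const unsigned char *src,
             size_t n, unsigned char delta)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    int disjoint = (d + n <= s) || (s + n <= d);

    if (disjoint) {
        brighten_vector(dst, src, n, delta);
        return 1;
    }
    brighten_scalar(dst, src, n, delta);
    return 0;
}
```

A vectorization report for code shaped like this would plausibly contain one "vectorized" and one "not vectorized" remark for what looks like the same source loop, which is why it matters to check which version runs with your real data.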
I submitted an issue on premier.intel.com for a case where the 11.1 compiler vectorized without the multiple-versioning problem, and where the 12.x compilers can recover if the vectorizable code is written in CEAN (extended array notation). I asked about this, and was told it was not intentional that cases which optimized before the introduction of CEAN now require CEAN for satisfactory operation.
-openmp disables many of the compiler's multi-level loop optimizations in the parallel region, so you must write the loop nest optimizations yourself. As you have seen, if you do so, you may also keep the compiler from doing undesirable things.
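As a rough illustration of arranging the loop nest by hand, the sketch below puts the OpenMP pragma on the outer (row) loop only and keeps the inner loop as a plain, unit-stride loop in a small helper with restrict-qualified pointers. The function names and the row-sum operation are hypothetical, not taken from the original source file, and whether ICC 12.1 vectorizes the inner loop still has to be confirmed with -vec-report3.

```c
#include <stddef.h>

/* Hypothetical inner kernel: sum the pixel values of one row.
 * A simple counted loop with unit-stride access and no aliasing
 * is the easiest shape for a vectorizer to handle. */
static unsigned long row_sum(const unsigned char *restrict row, int width)
{
    unsigned long sum = 0;
    for (int x = 0; x < width; ++x)
        sum += row[x];
    return sum;
}

/* Outer loop over rows is the parallel one; each iteration writes
 * only its own element of sums, so the threads do not conflict. */
void picture_row_sums(const unsigned char *restrict pixels,
                      unsigned long *restrict sums,
                      int width, int height)
{
    int y;
#pragma omp parallel for
    for (y = 0; y < height; ++y)
        sums[y] = row_sum(pixels + (size_t)y * (size_t)width, width);
}
```

Without -openmp the pragma is ignored and the code runs sequentially, so the same source can serve for both sides of a sequential versus parallel comparison.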