Hello,
I have run into a situation that I cannot explain. I have a loop with a SIMD-enabled function and I use #pragma simd before it. This loop vectorizes if it is placed in a separate function, but does not vectorize if it is inside main(). I am using Intel C++ compiler 16.0.0.109. Please see code and vectorization reports below. Can anyone explain what is happening and if there is a way to work around this?
This is loop-in-main.cc:
__attribute__((vector)) void SimdEnabledFunction(double);

int main() {
  int n = 10000;
  double a[n];
#pragma simd
  for(int i = 0 ; i < n ; i++)
    SimdEnabledFunction(a[i]);
}
This is the optimization report for it (loop does not vectorize):
[avladim@cfx-0 ~]$ icpc -qopenmp -c -qopt-report -qopt-report-stdout loop-in-main.cc
Intel(R) Advisor can now assist with vectorization and show optimization report messages with your source code.
See "https://software.intel.com/en-us/intel-advisor-xe" for details.

Report from: Interprocedural optimizations [ipo]

INLINING OPTION VALUES:
  -inline-factor: 100
  -inline-min-size: 30
  -inline-max-size: 230
  -inline-max-total-size: 2000
  -inline-max-per-routine: 10000
  -inline-max-per-compile: 500000

Begin optimization report for: main()

Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (main()) [1] loop-in-main.cc(3,12)

loop-in-main.cc(7): (col. 3) warning #13379: loop was not vectorized with "simd"

Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at loop-in-main.cc(7,3)
   remark #15520: simd loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria
   remark #13379: loop was not vectorized with "simd"
LOOP END
===========================================================================
[avladim@cfx-0 ~]$
This is the other code, loop-in-func.cc, where the loop is in a separate function:
__attribute__((vector)) void SimdEnabledFunction(double);

void UserFunction(int n, double* a) {
#pragma simd
  for(int i = 0 ; i < n ; i++)
    SimdEnabledFunction(a[i]);
}

int main() {
  int n = 10000;
  double a[n];
  UserFunction(n, a);
}
This is the optimization report for it (SIMD LOOP WAS VECTORIZED):
[avladim@cfx-0 ~]$ icpc -qopenmp -c -qopt-report -qopt-report-stdout loop-in-func.cc
Intel(R) Advisor can now assist with vectorization and show optimization report messages with your source code.
See "https://software.intel.com/en-us/intel-advisor-xe" for details.

Report from: Interprocedural optimizations [ipo]

INLINING OPTION VALUES:
  -inline-factor: 100
  -inline-min-size: 30
  -inline-max-size: 230
  -inline-max-total-size: 2000
  -inline-max-per-routine: 10000
  -inline-max-per-compile: 500000

Begin optimization report for: main()

Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (main()) [1] loop-in-func.cc(9,12)
  -> INLINE: (12,3) UserFunction(int, double *)

loop-in-func.cc(5): (col. 3) warning #13379: loop was not vectorized with "simd"

Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at loop-in-func.cc(5,3) inlined into loop-in-func.cc(12,3)
   remark #15520: simd loop was not vectorized: loop with multiple exits cannot be vectorized unless it meets search loop idiom criteria
   remark #13379: loop was not vectorized with "simd"
LOOP END
===========================================================================

Begin optimization report for: UserFunction(int, double *)

Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (UserFunction(int, double *)) [2] loop-in-func.cc(3,37)

Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at loop-in-func.cc(5,3)
<Peeled loop for vectorization>
LOOP END

LOOP BEGIN at loop-in-func.cc(5,3)
   remark #15301: SIMD LOOP WAS VECTORIZED
LOOP END

LOOP BEGIN at loop-in-func.cc(5,3)
<Remainder loop for vectorization>
   remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
LOOP END
===========================================================================
[avladim@cfx-0 ~]$
Andrey
Hi Andrey,
This is an interesting issue: it's a bug in the compiler, and I'll file it with our developers. As a workaround, add the "nothrow" clause to the function declaration, as below:
__attribute__((vector, nothrow)) void SimdEnabledFunction(double);
And it should vectorize the loop. I tried with the latest 16.0 release and it works fine:
% cat loop-main.cpp
__attribute__((vector, nothrow)) void SimdEnabledFunction(double);

int main() {
  int n = 10000;
  double a[n];
//#pragma simd
  for(int i = 0 ; i < n ; i++)
      SimdEnabledFunction(a[i]);
}
% icpc -O3 -qopenmp -c -qopt-report -qopt-report-stdout loop-main.cpp
...
....
INLINE REPORT: (main()) [1] loop-main.cpp(3,12)
Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]
LOOP BEGIN at loop-main.cpp(7,3)
remark #15300: LOOP WAS VECTORIZED
LOOP END
===============================================
Thanks,
Kittur
Thank you, Kittur!
Pleasure, Andrey. BTW, I've filed the issue with the developers and will keep you updated when the release with a fix is out, thanks.
_Kittur
Hi Andrey,
OK, after investigating this issue, it turns out that this is not a bug after all. The reason is as follows:
--------------------------------------------------------------------------------------
The compiler automatically generates a try block for a program block (i.e. the code inside {}) when it sees any local object or array created in that block, because those objects/arrays must be de-allocated if an exception is thrown. That said:
- In the first case, the function main contains an array allocation, so the try block is created, and if the called routine is not marked as nothrow the loop cannot be vectorized.
- In the second case, the function containing the loop does not contain anything that requires a try block. BTW, the first part of the report for the second compilation does contain a message saying that the copy of the loop inlined into main is not vectorized; that part of the report was simply skipped in your description.
--------------------------------------------------------------------------------------
So, the workaround I suggested earlier is the correct way to get the loop vectorized in this case. Hope this helps...
Kittur
Hi Kittur,
this is very interesting!
For completeness of the picture, can you also explain the result below? I am using the same code as in loop-in-main.cc, but this time instead of "int n = 10000", I have "const int n = 10000". In this case the compiler vectorizes the loop. The only change is adding the const qualifier. Why does it change the result of vectorization?
__attribute__((vector)) void SimdEnabledFunction(double);

int main() {
  const int n = 10000;
  double a[n];
#pragma simd
  for(int i = 0 ; i < n ; i++)
    SimdEnabledFunction(a[i]);
}
[avladim@cfx-0 ~]$ icpc -qopenmp -c -qopt-report -qopt-report-stdout loop-in-main-const.cc
Intel(R) Advisor can now assist with vectorization and show optimization report messages with your source code.
See "https://software.intel.com/en-us/intel-advisor-xe" for details.

Report from: Interprocedural optimizations [ipo]

INLINING OPTION VALUES:
  -inline-factor: 100
  -inline-min-size: 30
  -inline-max-size: 230
  -inline-max-total-size: 2000
  -inline-max-per-routine: 10000
  -inline-max-per-compile: 500000

Begin optimization report for: main()

Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (main()) [1] loop-in-main-const.cc(3,12)

Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at loop-in-main-const.cc(7,3)
   remark #15301: SIMD LOOP WAS VECTORIZED
LOOP END
===========================================================================
[avladim@cfx-0 ~]$
Andrey
Yes, that's interesting, Andrey. One of the rules for vectorizing a loop is that the loop must be countable: the trip count must be known at entry to the loop at runtime and must not change while the loop executes, which implies that the exit from the loop is not data dependent. That said, the trip count here is known even without the const qualifier, so I'll have to look into this and get back to you, thx.
_Kittur
Hi Andrey,
OK, here's why it works when you add the const qualifier. When no const is specified, the allocation of the array (double a[n]) is done by a call to a special routine that allocates the array, and the array must be freed when leaving the block where it is accessible.
Adding the const qualifier to n causes the local array to be allocated statically on the stack, so it does not need to be freed when an exception is thrown. Thus no try-catch is created and the compiler does not see a possible early exit from the loop.
Hope the above helps understand why the loop vectorizes now!
Regards,
Kittur
Andrey, BTW, with the latest update 1 release (which you can download from the Intel Registration Center), the vectorizer also reports this cause explicitly:
LOOP BEGIN at loop-main.cpp(7,3)
remark #15333: loop was not vectorized: exception handling for a call prevents vectorization [ loop-main.cpp(8,7) ]
LOOP END
_Kittur
Kittur Ganesh (Intel) wrote:
When no const is specified, the allocation of the array (double a[n]) is done by a call to a special routine that allocates the array, and the array must be freed when leaving the block where it is accessible.
This is fascinating! I am wondering why the compiler needs to call a function for it. Does it mean that allocation like "double a
Hi Andrey,
Good question. This looks like an issue, and I've filed it with our developers. The call does allocate the array on the stack, but the fact that it is needed at all could point to a problem with constant propagation in the front end. I'll keep you updated on the outcome of the issue I've filed (constant propagation/stack allocation). Again, I appreciate you bringing this up.
_Kittur
