The latest compilers often create several versions of a loop, including, according to this report, both vectorized and non-vectorized ones. opt-report should report the multi-versioning. Several months ago, I filed problem reports on some cases where the newly added non-optimized versions are the ones actually chosen at run time. I suppose this is most likely to happen with nested loops at -O3.
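
To make the multi-versioning concrete, here is a minimal sketch of the kind of nested loop where I would expect it. The function names and the kernel itself are hypothetical, not taken from the report. In the first function the compiler cannot prove that `dst` and `src` don't overlap, so it may emit a run-time overlap test and keep both a vectorized and a scalar version of the inner loop. The second function uses `restrict` to promise there is no overlap, which may remove the need for the extra version. Whether a given compiler actually does either is something to confirm in its opt-report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel. As far as the compiler knows, dst and src may
   overlap, so it may generate a run-time check and keep both a
   vectorized and a non-vectorized version of the inner loop. */
void scale_add(float *dst, const float *src,
               size_t rows, size_t cols, float k)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            dst[i * cols + j] += k * src[i * cols + j];
}

/* Same arithmetic, but restrict tells the compiler that dst and src
   do not overlap, so a run-time overlap check should not be needed.
   Calling this with overlapping arrays is undefined behavior. */
void scale_add_restrict(float *restrict dst, const float *restrict src,
                        size_t rows, size_t cols, float k)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            dst[i * cols + j] += k * src[i * cols + j];
}
```

Building both at -O3 with the optimization report enabled and comparing the entries for the two inner loops should show whether multiple versions were generated. The exact spelling of the report option depends on the compiler and its version.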