Ours not to reason why the originators of STL chose to make things difficult for compilers to optimize, or why they overloaded the term "vector" to mean something other than a vectorizable object. Your case vectorizes with Intel C++ if the inner for() loop is preceded by #pragma ivdep.
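A minimal sketch of where the pragma goes, assuming a row-major matrix stored in one std::vector<double>. The original test case isn't shown, so `add_rows` and its loop body are hypothetical. `#pragma ivdep` tells Intel C++ to ignore assumed (unproven) dependences in the loop that immediately follows. GCC 4.9 and later accept the analogous `#pragma GCC ivdep`, which the 4.5.3 compiler in the log below predates. Neither pragma makes a loop with a real dependence safe; they only remove the compiler's conservative assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kernel: out[i][j] = a[i][j] + b[j] for a rows x cols matrix
// stored row-major. `out` must already hold rows * cols elements.
void add_rows(std::vector<double>& out, const std::vector<double>& a,
              const std::vector<double>& b, std::size_t rows, std::size_t cols)
{
    for (std::size_t i = 0; i < rows; ++i) {
        // Only the pragma understood by the active compiler is emitted,
        // and it must sit directly in front of the inner for().
#if defined(__INTEL_COMPILER)
#pragma ivdep
#elif defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep
#endif
        for (std::size_t j = 0; j < cols; ++j) {
            out[i * cols + j] = a[i * cols + j] + b[j];
        }
    }
}
```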
As for g++:

    39927.cpp:16: note: not vectorized: no vectype for stmt: *D.24168_54 = D.23649_1 7;
    scalar_type: double
    /usr/lib/gcc/i686-pc-cygwin/4.5.3/include/c++/bits/stl_algobase.h:762: note: not vectorized: no vectype for stmt: *__first_77 = 0.0;
    scalar_type: double
    39927.cpp:4: note: vectorized 0 loops in function.
This output makes it somewhat more explicit than icl's report does that the template expansion involves multiple potentially aliased objects.
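One common way to state the no-aliasing promise without a loop pragma is to pass each vector's storage to a small kernel through restrict-qualified pointer parameters. This is a sketch under assumptions, not a guaranteed fix. `__restrict__` is a non-standard extension accepted by GCC, Clang and Intel C++ (MSVC spells it `__restrict`, hence the hypothetical `RESTRICT` macro). The names `add_kernel` and `add_rows_restrict` are also hypothetical. `data()` is C++11; with older compilers `&v[0]` on a non-empty vector serves the same purpose. The promise is only valid if the caller never passes overlapping storage.

```cpp
#include <cstddef>
#include <vector>

#if defined(_MSC_VER)
#define RESTRICT __restrict
#else
#define RESTRICT __restrict__
#endif

// The restrict qualifiers assert that po, pa and pb never overlap, so the
// compiler need not assume a store through po can change pa[] or pb[].
static void add_kernel(double* RESTRICT po, const double* RESTRICT pa,
                       const double* RESTRICT pb,
                       std::size_t rows, std::size_t cols)
{
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            po[i * cols + j] = pa[i * cols + j] + pb[j];
}

// Thin wrapper: unpack the vectors' raw storage and call the kernel.
// `out` must already hold rows * cols elements.
void add_rows_restrict(std::vector<double>& out, const std::vector<double>& a,
                       const std::vector<double>& b,
                       std::size_t rows, std::size_t cols)
{
    add_kernel(out.data(), a.data(), b.data(), rows, cols);
}
```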
You will see how many different checks are performed; that is why it is slower. Also, Debug builds are always slower than Release builds, so for a consistent comparison you need to measure the performance of the Release build of the test case.
Raw C arrays are faster because they don't carry any C++-related overhead.
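To check this on your own build, a minimal timing sketch along these lines may help. `sum_array`, `sum_vector` and `time_ms` are hypothetical names. The numbers only mean something when the program is compiled with optimization (for example `-O2`, or a Release configuration in Visual Studio), so the comparison reflects the optimized code rather than Debug-mode checks.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Sum a raw C array.
double sum_array(const double* p, std::size_t n)
{
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

// Sum a std::vector through operator[] (unchecked element access).
double sum_vector(const std::vector<double>& v)
{
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

// Time `reps` calls of f() and return the elapsed milliseconds.
// Results are accumulated into `sink` so they are used, not discarded
// as dead code.
template <class F>
double time_ms(F f, int reps, double& sink)
{
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) sink += f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

For example, time `time_ms([&]{ return sum_vector(v); }, 100, sink)` against the matching `sum_array` call on the same data, and print `sink` afterwards so the work stays observable.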