Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Performance difference between std::vector and arrays

velvia
Beginner
Hi,
I am testing the performance difference between std::vector and arrays. There seems to be a huge one with the following code:
With icc (ICC) 11.1 20100806 on a Linux system with an Intel Xeon CPU E5410 @ 2.33GHz, I get the following speed with arrays:
-bash-3.2$ icc -O3 -fast main.cpp -o main
ipo: remark #11001: performing single-file optimizations
ipo: remark #11005: generating object file /tmp/ipo_iccQc5RKl.o
-bash-3.2$ time ./main
3.00499
real 0m0.399s
user 0m0.392s
sys 0m0.006s
and the following one with std::vector:
-bash-3.2$ icc -O3 -fast main.cpp -o main
ipo: remark #11001: performing single-file optimizations
ipo: remark #11005: generating object file /tmp/ipo_iccBtPfeD.o
-bash-3.2$ time ./main
3.00499
real 0m4.346s
user 0m4.337s
sys 0m0.007s
Why is there such a difference?
Francois
2 Replies
TimP
Honored Contributor III
Ours not to reason why the originators of the STL chose to make things difficult for compilers to optimize, or why they overloaded the term "vector" to mean something other than a vectorizable object. Your case vectorizes with Intel C++ if the inner for() loop is preceded by #pragma ivdep.
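A minimal sketch of where the pragma would go. The original main.cpp is not shown in the thread, so the function name fill_poly, its parameters and the loop body are assumptions modelled on the expression quoted later in the thread. The pragma is guarded so that compilers other than Intel's simply skip it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel (not the poster's code): writes
// v[i] = x * (1.0 + 2.0 * x) with x = i * step for the first n elements.
void fill_poly(std::vector<double>& v, std::size_t n, double step) {
    assert(n <= v.size());  // debug-only precondition, compiled out with NDEBUG

    // Intel-specific hint: tells the compiler to assume there is no
    // loop-carried dependence in the loop that follows.
#if defined(__INTEL_COMPILER)
#pragma ivdep
#endif
    for (std::size_t i = 0; i < n; ++i) {
        const double x = static_cast<double>(i) * step;
        v[i] = x * (1.0 + 2.0 * x);
    }
}
```

The pragma is a promise from the programmer rather than something the compiler verifies, so it is only safe when the iterations really are independent, as they are here.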

As for g++:
39927.cpp:16: note: not vectorized: no vectype for stmt: *D.24168_54 = D.23649_17;
scalar_type: double
/usr/lib/gcc/i686-pc-cygwin/4.5.3/include/c++/bits/stl_algobase.h:762: note: not vectorized: no vectype for stmt: *__first_77 = 0.0;
scalar_type: double
39927.cpp:4: note: vectorized 0 loops in function.

This makes it somewhat more explicit than icl does that the template expansion involves multiple aliased objects.
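One common way to reduce what the compiler has to prove about aliasing, without any pragma, is to copy the vector's data pointer and size into locals before the loop. The sketch below assumes a hypothetical function sum_poly and the same loop body as the expression quoted later in the thread; whether a given compiler version then vectorizes the loop is not guaranteed and should be checked with its vectorization report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel (not the poster's code): fills the vector through a
// hoisted raw pointer and returns the sum of the stored values.
double sum_poly(std::vector<double>& v, double step) {
    double* const p = v.data();      // read the data pointer once, outside the loop
    const std::size_t n = v.size();  // likewise for the size
    assert(p != nullptr || n == 0);  // debug-only sanity check

    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = static_cast<double>(i) * step;
        p[i] = x * (1.0 + 2.0 * x);
        sum += p[i];
    }
    return sum;
}
```

Inside the loop the compiler now sees only a local pointer and a local count, instead of re-reading the vector's internal members on every iteration.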
SergeyKostrov
Valued Contributor II
...
array[i] = x * (1.0 + 2.0 * x);
...

An STL vector is a C++ object with many operators, one of which is operator '[]'. So, when an assignment is done, another method of the vector class is called:

...
reference operator[]( size_type _Pos )
{
//...
}
...

Try to debug the code and step-into!

You will see how many different verifications are done. That is why it is slower. Also, Debug versions are always slower than Release versions. For absolute consistency you need to verify the performance of the Release version of the test case.

Raw C arrays are faster because they don't have any C++-related overhead.
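To see which checks are actually in play, it may help to compare the two element accessors directly. The sketch below uses hypothetical helper names (get_unchecked, at_rejects). As far as the C++ standard goes, at() is the accessor required to check the index and throw std::out_of_range, while operator[] is not required to check anything; extra checks seen when stepping through operator[] typically come from a library's debug mode, which is an assumption about the build settings and not something shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// operator[]: no bounds check is required by the standard.
// The assert below is a debug-only check and disappears when NDEBUG is defined.
double get_unchecked(const std::vector<double>& v, std::size_t i) {
    assert(i < v.size());
    return v[i];
}

// at(): always checks the index and throws std::out_of_range on failure.
// Returns true if the access was rejected.
bool at_rejects(const std::vector<double>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Timing a Release build of a loop written with each accessor is one way to measure how much of the gap is due to checking and how much to other causes.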

Best regards,
Sergey