Hello all,
While optimizing some code, I came across a surprising finding: executables generated by icpc run about 5 times slower with std::array than with C-style stack arrays or std::vector, while g++ and clang++ perform as expected. This is a bit disappointing, since std::array is supposed to be "a container that encapsulates fixed size arrays", without computational overhead.
Here is the source:
/*
   Speed test: C-stack arrays, std::vector, std::array.
   Compile with -D STACK / VECTOR / ARRAY to select the variant.
*/
#include <iostream>
#include <cstdlib>  // rand, RAND_MAX
#if defined(VECTOR)
#include <vector>
#elif defined(ARRAY)
#include <array>
#endif
#include <chrono>
// Timer from https://gist.github.com/gongzhitaao/7062087
class Timer {
public:
    Timer() : beg_(clock_::now()) {}
    void reset() { beg_ = clock_::now(); }
    // Seconds elapsed since construction or the last reset()
    double elapsed() const {
        return std::chrono::duration_cast<second_>(clock_::now() - beg_).count();
    }
private:
    typedef std::chrono::high_resolution_clock clock_;
    typedef std::chrono::duration<double, std::ratio<1>> second_;
    std::chrono::time_point<clock_> beg_;
};
int main() {
    Timer tmr;
    constexpr auto SIZE = 100000;
    constexpr auto REPETITIONS = 10000;
    double result;
#ifdef STACK
#define TXT_ALLOC "on stack"
    double e[SIZE];
    double m[SIZE];
    double s[SIZE];
    double t[SIZE];
#elif defined(VECTOR)
#define TXT_ALLOC "on std::vector"
    std::vector<double> e(SIZE);
    std::vector<double> m(SIZE);
    std::vector<double> s(SIZE);
    std::vector<double> t(SIZE);
#elif defined(ARRAY)
#define TXT_ALLOC "on std::array"
    std::array<double, SIZE> e;
    std::array<double, SIZE> m;
    std::array<double, SIZE> s;
    std::array<double, SIZE> t;
#else
#error Use -D STACK / VECTOR / ARRAY
#include <STOP>  // nonexistent header, guarantees a hard stop
#endif
    // Fill with something
    for (auto i = 0; i < SIZE; ++i) {
        e[i] = 1.0e-2 * static_cast<double>(rand()) / static_cast<double>(RAND_MAX);
        m[i] = 2.0e-2 * static_cast<double>(rand()) / static_cast<double>(RAND_MAX);
        s[i] = 3.0e-2 * static_cast<double>(rand()) / static_cast<double>(RAND_MAX);
        t[i] = 4.0e-2 * static_cast<double>(rand()) / static_cast<double>(RAND_MAX);
    }
    // Measure timing
    tmr.reset();
    result = 0.0;
    for (auto j = 0; j < REPETITIONS; ++j) {
        for (auto i = 0; i < SIZE; ++i) {
            e[i] += 0.5 * m[i] * s[i] / t[i];
        }
        // Use the data so the compiler cannot discard the loop
        result += e[j];
    }
    // Average time in seconds per inner-loop iteration
    auto timing = tmr.elapsed() / SIZE / REPETITIONS;
    std::cout << TXT_ALLOC << " : " << timing << " " << result << std::endl;
    return 0;
}
Tests were performed using g++ (GCC) 11.1.0, clang version 12.0.1, and icpc (ICC) 2021.4.0 20210910 (the 2019 version shows the same behaviour), with -O3 on an Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz.
Timings (in 10^-9 s per inner-loop iteration):
Option | icpc | g++ | clang++ |
-D STACK | 1.04 | 1.21 | 1.17 |
-D VECTOR | 1.15 | 1.17 | 1.04 |
-D ARRAY | 5.12 | 1.04 | 1.03 |
Thanks for any clarification or ideas.
Daniel
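For readers who want to check the "no overhead" expectation on their own toolchain, here is a minimal sketch. The helper names array_matches_c_array and check_examples are hypothetical and not part of the original post. The size comparison is not guaranteed by the standard; it is only what mainstream standard libraries happen to do, so treat a failure as information about your library rather than as a bug.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: does std::array<double, N> look like a plain double[N]?
template <std::size_t N>
bool array_matches_c_array() {
    std::array<double, N> a{};
    // Storage is contiguous and reachable through data() (guaranteed).
    bool contiguous = (&a[N - 1] == a.data() + (N - 1));
    // Same footprint as a C array (assumption: true on common libraries,
    // not required by the standard).
    bool same_size = (sizeof(a) == sizeof(double[N]));
    // No hidden bookkeeping that would prevent a plain memcpy.
    bool trivial = std::is_trivially_copyable<std::array<double, N>>::value;
    return contiguous && same_size && trivial;
}

// Hypothetical driver for a couple of sizes.
inline void check_examples() {
    assert(array_matches_c_array<8>());
    assert(array_matches_c_array<1000>());
}
```

If these checks pass, any slowdown has to come from how the compiler handles the accessor calls, not from the data layout.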
Hi,
Thanks for reaching out to us.
We are looking into this issue and will get back to you soon.
Regards,
Vidya.
Hi Daniel,
I've reported this issue to our developers. I tried icpx and observed the same results as with g++ and clang++.
Can you use icpx instead?
Also, the Intel Classic Compiler will enter "Legacy Product Support" mode, signaling the end of regular updates. Please refer to the article below for more details.
Thanks,
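When comparing icpc and icpx side by side, it can help to print which compiler built each binary. The sketch below is only an illustration: compiler_name and timing_label are hypothetical helpers, and it assumes the usual predefined macros (__INTEL_LLVM_COMPILER for icpx, __INTEL_COMPILER for the classic compiler, __clang__ and __GNUC__ for clang++ and g++). The order of the checks matters because the Intel compilers also define some of the GNU or Clang macros.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: best-effort name of the compiler that built this file.
inline const char* compiler_name() {
#if defined(__INTEL_LLVM_COMPILER)
    return "icpx";      // LLVM-based Intel compiler
#elif defined(__INTEL_COMPILER)
    return "icpc";      // Intel Classic Compiler
#elif defined(__clang__)
    return "clang++";
#elif defined(__GNUC__)
    return "g++";
#else
    return "unknown";
#endif
}

// Hypothetical helper: prefix a benchmark label with the compiler name.
inline std::string timing_label(const std::string& alloc) {
    assert(!alloc.empty());
    return std::string(compiler_name()) + ", " + alloc;
}
```

For example, the benchmark could print timing_label(TXT_ALLOC) instead of TXT_ALLOC alone.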
Hi Viet,
Thanks for considering this issue.
I've tested icpx and it works for me too; it also solves another, more tortuous issue.
Since icpc is slowly being phased out, I guess I had better get used to icpx and its new compiler options...
Daniel
I accepted the answer, although the learning curve for the new flags looks quite steep.
Daniel
It seems that std::array's operator[] is not inlined by icpc. If you compile with -ipo, you get the performance back.
Thanks,
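If the cause is indeed a non-inlined operator[], one possible workaround that does not depend on -ipo is to fetch the raw pointers once with data() and index those in the hot loop. This is a sketch under that assumption, not a measured fix: the function names are hypothetical, and whether it restores the speed with icpc has not been verified here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Same update as the benchmark's inner loop, going through operator[].
template <std::size_t N>
void update_via_subscript(std::array<double, N>& e,
                          const std::array<double, N>& m,
                          const std::array<double, N>& s,
                          const std::array<double, N>& t) {
    for (std::size_t i = 0; i < N; ++i) {
        e[i] += 0.5 * m[i] * s[i] / t[i];
    }
}

// Same update, but indexing raw pointers so no accessor call is in the loop.
template <std::size_t N>
void update_via_pointers(std::array<double, N>& e,
                         const std::array<double, N>& m,
                         const std::array<double, N>& s,
                         const std::array<double, N>& t) {
    double* ep = e.data();
    const double* mp = m.data();
    const double* sp = s.data();
    const double* tp = t.data();
    for (std::size_t i = 0; i < N; ++i) {
        ep[i] += 0.5 * mp[i] * sp[i] / tp[i];
    }
}

// Hypothetical check that both variants compute the same values.
inline void check_kernels_agree() {
    std::array<double, 4> a, b, m, s, t;
    a.fill(1.0); b.fill(1.0); m.fill(2.0); s.fill(3.0); t.fill(4.0);
    update_via_subscript(a, m, s, t);
    update_via_pointers(b, m, s, t);
    for (std::size_t i = 0; i < 4; ++i) {
        assert(a[i] == 1.75);  // 1 + 0.5 * 2 * 3 / 4
        assert(b[i] == 1.75);
    }
}
```

Both variants are equivalent in terms of results, so timing them against each other with the same compiler is a quick way to test the inlining hypothesis.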
Let's close this thread. If you have any other questions/concerns, please create a new one.
Regards,
Viet
