Intel® C++ Compiler

Poor std::array performance...

Daniel_H
New Contributor I

Hello all,

While optimizing some code, I came across a quite surprising finding: executables generated by icpc run 5 times slower with std::array than with C-style stack arrays or std::vector, while g++ and clang++ perform as expected. This is a bit disappointing, since std::array is supposed to be "a container that encapsulates fixed size arrays" without computational overhead.
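The zero-overhead expectation rests on std::array storing its elements contiguously, exactly like a C array, so that operator[] amounts to pointer arithmetic once it is inlined. A minimal sketch of that layout property follows; the helper name `is_contiguous` is hypothetical and is not part of the benchmark.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not from the benchmark): returns true when element i
// of a std::array lives at data() + i, i.e. the storage is laid out like a
// plain C array with nothing in between the elements.
template <std::size_t N>
bool is_contiguous(const std::array<double, N>& a) {
  for (std::size_t i = 0; i < N; ++i) {
    if (&a[i] != a.data() + i) {
      return false;
    }
  }
  return true;
}
```

Since the layout is the same, any slowdown would have to come from how the compiler handles the member functions, not from the container's storage.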

Here is the source:

/*
Speed test: C-stack arrays, std::vector, std::array.
Compile with -D STACK, -D VECTOR or -D ARRAY to select the storage type.
*/
#include <iostream>

#if defined(VECTOR)
 #include <vector>
#elif defined(ARRAY)
 #include <array>
#endif

#include <chrono>
#include <cstdlib>  // rand, RAND_MAX

// Timer from https://gist.github.com/gongzhitaao/7062087
class Timer {
 public:
  Timer() : beg_(clock_::now()) {}
  void reset() { beg_ = clock_::now(); }
  double elapsed() const {
    return std::chrono::duration_cast<second_>(clock_::now() - beg_).count();
  }

 private:
  typedef std::chrono::high_resolution_clock clock_;
  typedef std::chrono::duration<double, std::ratio<1>> second_;
  std::chrono::time_point<clock_> beg_;
};

int main() {
  Timer tmr;
  constexpr auto SIZE = 100000;
  constexpr auto REPETITIONS = 10000;

  double result;
#ifdef STACK
  #define TXT_ALLOC "on stack"
  double e[SIZE];
  double m[SIZE];
  double s[SIZE];
  double t[SIZE];
#elif defined(VECTOR)
  #define TXT_ALLOC "on std::vector"
  std::vector<double> e(SIZE);
  std::vector<double> m(SIZE);
  std::vector<double> s(SIZE);
  std::vector<double> t(SIZE);
#elif defined(ARRAY)
  #define TXT_ALLOC "on std::array" 
  std::array<double, SIZE> e;
  std::array<double, SIZE> m;
  std::array<double, SIZE> s;
  std::array<double, SIZE> t;
#else
  #error Use -D STACK / VECTOR / ARRAY
#endif
  
  // Fill with something
  for (auto i = 0; i < SIZE; ++i) {
    e[i] = 1.0e-2 * static_cast<double>(rand()) / static_cast<double>(RAND_MAX);
    m[i] = 2.0e-2 * static_cast<double>(rand()) / static_cast<double>(RAND_MAX);
    s[i] = 3.0e-2 * static_cast<double>(rand()) / static_cast<double>(RAND_MAX);
    t[i] = 4.0e-2 * static_cast<double>(rand()) / static_cast<double>(RAND_MAX);
  }

  // Measure timing
  tmr.reset();
  result = 0.0;
  for (auto j = 0; j < REPETITIONS; ++j) {
    for (auto i = 0; i < SIZE; ++i) {
      e[i] += 0.5 * m[i] * s[i] / t[i];
    }
    result += e[j];
  }
  // Average time per inner-loop iteration, in seconds
  auto timing = tmr.elapsed() / SIZE / REPETITIONS;

  std::cout << TXT_ALLOC << " : " << timing << " " << result << std::endl;

  return 0;
}

 

Tests were performed using g++ (GCC) 11.1.0, clang version 12.0.1, and icpc (ICC) 2021.4.0 20210910 (the 2019 version shows the same behaviour),

using -O3 on an Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz.

Timings (in 10^-9 s)

             icpc    g++     clang++
-D STACK     1.04    1.21    1.17
-D VECTOR    1.15    1.17    1.04
-D ARRAY     5.12    1.04    1.03

 

Thanks for any clarification or ideas.

 

Daniel

VidyalathaB_Intel
Moderator

Hi,


Thanks for reaching out to us.

We are looking into this issue. We will get back to you soon.


Regards,

Vidya.


Viet_H_Intel
Moderator

Hi Daniel,


I've reported this issue to our developers. I tried icpx and observed results in line with those of g++ and clang++.

Can you use icpx instead?

Also, the Intel Classic Compiler will enter "Legacy Product Support" mode, signaling the end of regular updates. Please refer to the article below for more details.

https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html


Thanks,



Daniel_H
New Contributor I

Hi Viet,

Thanks for considering this issue.
I've tested icpx and it works for me too; it also solves another, more tortuous issue.
Since icpc is being phased out, I guess I had better get used to icpx and its new compiler options...

Daniel

Daniel_H
New Contributor I

I accepted the answer, although the learning curve for the new flags looks quite steep.

 

Daniel

Viet_H_Intel
Moderator

It seems that std::array's operator[] is not inlined by icpc. If you compile with -ipo, you will get the performance back.
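If the slowdown does come only from the non-inlined operator[] call, one source-level workaround would be to take raw pointers with data() before the hot loop, so that no std::array member function is called inside it. The sketch below is an untested assumption in that direction, not something measured in this thread; the helper names `update` and `usage_example` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the thread): performs the benchmark's inner
// update, e[i] += 0.5 * m[i] * s[i] / t[i], through raw pointers obtained
// once with data(), so the loop body contains no operator[] call.
template <std::size_t N>
void update(std::array<double, N>& e, const std::array<double, N>& m,
            const std::array<double, N>& s, const std::array<double, N>& t) {
  double* pe = e.data();
  const double* pm = m.data();
  const double* ps = s.data();
  const double* pt = t.data();
  for (std::size_t i = 0; i < N; ++i) {
    pe[i] += 0.5 * pm[i] * ps[i] / pt[i];
  }
}

// Small usage check with values chosen so the arithmetic is exact:
// 1 + 0.5 * 2 * 3 / 4 = 1.75.
inline void usage_example() {
  std::array<double, 4> e{1.0, 1.0, 1.0, 1.0};
  std::array<double, 4> m{2.0, 2.0, 2.0, 2.0};
  std::array<double, 4> s{3.0, 3.0, 3.0, 3.0};
  std::array<double, 4> t{4.0, 4.0, 4.0, 4.0};
  update(e, m, s, t);
  assert(e[0] == 1.75);
}
```

Whether this actually closes the gap under icpc would need to be checked with the benchmark above; compiling with -ipo or switching to icpx, as suggested in this thread, requires no source changes.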


Thanks,


Viet_H_Intel
Moderator

Let's close this thread. If you have any other questions/concerns, please create a new one.

Regards,

Viet

