#include <stdint.h>

void demux1(const int8_t * const __restrict__ in,
            const int h,
            int8_t * const __restrict__ out)
{
    /* Copy every second input byte to the output. */
    for (int i = 0; i < h; ++i)
        out[i] = in[2 * i];
}
The code above performs very poorly when compiled with ICC: I am observing a 2.3x slowdown (!) compared to GCC 8.2.
Have a look yourself on godbolt; the issue seems quite obvious when comparing the assembly produced by ICC with that produced by GCC.
Compiler flags: -march=core-avx2 -Ofast -DNDEBUG
Any clue?
- Tags:
- CC++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization
1 Reply
Can you provide us with a complete test case to investigate?
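For reference, a complete test case for the kernel in the question might look roughly like the sketch below. It is not the original poster's benchmark. The helpers `demux1_mismatches` and `demux1_seconds`, the input pattern, and the sizes are all assumptions made for illustration, and the store is assumed to be `out[i]`. The timing uses the standard `clock()` function, which is coarse, so a real comparison would want many repetitions and a look at the generated assembly to confirm the loop was not optimized away.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Kernel from the question, with the store assumed to be out[i]. */
void demux1(const int8_t *const __restrict__ in, const int h,
            int8_t *const __restrict__ out)
{
    for (int i = 0; i < h; ++i)
        out[i] = in[2 * i];
}

/* Hypothetical helper: fills 2*h input bytes with a known pattern, runs
 * demux1, and returns how many output bytes differ from the expected
 * values. Requires h > 0. */
int demux1_mismatches(const int h)
{
    assert(h > 0);
    int8_t *in = (int8_t *)malloc(2 * (size_t)h);
    int8_t *out = (int8_t *)malloc((size_t)h);
    assert(in != NULL && out != NULL);

    for (int i = 0; i < 2 * h; ++i)
        in[i] = (int8_t)(i & 0x7f);

    demux1(in, h, out);

    int bad = 0;
    for (int i = 0; i < h; ++i)
        bad += (out[i] != (int8_t)((2 * i) & 0x7f));

    free(in);
    free(out);
    return bad;
}

/* Hypothetical helper: CPU time in seconds for `reps` calls of demux1 on
 * h output bytes. The input is changed on every repetition and one output
 * byte is read back through a volatile sink, to discourage the compiler
 * from removing the work. Requires h > 0. */
double demux1_seconds(const int h, const int reps)
{
    assert(h > 0);
    int8_t *in = (int8_t *)calloc(2 * (size_t)h, 1);
    int8_t *out = (int8_t *)calloc((size_t)h, 1);
    assert(in != NULL && out != NULL);

    volatile int sink = 0;
    const clock_t t0 = clock();
    for (int r = 0; r < reps; ++r) {
        in[0] = (int8_t)(r & 0x7f);
        demux1(in, h, out);
        sink += out[0];
    }
    const clock_t t1 = clock();

    free(in);
    free(out);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling `demux1_mismatches(h)` first and then `demux1_seconds(h, reps)` from a small `main`, built once with each compiler using the flags from the question, would give a correctness check plus a timing figure that can be compared directly.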