ICC performance bug?

H__W · 05-10-2019

#include <stdint.h>

void demux1 (
    const int8_t * const __restrict__ in,
    const int h,
    int8_t * const __restrict__ out)
{
    /* Copy every second byte of in to out. */
    for (int i = 0; i < h; ++i)
        out[i] = in[2 * i];
}

The code above performs very poorly when compiled with ICC:
I am observing a 2.3x slowdown (!) compared to GCC 8.2.

Have a look yourself on Godbolt; the issue is quite obvious
when comparing the assembly produced by ICC and GCC
(-march=core-avx2 -Ofast -DNDEBUG).

Any clue?

Viet_H_Intel · 05-10-2019

Can you provide us with a complete test case to investigate?
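
A complete test case for this could look roughly like the sketch below. It is not the original poster's benchmark: the timing helper time_demux1, the driver run_demux1_benchmark, the buffer size, and the repetition count are all assumptions made for illustration, and the loop body is assumed to be out[i] = in[2 * i]. clock() measures CPU time only coarsely, so treat the numbers as a rough comparison between compilers.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Kernel from the post: copy every second byte of in to out. */
void demux1(const int8_t *const __restrict__ in,
            const int h,
            int8_t *const __restrict__ out)
{
    for (int i = 0; i < h; ++i)
        out[i] = in[2 * i];
}

/* Hypothetical helper (not from the post): runs the kernel reps times
   on h output bytes and returns the elapsed CPU time in seconds,
   or -1.0 if h is not positive or an allocation fails. */
double time_demux1(const int h, const int reps)
{
    if (h <= 0)
        return -1.0;

    int8_t *in = malloc((size_t)2 * (size_t)h);
    int8_t *out = malloc((size_t)h);
    if (!in || !out) {
        free(in);
        free(out);
        return -1.0;
    }

    /* Fill the input with a simple repeating pattern. */
    for (int i = 0; i < 2 * h; ++i)
        in[i] = (int8_t)(i & 0x7f);

    /* Reading the output into a volatile makes each call observable.
       A compiler may still inline the kernel here; moving demux1 to
       its own source file would rule that out. */
    volatile int8_t sink = 0;
    const clock_t start = clock();
    for (int r = 0; r < reps; ++r) {
        demux1(in, h, out);
        sink = (int8_t)(sink ^ out[r % h]);
    }
    const double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    free(in);
    free(out);
    return secs;
}

/* Hypothetical driver: sizes are arbitrary and chosen only so the
   run takes long enough to be measurable. Returns 0 on success. */
int run_demux1_benchmark(void)
{
    const int h = 1 << 20;
    const int reps = 1000;
    const double secs = time_demux1(h, reps);
    if (secs < 0.0)
        return 1;
    printf("demux1: h=%d reps=%d time=%.3f s\n", h, reps, secs);
    return 0;
}
```

Calling run_demux1_benchmark() from main and building the same file with both compilers, using the flags from the post, should give two timings to compare side by side.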