nemequ

ICC generates incorrect code

I'm getting incorrect results from ICC when attempting to add implementations of the NEON `vabd_s32`, `vabd_u32`, `vabdq_s32`, and `vabdq_u32` functions to SIMDe.  Here is a reduced test case that reproduces the problem:

#include <stdlib.h>
#include <stdint.h>
#include <assert.h>
#include <stdio.h>
#include <inttypes.h>

/* It's not necessary to put this in a struct; it just makes it easy
 * to switch between a vector and an array for testing.
 *
 * When it is an array -O0 and -O1 work, though -O2 fails.  As a
 * vector it fails even at -O0. */
typedef struct {
  #if defined(USE_VECTOR)
    int32_t values __attribute__((__vector_size__(16)));
  #else
    int32_t values[4];
  #endif
} vec32x4;

vec32x4 abd(vec32x4 a, vec32x4 b) {
  vec32x4 r;

  // #pragma omp simd
  for(size_t i = 0 ; i < (sizeof(r.values) / sizeof(r.values[0])) ; i++) {
    int64_t tmp = ((int64_t) a.values[i]) - ((int64_t) b.values[i]);
    r.values[i] = (int32_t) (tmp < INT64_C(0) ? -tmp : tmp);
  }

  return r;
}

int main(void) {
  int res = EXIT_SUCCESS;
  vec32x4
    a = (vec32x4) { { INT32_C(   463415955), -INT32_C(  1803897040), -INT32_C(  1513176249), -INT32_C(  1092402174) } },
    b = (vec32x4) { { INT32_C(  2138828797),  INT32_C(  1510457891),  INT32_C(  1276585996),  INT32_C(  1160694450) } },
    e = (vec32x4) { { INT32_C(  1675412842), -INT32_C(   980612365), -INT32_C(  1505205051), -INT32_C(  2041870672) } };

  vec32x4 r = abd(a, b);

  for (size_t i = 0 ; i < (sizeof(r.values) / sizeof(r.values[0])) ; i++) {
    if (r.values[i] != e.values[i]) {
      fprintf(stderr, "%" PRId32 " != %" PRId32 "\n", r.values[i], e.values[i]);
      res = EXIT_FAILURE;
    }
  }

  return res;
}

This is with ICC 2021.1 Beta 20200602 on Linux, x86_64.

Similar code works for other types (e.g., a vector of `int16_t` instead of `int32_t`, with the operations happening on `int32_t` instead of `int64_t`).  The failure occurs with both 64-bit (vabd_{s,u}32) and 128-bit (vabdq_{s,u}32) vectors, and with both signed (vabd{,q}_s32) and unsigned (vabd{,q}_u32) integers.
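For comparison, the 16-bit variant mentioned above might look roughly like the sketch below. The names `vec16x4` and `abd16` are hypothetical (they do not appear in the original test case), and only the array form is shown. The structure mirrors the failing function, with `int32_t` as the intermediate type in place of `int64_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit analogue of vec32x4 (array form only). */
typedef struct {
  int16_t values[4];
} vec16x4;

/* Absolute difference of each lane, computed in int32_t so the
 * subtraction itself cannot overflow.  The final narrowing cast is
 * implementation-defined when the result exceeds INT16_MAX; GCC and
 * clang wrap modulo 2^16. */
vec16x4 abd16(vec16x4 a, vec16x4 b) {
  vec16x4 r;

  for (size_t i = 0 ; i < (sizeof(r.values) / sizeof(r.values[0])) ; i++) {
    int32_t tmp = ((int32_t) a.values[i]) - ((int32_t) b.values[i]);
    r.values[i] = (int16_t) (tmp < INT32_C(0) ? -tmp : tmp);
  }

  return r;
}
```

Calling `abd16` on `{ 100, -200, 5, -1 }` and `{ -50, 300, 5, 1 }` should give `{ 150, 500, 0, 2 }`.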

As you can see from the comment in the code, at -O2 and higher it fails if an array is used.  If a vector is used, it fails even at -O0.  GCC and clang both provide correct results.
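One way to cross-check the expected values without relying on the widening code path is a scalar helper that does the subtraction in unsigned 32-bit arithmetic. This is only a sketch: `abd_s32_bits` is a hypothetical name, it returns the raw 32-bit pattern rather than an `int32_t`, and nothing here establishes whether ICC handles this form correctly.

```c
#include <stdint.h>

/* Hypothetical reference helper: absolute difference of two int32_t
 * values, returned as the 32-bit pattern a NEON lane would hold.
 * Unsigned subtraction wraps modulo 2^32 by definition, so no 64-bit
 * intermediate is needed. */
uint32_t abd_s32_bits(int32_t a, int32_t b) {
  uint32_t ua = (uint32_t) a;
  uint32_t ub = (uint32_t) b;

  /* Compare as signed, subtract as unsigned. */
  return (a > b) ? (ua - ub) : (ub - ua);
}
```

Each lane of `e` in the test case, converted to `uint32_t`, should equal `abd_s32_bits` applied to the matching lanes of `a` and `b`. For example, the second lane gives 3314354931, which is -980612365 when reinterpreted as `int32_t`.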

Viet_H_Intel
Moderator

I've reported this issue to our compiler developers.

Thanks,

Viet
