Severity: High (Silent Data Corruption)
Component: Loop Vectorizer / Code Generation
Compiler: Intel(R) oneAPI DPC++/C++ Compiler (ICX)
Flags: -O2 (Reproduces at -O2, -O3, and with -xCORE-AVX512 / -xCORE-AVX2 / -xAVX)
Summary
The ICX compiler generates logically incorrect code when vectorizing loops that initialize arrays using modulo operations with power-of-2 divisors (e.g., i % 4). This results in silent data corruption where specific elements in the sequence are written with the wrong values.
The issue persists even when the loop bound is a variable, indicating a fundamental flaw in the vectorizer's pattern generation logic, not just a constant-folding error.
Reproduction Code (Variable Size)
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

// Bug triggers even with variable 'size' parameter
void init_arr(int16_t * a_buf, int size) {
    for (int i = 0; i < size; i++) {
        // Pattern: 3, 2, 2, 2, 3, 2, 2, 2...
        if ((i % 4) == 0) {
            a_buf[i] = 3;
        } else {
            a_buf[i] = 2;
        }
    }
}

int main(void) {
    // Test with size 17
    int size = 17;
    int16_t * a_buf = (int16_t *)_mm_malloc (size * sizeof (int16_t), 64);
    init_arr(a_buf, size);

    // Verification
    int failure_count = 0;
    for (int i = 0; i < size; i++) {
        int16_t expected = ((i % 4) == 0) ? 3 : 2;
        if (a_buf[i] != expected) {
            printf("Index %d: Expected %d, Got %d\n", i, expected, a_buf[i]);
            failure_count++;
        }
    }
    if (failure_count > 0)
        printf("Total Failures: %d\n", failure_count);

    _mm_free (a_buf);
    return (failure_count == 0) ? 0 : 1;
}
Observed Behavior
When compiled with -O2, the code fails to write the value 3 at indices 4, 8, 12, 16. It instead writes 2.
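For cross-checking a buffer against the expected 3, 2, 2, 2 pattern without reusing the `i % 4` expression that is under suspicion, a checker can track the position within the period with a wrapping counter. The following is a minimal sketch; `count_pattern_mismatches` is a hypothetical helper that is not part of the original report, and whether this form is itself immune to the miscompilation has not been confirmed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original report): counts the elements
 * that deviate from the 3, 2, 2, 2, ... pattern. The expected value is
 * tracked with a wrapping phase counter instead of i % 4. */
int count_pattern_mismatches(const int16_t *buf, int size) {
    assert(buf != NULL || size <= 0);
    int mismatches = 0;
    int phase = 0; /* position within the period of 4 */
    for (int i = 0; i < size; i++) {
        int16_t expected = (phase == 0) ? 3 : 2;
        if (buf[i] != expected) {
            mismatches++;
        }
        phase = (phase == 3) ? 0 : phase + 1;
    }
    return mismatches;
}
```

Calling `count_pattern_mismatches(a_buf, size)` after `init_arr(a_buf, size)` should return 0 for a correctly compiled build.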
Disassembly Analysis (Proof of Logic Error)
The generated assembly for -O2 (SSE/AVX) shows that the compiler explicitly hardcodes the wrong values. For the tail case at index 16 (where size=17), the compiler emits a scalar store of 2 instead of 3.
# Disassembly of init_arr (AT&T syntax)
...
# Vector stores (filling the array with incorrect patterns)
movups %xmm0, (%rdi)
movups %xmm0, 0x10(%rdi)
# CRITICAL ERROR:
# At offset 0x20 (Index 16), the compiler hardcodes immediate value 2.
# Since 16 % 4 == 0, this instruction SHOULD be writing 3.
movw $0x2, 0x20(%rdi)    # <-- Logic Error
...
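The mapping from the byte offset in the store to an array index follows from the element size: each int16_t occupies 2 bytes, so offset 0x20 (32 bytes) corresponds to index 16. A small sketch of that arithmetic, using hypothetical helper names that do not appear in the report:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts a byte offset from the array base into an
 * int16_t element index. */
size_t element_index_from_offset(size_t byte_offset) {
    assert(byte_offset % sizeof(int16_t) == 0);
    return byte_offset / sizeof(int16_t);
}

/* Value the source loop is supposed to store at a given index. */
int16_t expected_value(size_t index) {
    return (index % 4 == 0) ? 3 : 2;
}
```

Here `element_index_from_offset(0x20)` evaluates to 16 and `expected_value(16)` evaluates to 3, which is why the scalar store of 2 at that offset is wrong.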
Workarounds
Disable Vectorization: Place #pragma novector immediately before the loop.
Volatile Divisor: Making the divisor (e.g., 4) a volatile variable breaks the pattern recognition optimization.
Non-Power-of-2: Changing the modulo to % 3 or % 5 produces correct code.
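The first two workarounds could be applied to the reproducer roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the `__INTEL_LLVM_COMPILER` guard is added here only so that other compilers do not see the Intel-specific pragma.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Workaround 1 (sketch): keep the loop scalar with #pragma novector. */
void init_arr_novector(int16_t *a_buf, int size) {
    assert(a_buf != NULL || size <= 0);
#if defined(__INTEL_LLVM_COMPILER)
#pragma novector
#endif
    for (int i = 0; i < size; i++) {
        a_buf[i] = ((i % 4) == 0) ? 3 : 2;
    }
}

/* Workaround 2 (sketch): read the divisor from a volatile variable so the
 * compiler cannot treat it as a compile-time power of 2. */
void init_arr_volatile(int16_t *a_buf, int size) {
    assert(a_buf != NULL || size <= 0);
    volatile int divisor = 4;
    for (int i = 0; i < size; i++) {
        a_buf[i] = ((i % divisor) == 0) ? 3 : 2;
    }
}
```

Both variants are likely to cost performance: the first gives up vectorization for the loop, and the second forces a runtime load of the divisor on every iteration.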
The issue is known, and it will be fixed in the next release.
$ icx -V
Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2025.3.2 Build 20260112
$ icx -O2 vec-bug.c && ./a.out
Index 4: Expected 3, Got 2
Index 8: Expected 3, Got 2
Index 12: Expected 3, Got 2
Index 16: Expected 3, Got 2
Total Failures: 4
$ icx vec-bug.c -O2 && ./a.out
$ icx -V
Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Upcoming Release.