Solved: Compiler error when optimising loop

ChristopherBell · 02-24-2025

I'm using the 2023 OneAPI C++ compiler to rebuild some existing software that we previously built with the 2019 Intel compiler. I am seeing wrong results from a loop which "expands" an array of 4-byte integers to become 8-byte integers in the same storage space.

Qualitatively it works like this:

Declare an array long enough to hold N 8-byte (long long) integers
Read N 4-byte (int) integers into the bottom half of this
Working backwards from the end of the array, copy each 4-byte integer into the appropriate 8-byte slot by array index.

The coding that matters could be written:

long long *buff_8; // Dynamically allocated space to hold N long longs

int *buff_4 = (int *)buff_8; // Interpret this space as 4 byte integers

... populate buff_4[] with an array of N 4 byte integer values

for(i=N-1; i>=0; i--) buff_8[i] = buff_4[i]; // Copy the numbers to long long format

I have attached two very simple files which show the problem:

main.c sets up, populates and tests the storage
sub.c performs the I4 => I8 copy

To reproduce this, just create a console C project from these two files; nothing fancy is required. Use a Release build to turn on optimisation (I used /O2) and turn off interprocedural optimisation.

Working on Windows (in VS2022), if I perform the copy in a separate function in a separate file and set interprocedural optimisation to multi-file (/Qipo), this works fine. If I turn it off, it fails and I get zeros. Interestingly, with the Intel 2019 compilers the opposite is true: it fails when using /Qipo but works without.

I also tried on Linux, where I got different behaviour: the Intel 2019 compiler failed when I wrote this as a single function, but worked if I put a #pragma novector before the copy loop. The OneAPI compiler has, so far, worked OK.

We see this problem with the 2024 compiler as well; we haven't tried 2025.

We can solve the problem by isolating that function and using #pragma optimize ("", off) to stop it being optimised, so it is not a show-stopper. But I think it is quite a serious compiler bug.

yzh_intel · 02-27-2025

Hi, icpx defaults to the O2 optimization level, and it assumes the code doesn't violate the strict aliasing rule. In your code, buff_8 and buff_4 point to the same memory but have different types (long long * and int *), and accessing the same storage through both violates this rule. This issue can be avoided with the option "-fno-strict-aliasing".
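
For anyone who would rather change the code than rely on the flag, here is a minimal sketch of the same in-place expansion written with memcpy, which copies bytes and so does not depend on the int * / long long * alias. The function name expand_i4_to_i8 is hypothetical and is not taken from the attached sub.c; the sketch assumes the buffer has room for N long longs, as described in the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the attached sub.c): widens n ints stored
   at the start of buf into n long longs filling the whole buffer.
   buf must have room for n * sizeof(long long) bytes. */
void expand_i4_to_i8(void *buf, size_t n)
{
    unsigned char *bytes = buf;

    /* Walk backwards so each 8-byte write only covers ints already read. */
    for (size_t i = n; i-- > 0; ) {
        int narrow;
        long long wide;

        /* Read the i-th int through memcpy instead of an int * alias. */
        memcpy(&narrow, bytes + i * sizeof narrow, sizeof narrow);
        wide = narrow; /* sign-extending conversion */

        /* Write the i-th long long back into the same storage. */
        memcpy(bytes + i * sizeof wide, &wide, sizeof wide);
    }
}
```

Calling expand_i4_to_i8(buff_8, N) after populating the lower half would take the place of the original copy loop. Optimising compilers usually turn small fixed-size memcpy calls like these into plain loads and stores, but that is worth checking on your own build rather than taking on trust.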

ChristopherBell · 02-27-2025

Thanks a bunch for that. I had never heard of strict aliasing before; I just assumed that memory contents were mine to interpret as I pleased.

I have loads of mixed float and int arrays as well, which I see contravene that rule. I'm going to need that compiler flag!
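
As a rough illustration of how a mixed float and int buffer could be accessed without breaking the rule, the sketch below treats the storage as raw bytes and moves values in and out with memcpy. The accessor names and the slot layout are assumptions made up for this example, not anything from the original software, and it assumes int and float have the same size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical accessors for a raw buffer of equally sized slots, each of
   which may hold either an int or a float.
   Assumption: sizeof(int) == sizeof(float). */

void store_int(void *buf, size_t slot, int v)
{
    memcpy((unsigned char *)buf + slot * sizeof v, &v, sizeof v);
}

void store_float(void *buf, size_t slot, float v)
{
    memcpy((unsigned char *)buf + slot * sizeof v, &v, sizeof v);
}

int load_int(const void *buf, size_t slot)
{
    int v;
    memcpy(&v, (const unsigned char *)buf + slot * sizeof v, sizeof v);
    return v;
}

float load_float(const void *buf, size_t slot)
{
    float v;
    memcpy(&v, (const unsigned char *)buf + slot * sizeof v, sizeof v);
    return v;
}
```

Code that goes through accessors like these never reads a float through an int * (or the reverse), so it should behave the same with or without -fno-strict-aliasing.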