I'm getting an error with an AVX2 instruction

Alittle_ · 07-26-2024

Environment:

OS: Ubuntu 20.04

CPU: 12th Gen Intel(R) Core(TM) i9-12900K

GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0

I wrote the following code:

#include <immintrin.h>
#include <iostream>

int main() {
    float test[8] = {18.0, 17.0, 16.0, 15.0, 14.0, 13.0, 12.0, 1.0};

    // __m256 a = _mm256_load_ps(test);
    __m256 a = _mm256_loadu_ps(test);  // unaligned load: no alignment requirement
    std::cout << "load finish" << std::endl;
    __m256 result = _mm256_max_ps(a, a);

    // No alignas here: only alignof(float) alignment is guaranteed
    float temp[8] = {0, 0, 0, 0, 0, 0, 0, 0};
    std::cout << "==" << std::endl;
    _mm256_store_ps(temp, result);  // aligned store: destination must be 32-byte aligned
    for (int t = 0; t < 8; ++t)
    {
        std::cout << temp[t] << std::endl;
    }
    std::cout << "======" << std::endl;
    return 0;
}

When I run this, it fails at _mm256_store_ps. Why? If I replace _mm256_store_ps with _mm256_storeu_ps, it runs normally. It also runs normally with _mm256_store_ps if I add alignas(32) in front of float temp[8]. However, when I print the address of temp, it is the same as the unaligned address.

I debugged it with gdb and found that the exception is thrown when executing vmovaps %ymm0, (%rax). The contents of ymm0 have already been copied to the rbp register, so why is rax being copied to ymm0 at this point, and why does it crash?

Alex_Y_Intel · 07-31-2024

Why are you reporting an issue with the GCC compiler in the Intel compiler forum???

P.S. If you use the Intel compiler, there's no such issue.

Alittle_ · 08-06-2024

Okay, I got it.
Maybe I'm asking the wrong question.