ENV:
OS: Ubuntu 20.04
CPU: 12th Gen Intel(R) Core(TM) i9-12900K
GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
I wrote the following code:
// AVX intrinsics need AVX enabled at compile time, e.g. g++ -mavx
#include <immintrin.h>
#include <iostream>

int main() {
    float test[8] = {18.0, 17.0, 16.0, 15.0, 14.0, 13.0, 12.0, 1.0};
    // __m256 a = _mm256_load_ps(test);
    __m256 a = _mm256_loadu_ps(test);      // unaligned load: valid for any address
    std::cout << "load finish" << std::endl;
    __m256 result = _mm256_max_ps(a, a);
    float temp[8] = {0, 0, 0, 0, 0, 0, 0, 0};  // declared without alignas(32)
    std::cout << "==" << std::endl;
    _mm256_store_ps(temp, result);         // aligned store: requires a 32-byte aligned address
    for (int t = 0; t < 8; ++t)
    {
        std::cout << temp[t] << std::endl;
    }
    std::cout << "======" << std::endl;
    return 0;
}
When I run it, the program crashes at _mm256_store_ps. Why? If I replace _mm256_store_ps with _mm256_storeu_ps, it works. It also works if I keep _mm256_store_ps and add alignas(32) in front of float temp[8], but when I print the address of temp, it is the same as the address in the unaligned version.
I debugged it with gdb and found that the exception is thrown when executing vmovaps %ymm0, (%rax). The contents of ymm0 had already been copied to the stack via rbp at that point, so why is rax being copied to ymm0 here, and why does it crash?
Tags: avx2, simd
Why are you reporting an issue with the GCC compiler in the Intel compiler forum?
P.S. If you use the Intel compiler, there's no such issue.
Okay, I got it. Maybe I'm asking this question in the wrong place.