I'm sure we can't guess what you are targeting without more hints. A typical candidate is code that gathers or scatters elements to and from a packed vector, compiled with an SSE2 code-generation option and possibly helped by #pragma vector always. Setting SSE4 options would lead the compiler to favor newer instructions for the same purpose.
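As a rough sketch of the kind of loop described above (the function name deinterleave is hypothetical), here is a stride-2 loop that gathers the even and odd floats of an interleaved array into two separate outputs. When a compiler vectorizes it under an SSE2 option, it commonly needs shuffles such as shufps to separate the elements, but whether shufps actually appears depends on the compiler, its version and its flags, so compile with something like -O3 -S and inspect the assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Split an interleaved array {x0, y0, x1, y1, ...} into x[] and y[].
 * Vectorizing this stride-2 gather usually requires shuffles that pick
 * the even and odd elements out of packed loads. */
void deinterleave(const float *restrict xy, float *restrict x,
                  float *restrict y, size_t n)
{
#if defined(__INTEL_COMPILER)
    /* Classic Intel compiler hint: vectorize regardless of its
     * efficiency heuristics. Other compilers skip this block. */
#pragma vector always
#endif
    for (size_t i = 0; i < n; ++i) {
        x[i] = xy[2 * i];     /* even-indexed elements */
        y[i] = xy[2 * i + 1]; /* odd-indexed elements */
    }
}
```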
>>...So what kind of C code would usually generate shufps by the compiler?
I agree with Tim that your question is really hard to answer, so I've looked at the Intel headers that declare the intrinsic functions, and here are some details:
immintrin.h
...
/*
* Shuffle Packed Single Precision Floating-Point Values
* **** VSHUFPS ymm1, ymm2, ymm3/m256, imm8
* Moves two of the four packed single-precision floating-point values
* from each double qword of the first source operand into the low
* quadword of each double qword of the destination; moves two of the four
* packed single-precision floating-point values from each double qword of
* the second source operand into the high quadword of each double qword
* of the destination. The selector operand determines which values are moved
* to the destination.
*/
extern __m256 __ICL_INTRINCC _mm256_shuffle_ps(__m256, __m256, const int);
...
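To make the selector semantics in that comment concrete, here is a scalar reference model of VSHUFPS ymm1, ymm2, ymm3, imm8, written as a sketch (the function name shufps256_model is hypothetical). Each 256-bit operand is treated as two 128-bit lanes of four floats, and the same imm8 is applied to both lanes: bits 1:0 and 3:2 select from the first source, bits 5:4 and 7:6 select from the second.

```c
#include <assert.h>

/* Scalar model of VSHUFPS ymm1, ymm2, ymm3, imm8.
 * a, b, dst: 8 floats each, i.e. two 128-bit lanes of 4 floats.
 * dst may alias a or b: each lane's results are read before they are stored,
 * and writing one lane never touches the other lane's inputs. */
void shufps256_model(float dst[8], const float a[8], const float b[8], int imm8)
{
    assert(imm8 >= 0 && imm8 <= 255);
    for (int lane = 0; lane < 2; ++lane) {
        const float *la = a + 4 * lane; /* current lane of the first source */
        const float *lb = b + 4 * lane; /* current lane of the second source */
        float r0 = la[imm8 & 3];        /* low quadword: from the first source */
        float r1 = la[(imm8 >> 2) & 3];
        float r2 = lb[(imm8 >> 4) & 3]; /* high quadword: from the second source */
        float r3 = lb[(imm8 >> 6) & 3];
        float *ld = dst + 4 * lane;
        ld[0] = r0;
        ld[1] = r1;
        ld[2] = r2;
        ld[3] = r3;
    }
}
```

For example, imm8 = 0x1B applied to a = {0, 1, ..., 7} and b = {10, 11, ..., 17} gives {3, 2, 11, 10, 7, 6, 15, 14}.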
A very generic answer could be: a C/C++ compiler will generate the instruction if the C/C++ code uses the _mm256_shuffle_ps intrinsic function or contains inline assembler code for the instruction (assuming that generation of AVX instructions is enabled). The 128-bit SSE form, SHUFPS, is likewise generated from the _mm_shuffle_ps intrinsic declared in xmmintrin.h.
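A minimal sketch of the intrinsic route, assuming an x86/x86-64 target: it uses the 128-bit _mm_shuffle_ps from xmmintrin.h, which compiles without extra flags on x86-64, to collect the even-indexed floats of two vectors. The function name even_elements is hypothetical. Compilers typically turn the shuffle into a single shufps with an immediate of 0x88, but check the generated assembly for your compiler.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_loadu_ps, _mm_shuffle_ps, _MM_SHUFFLE */

/* out = {a[0], a[2], b[0], b[2]} */
void even_elements(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a); /* unaligned loads, so any float array works */
    __m128 vb = _mm_loadu_ps(b);
    /* _MM_SHUFFLE(z, y, x, w): w and x index into va (low half of result),
     * y and z index into vb (high half). (2, 0, 2, 0) == 0x88. */
    __m128 r = _mm_shuffle_ps(va, vb, _MM_SHUFFLE(2, 0, 2, 0));
    _mm_storeu_ps(out, r);
}
```

The 256-bit _mm256_shuffle_ps call has the same form, applies the selector within each 128-bit lane, and needs AVX code generation enabled (for example -mavx with GCC or Clang).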
Also, you need to look at the Intel Instruction Set Reference Manual (Volumes 2A, 2B and 2C) for a more detailed description of the instruction.
