Suppose there is a 128-bit (xmm) SIMD intrinsic, _mm_add_epi16 (adds eight 16-bit integers). The AVX programming reference mentions that a performance gain can be achieved if legacy 128-bit instructions are processed in AVX mode (VEX.128).
2.8.2 Using AVX 128-bit Instructions Instead of Legacy SSE instructions
----------------------------------------------------------------------------------
Applications using AVX and FMA should migrate legacy 128-bit SIMD instructions to their 128-bit AVX equivalents. AVX supplies the full complement of 128-bit SIMD instructions except for AES and PCLMULQDQ.
Now, the syntax of the add intrinsic is PADDB: __m128i _mm_add_epi8 (__m128i a, __m128i b). But it is mentioned that the AVX instruction is VPADDB.
How can the AVX version of this integer instruction (VPADDB) be used? Is there a separate AVX intrinsic for it?
Is this the way to do it?
__m256i data1, data2, data3;
_mm256_zeroall();
_mm256_zeroupper();
data3 = _mm_add_epi8((__m128i) data1, (__m128i) data2);
Will this perform better than the legacy 128-bit register and intrinsic usage?
Thanks
Adding to what Tim said: in addition to instruction length, another benefit of AVX comes from the third operand. For example, if you have to compute a = b + c, the number of instructions in AVX will be smaller.
Traditional SSE
movaps a, b
addps a, c
AVX
vaddps a, b, c
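As a rough illustration of the same point at the intrinsic level, here is a minimal sketch of a = b + c on four packed floats. The function name `add4` is hypothetical. The source is identical for SSE and AVX builds; which instruction form the compiler emits (a copy plus a two-operand add, or a single three-operand VEX add) depends on the compiler and its target flags, so treat the comments as an expectation to check in the generated assembly, not a guarantee.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: a[i] = b[i] + c[i] for i = 0..3.
 * Both inputs stay live after the add, so a legacy SSE build may need
 * an extra register copy, while a VEX-encoded build can write the sum
 * to a third register directly. */
void add4(const float *b, const float *c, float *a)
{
    __m128 vb = _mm_loadu_ps(b);      /* unaligned load of 4 floats */
    __m128 vc = _mm_loadu_ps(c);
    __m128 va = _mm_add_ps(vb, vc);   /* same intrinsic for SSE and AVX builds */
    _mm_storeu_ps(a, va);             /* unaligned store of 4 floats */
}
```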
When you compile your code with the /arch:AVX switch, the compiler knows that it has to emit the AVX form of the instruction for an intrinsic, provided the intrinsic shares the same name and semantics for AVX and SSE. If you are working in integer space and are not using 256-bit registers, you don't need to declare them as __m256i.
You don't need to use VZEROALL; this instruction is mainly for the OS and other specific tasks. Applications should use VZEROUPPER only.
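Applied to the original question, that advice might look like the sketch below: plain __m128i values and the ordinary _mm_add_epi8 intrinsic, with no __m256i variables, casts, or zeroing calls. The function name `add_bytes16` is hypothetical. The assumption is that building this unchanged source with AVX code generation enabled (/arch:AVX as mentioned above, or -mavx on GCC and Clang) yields VPADDB in place of PADDB; inspecting the compiler's assembly output is the way to confirm it for a given toolchain.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_loadu_si128, _mm_add_epi8 */
#include <stdint.h>

/* Hypothetical helper: out[i] = x[i] + y[i] for 16 bytes.
 * _mm_add_epi8 is a wrapping add, so each lane is reduced modulo 256. */
void add_bytes16(const uint8_t *x, const uint8_t *y, uint8_t *out)
{
    __m128i vx  = _mm_loadu_si128((const __m128i *)x);  /* unaligned 128-bit load */
    __m128i vy  = _mm_loadu_si128((const __m128i *)y);
    __m128i sum = _mm_add_epi8(vx, vy);                 /* 16 independent 8-bit adds */
    _mm_storeu_si128((__m128i *)out, sum);              /* unaligned 128-bit store */
}
```

Because the code stays entirely in 128-bit registers, nothing here dirties the upper halves of the ymm registers; VZEROUPPER becomes relevant only where 256-bit code and legacy SSE code meet.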