Intel® ISA Extensions

PADDW __m128i _mm_add_epi16 ( __m128i a, __m128i b) doubt

inteleverywhere
Beginner

There are two versions of the same instruction, for example vpaddw and paddw. Is there any performance gain if vpaddw is used instead of paddw (_mm_add_epi16)? Is there an intrinsic for vpaddw?

VPADDW (VEX.128 encoded version)
DEST[15:0] ← SRC1[15:0] + SRC2[15:0]
DEST[31:16] ← SRC1[31:16] + SRC2[31:16]
DEST[47:32] ← SRC1[47:32] + SRC2[47:32]
DEST[63:48] ← SRC1[63:48] + SRC2[63:48]
DEST[79:64] ← SRC1[79:64] + SRC2[79:64]
DEST[95:80] ← SRC1[95:80] + SRC2[95:80]
DEST[111:96] ← SRC1[111:96] + SRC2[111:96]
DEST[127:112] ← SRC1[127:112] + SRC2[127:112]
DEST[255:128] ← 0

PADDW (128-bit Legacy SSE version)
DEST[15:0] ← DEST[15:0] + SRC[15:0]
DEST[31:16] ← DEST[31:16] + SRC[31:16]
DEST[47:32] ← DEST[47:32] + SRC[47:32]
DEST[63:48] ← DEST[63:48] + SRC[63:48]
DEST[79:64] ← DEST[79:64] + SRC[79:64]
DEST[95:80] ← DEST[95:80] + SRC[95:80]
DEST[111:96] ← DEST[111:96] + SRC[111:96]
DEST[127:112] ← DEST[127:112] + SRC[127:112]
DEST[255:128] (Unmodified)

PADDW __m128i _mm_add_epi16 ( __m128i a, __m128i b)


Thanks.

Brijender_B_Intel

Performance depends not just on the instruction but also on the context in which it is used, so you need to try it in your application. There is no separate intrinsic for vpaddw: the compiler can generate the AVX (VEX-encoded) instruction from the same _mm_add_epi16 source if you compile with /arch:AVX.
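
To illustrate, here is a minimal sketch in which the intrinsic is written once and the encoding is left to the compiler. The helper name add_epi16_lanes is hypothetical, not part of any Intel API. With GCC or Clang the counterpart of /arch:AVX is -mavx. Compilers typically emit paddw without AVX enabled and vpaddw with it, but check the generated assembly for your own build rather than relying on that.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_add_epi16, _mm_loadu_si128, _mm_storeu_si128 */
#include <stdint.h>

/* Hypothetical helper: adds the eight 16-bit lanes of a and b into out.
 * The source is the same whether the compiler targets SSE2 or AVX;
 * only the chosen instruction encoding differs. */
void add_epi16_lanes(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    __m128i va  = _mm_loadu_si128((const __m128i *)a);  /* unaligned 128-bit load */
    __m128i vb  = _mm_loadu_si128((const __m128i *)b);
    __m128i sum = _mm_add_epi16(va, vb);                /* lane-wise add, wraps on overflow */
    _mm_storeu_si128((__m128i *)out, sum);              /* unaligned 128-bit store */
}
```

To compare the two encodings, build the same file twice (for example with and without -mavx, or with and without /arch:AVX) and time the surrounding loop in your real application, since any difference depends on the neighbouring code.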
