inteleverywhere

Question about PADDW: __m128i _mm_add_epi16(__m128i a, __m128i b)

There are two versions of the instruction behind the same intrinsic, for example VPADDW and PADDW. Is there any performance gain if VPADDW is used instead of PADDW (_mm_add_epi16)? Is there a separate intrinsic for VPADDW?

VPADDW (VEX.128 encoded version)

DEST[15:0] ← SRC1[15:0] + SRC2[15:0]
DEST[31:16] ← SRC1[31:16] + SRC2[31:16]
DEST[47:32] ← SRC1[47:32] + SRC2[47:32]
DEST[63:48] ← SRC1[63:48] + SRC2[63:48]
DEST[79:64] ← SRC1[79:64] + SRC2[79:64]
DEST[95:80] ← SRC1[95:80] + SRC2[95:80]
DEST[111:96] ← SRC1[111:96] + SRC2[111:96]
DEST[127:112] ← SRC1[127:112] + SRC2[127:112]
DEST[255:128] ← 0
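To make the VEX.128 pseudocode concrete, here is a minimal C sketch of its semantics. It assumes a hypothetical `ymm16_t` type that models a 256-bit YMM register as sixteen 16-bit lanes; `vpaddw_vex128` is likewise an illustrative name, not an Intel API. The sketch encodes the two points the pseudocode makes: the destination is a separate third operand (both sources survive), and bits 255:128 are cleared.

```c
#include <stdint.h>

/* Hypothetical model: a 256-bit YMM register as sixteen 16-bit lanes,
   where w[i] holds bits [16*i+15 : 16*i]. */
typedef struct { uint16_t w[16]; } ymm16_t;

/* Sketch of VPADDW xmm1, xmm2, xmm3/m128 (VEX.128 encoded).
   Three operands: SRC1 and SRC2 are passed by value and left intact. */
static ymm16_t vpaddw_vex128(ymm16_t src1, ymm16_t src2)
{
    ymm16_t dest;
    /* Lanes 0..7 (bits 127:0): wrap-around add modulo 2^16. */
    for (int i = 0; i < 8; i++)
        dest.w[i] = (uint16_t)(src1.w[i] + src2.w[i]);
    /* Lanes 8..15 (bits 255:128): zeroed by the VEX.128 form. */
    for (int i = 8; i < 16; i++)
        dest.w[i] = 0;
    return dest;
}
```

Because the sources are not overwritten, a compiler can often skip the extra register copy that a two-operand form would need, and clearing the upper half avoids SSE/AVX transition penalties when the code is mixed with 256-bit AVX instructions.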

PADDW (128-bit Legacy SSE version)

DEST[15:0] ← DEST[15:0] + SRC[15:0]
DEST[31:16] ← DEST[31:16] + SRC[31:16]
DEST[47:32] ← DEST[47:32] + SRC[47:32]
DEST[63:48] ← DEST[63:48] + SRC[63:48]
DEST[79:64] ← DEST[79:64] + SRC[79:64]
DEST[95:80] ← DEST[95:80] + SRC[95:80]
DEST[111:96] ← DEST[111:96] + SRC[111:96]
DEST[127:112] ← DEST[127:112] + SRC[127:112]
DEST[255:128] (Unmodified)
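For comparison, a minimal C sketch of the legacy SSE PADDW semantics. It assumes a hypothetical `ymm16_t` type that models a 256-bit YMM register as sixteen 16-bit lanes, and `paddw_legacy` is an illustrative name. Here the destination doubles as the first source, so its old value is overwritten, and bits 255:128 are left unmodified.

```c
#include <stdint.h>

/* Hypothetical model: a 256-bit YMM register as sixteen 16-bit lanes,
   where w[i] holds bits [16*i+15 : 16*i]. */
typedef struct { uint16_t w[16]; } ymm16_t;

/* Sketch of legacy SSE PADDW xmm1, xmm2/m128.
   Two operands: DEST is also the first source and is overwritten. */
static void paddw_legacy(ymm16_t *dest, const ymm16_t *src)
{
    /* Lanes 0..7 (bits 127:0): DEST += SRC, wrapping modulo 2^16. */
    for (int i = 0; i < 8; i++)
        dest->w[i] = (uint16_t)(dest->w[i] + src->w[i]);
    /* Lanes 8..15 (bits 255:128) are intentionally not touched. */
}
```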

Intel C/C++ compiler intrinsic equivalent for PADDW: __m128i _mm_add_epi16(__m128i a, __m128i b)


Thanks


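Performance does not depend on the instruction alone; it also depends on the context in which the instruction is used, so you need to try it in your own application. There is no separate intrinsic for VPADDW: the compiler can generate the AVX (VEX-encoded) instruction from the same _mm_add_epi16 code if you compile with /arch:AVX (or -mavx with GCC/Clang).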
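As a sketch of this point, the C function below calls `_mm_add_epi16` directly and lets the compiler pick the encoding. On x86-64, building it with `gcc -O2 -S` should show `paddw` in the assembly, while `gcc -O2 -mavx -S` should show `vpaddw` for the same source line. `add8_epi16` and `HAVE_SSE2` are hypothetical names; the scalar fallback exists only so the sketch also compiles on non-x86 targets.

```c
#include <stdint.h>

/* Use the SSE2 intrinsic when the target supports it
   (GCC/Clang define __SSE2__; MSVC x64 always has SSE2). */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Adds eight pairs of 16-bit integers with wrap-around.
   The same call compiles to PADDW by default and to VPADDW
   when AVX code generation is enabled. */
static void add8_epi16(const uint16_t a[8], const uint16_t b[8], uint16_t out[8])
{
#ifdef HAVE_SSE2
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi16(va, vb));
#else
    /* Scalar fallback with the same lane-wise modulo-2^16 semantics. */
    for (int i = 0; i < 8; i++)
        out[i] = (uint16_t)(a[i] + b[i]);
#endif
}
```

Comparing the two assembly listings is a quick way to confirm which encoding a given build uses before measuring performance in the real application.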
