Intrinsic guide 2.6 error in documentation

gilgil · 02-09-2012

In the documentation for the intrinsic _mm_mulhrs_epi16, the right shift should be 15, not 14.

Patrick_K_Intel · 02-13-2012

14 bits is correct. See the Instruction Set Reference in the Software Developer's Manual: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

PMULHRSW (with 128-bit operand)

temp0[31:0] = INT32 ((DEST[15:0] * SRC[15:0]) >>14) + 1;
temp1[31:0] = INT32 ((DEST[31:16] * SRC[31:16]) >>14) + 1;
temp2[31:0] = INT32 ((DEST[47:32] * SRC[47:32]) >>14) + 1;
temp3[31:0] = INT32 ((DEST[63:48] * SRC[63:48]) >>14) + 1;
temp4[31:0] = INT32 ((DEST[79:64] * SRC[79:64]) >>14) + 1;
temp5[31:0] = INT32 ((DEST[95:80] * SRC[95:80]) >>14) + 1;
temp6[31:0] = INT32 ((DEST[111:96] * SRC[111:96]) >>14) + 1;
temp7[31:0] = INT32 ((DEST[127:112] * SRC[127:112]) >>14) + 1;
DEST[15:0] = temp0[16:1];
DEST[31:16] = temp1[16:1];
DEST[47:32] = temp2[16:1];
DEST[63:48] = temp3[16:1];
DEST[79:64] = temp4[16:1];
DEST[95:80] = temp5[16:1];
DEST[111:96] = temp6[16:1];
DEST[127:112] = temp7[16:1];
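
For readers who want to check the arithmetic by hand, the pseudocode above can be sketched as a scalar C function for a single 16-bit lane. This is only an illustrative model, not Intel's implementation: the name mulhrs_lane is hypothetical, and the code assumes the compiler performs an arithmetic right shift on negative signed values and wraps on the final narrowing cast (both are implementation-defined in C, though common on x86 compilers).

```c
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane of PMULHRSW,
   written to mirror the pseudocode line by line. */
static int16_t mulhrs_lane(int16_t a, int16_t b)
{
    /* Full signed 32-bit product of the two 16-bit inputs. */
    int32_t product = (int32_t)a * (int32_t)b;

    /* temp[31:0] = (product >> 14) + 1: the documented shift by 14,
       plus 1 as the rounding term. Assumes arithmetic right shift. */
    int32_t temp = (product >> 14) + 1;

    /* DEST = temp[16:1]: drop bit 0 (one more right shift) and keep
       the low 16 bits of what remains. */
    return (int16_t)(temp >> 1);
}
```

Taken together, the two steps give the same result as (product + 0x4000) >> 15, so the effective shift is 15 bits with rounding to nearest.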

gilgil · 05-10-2012

I still do not understand...

I tried the following piece of code:
float factor = 1.f;
__m128i vFactor = _mm_set1_epi16(factor*(1<<14)); // Using fixed point..

__m128i inputVec = _mm_set_epi16(32,54,124,75,35,235,244,36);

__m128i resultVec = _mm_mulhrs_epi16(inputVec,vFactor);

By your explanation I should get resultVec = inputVec, but the result elements are actually half the original values.

sirrida · 05-10-2012

If you read the documentation carefully, you will notice an additional hidden shift by 1.
The temp*[16:1] can be read as (temp*[31:0]>>1)[15:0].

It might make sense to make the documentation more explicit about this.
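
To make the consequence for the fixed-point factor concrete, here is a small scalar sketch in C. It assumes the effective shift is 15 bits (14 plus the hidden 1), so the factor has to be scaled by 1<<15 rather than 1<<14. The names mulhrs_lane and to_q15 are hypothetical helpers, not part of any Intel API, and the model assumes arithmetic right shift on signed values. Note that 1.0 itself does not fit in a signed 16-bit value at this scale (it would be 32768), so the sketch clamps it to 32767.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of _mm_mulhrs_epi16:
   shift by 14, add 1 to round, then shift by 1 more. */
static int16_t mulhrs_lane(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(((product >> 14) + 1) >> 1);
}

/* Hypothetical helper: encode a factor as a 16-bit value scaled by
   1<<15, clamped to the representable range [-32768, 32767]. */
static int16_t to_q15(float factor)
{
    float scaled = factor * 32768.0f;
    if (scaled > 32767.0f)
        scaled = 32767.0f;   /* 1.0 is not representable; use the closest value */
    if (scaled < -32768.0f)
        scaled = -32768.0f;
    return (int16_t)scaled;  /* truncates toward zero */
}
```

With the factor from the question, mulhrs_lane(32, 1 << 14) yields 16, which is the halving that was observed. With to_q15(1.0f), the small inputs from the question (32, 54, 124, ...) come back unchanged, although large inputs can still be off by one because the factor is slightly less than 1.0.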

gilgil · 05-10-2012

I agree the documentation for this function is not the best one.