Community
cancel
Showing results for 
Search instead for 
Did you mean: 
Highlighted
Beginner
68 Views

why does _mm_mulhrs_epi16() always do biased rounding to positive infinity?

Does anyone know why the pmulhrsw instruction or _mm_mulhrs_epi16(x) := RoundDown((x * y + 16384) / 32768) always rounds towards positive infinity? To me, this is terribly biased for negative numbers, because then a sequence like -0.6, 0.6, -0.6, 0.6, ... won't add up to 0 on average. Is this behavior intentional or unintentional? If it's intentional, what could be the use? Is there an easy way to make it less biased? Lucky for me, I can just change the order of my operations to get a less biased result (my function is a signed geometric mean): __m128i ChooseSign(x, sign) { return _mm_sign_epi16(x, sign) } signsDifferent = _mm_srai_epi16(_mm_xor_si128(a, b), 15) // (a ^ b) >> 15 sign = _mm_andnot_si128(signsDifferent, a) // !signsDifferent & a //result = ChooseSign(sqrt(a * b), sign) * fraction // biased result = ChooseSign(sqrt(a * b) * fraction, sign)
0 Kudos
9 Replies
Highlighted
Black Belt
68 Views

I am not sure how much your result will be different after trying to apply  /Qpc64 switch.

Another possibility could be related to implementation at hardware/microcode level.

 

0 Kudos
Highlighted
Beginner
68 Views

apply  /Qpc64 switch

pmulhrsw  isn't a floating point instruction.

0 Kudos
Highlighted
New Contributor III
68 Views

I believe, what the operation does is equivalent to mathematical rounding (i.e. to the closest nearest integer), not to the positive infinity as you say. Whether or not this behavior is "better" than truncation or another rounding mode is debatable and depends on the tasks you have.

0 Kudos
Highlighted
Beginner
68 Views

A most serious mistake. Thanks for correcting me, andysem.

I was mistaken into thinking it was biased because the formula from MSDN, https://msdn.microsoft.com/en-us/library/bb513995.aspx

was (x * y + 16384) >> 15. This looked very similar to the int(x + 0.5) method for rounding, which I know is biased for negative #s and cringe at.  But I didn't realize right shift for negative numbers isn't the same as dividing and casting to an int.

Plus. it wasn't matching my non-SIMD reference implementation, which turns out to be biased since I was calculating int(sum / 9.0f), which rounds towards zero.

I should've had more doubt before questioning the behavior of something implemented in hardware, which would've been vigorously vetted, since int(x + 0.5) would be a very expensive mistake.

_mm_mulhrs_epi16() still has some bias of always rounding x.5 towards + infinity. But that's not a big deal for my application.

 

0 Kudos
Highlighted
Beginner
68 Views

delete me

0 Kudos
Highlighted
68 Views

You are mistaken again. According to Intel Intrinsics guide

Synopsis
__m128i _mm_mulhrs_epi16 (__m128i a, __m128i b)
#include "tmmintrin.h"
Instruction: pmulhrsw
CPUID Feature Flag: SSSE3
Description
Multiply packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst.
Operation
FOR j := 0 to 7
    i := j*16
    tmp[31:0] := ((a[i+15:i] * b[i+15:i]) >> 14) + 1
    dst[i+15:i] := tmp[16:1]
ENDFOR

This is equivalent to (x*y + 16384) >> 16

Or equivalent to int(x + 0.25) method for rounding.

Jim Dempsey

0 Kudos
Highlighted
Beginner
68 Views

No, the rounding is to the nearest integer (x * y + 16384) >> 15. If you're not convinced, consider this example:

_mm_mulhrs_epi16(1, 16383) = 0          // 1 * 0.49999999

_mm_mulhrs_epi16(1, 16384) = 1          // 1 * 0.5

I've verified the behavior for all possible x & y, and the behavior is expected except when it overflows at x = -32768 and y = -32768.

0 Kudos
Highlighted
New Contributor II
68 Views

jimdempseyatthecove wrote:

FOR j := 0 to 7
    i := j*16
    tmp[31:0] := ((a[i+15:i] * b[i+15:i]) >> 14) + 1
    dst[i+15:i] := tmp[16:1]
ENDFOR

when i == 0 we have:

  dst = ((a*b >> 14) + 1) >> 1          (1)

so this is indeed a round to nearest as per the Intrinsics Guide

btw this can be simplified as follows:

  dst = a*b+16384 >> 14 >> 1          (1.1)

  dst = a*b+16384 >> 15          (1.2)

0 Kudos
Highlighted
Black Belt
68 Views

>>>pmulhrsw  isn't a floating point instruction.>>>

Rounding algorithm may involve operations on floating point values.

0 Kudos