Intel® ISA Extensions
Use hardware-based isolation and memory encryption to provide more code protection in your solutions.

why does _mm_mulhrs_epi16() always do biased rounding to positive infinity?

YZhan69
Beginner
694 Views
Does anyone know why the pmulhrsw instruction or _mm_mulhrs_epi16(x) := RoundDown((x * y + 16384) / 32768) always rounds towards positive infinity? To me, this is terribly biased for negative numbers, because then a sequence like -0.6, 0.6, -0.6, 0.6, ... won't add up to 0 on average. Is this behavior intentional or unintentional? If it's intentional, what could be the use? Is there an easy way to make it less biased? Lucky for me, I can just change the order of my operations to get a less biased result (my function is a signed geometric mean): __m128i ChooseSign(x, sign) { return _mm_sign_epi16(x, sign) } signsDifferent = _mm_srai_epi16(_mm_xor_si128(a, b), 15) // (a ^ b) >> 15 sign = _mm_andnot_si128(signsDifferent, a) // !signsDifferent & a //result = ChooseSign(sqrt(a * b), sign) * fraction // biased result = ChooseSign(sqrt(a * b) * fraction, sign)
0 Kudos
9 Replies
Bernard
Valued Contributor I
694 Views

I am not sure how much your result will be different after trying to apply  /Qpc64 switch.

Another possibility could be related to implementation at hardware/microcode level.

 

0 Kudos
YZhan69
Beginner
694 Views

apply  /Qpc64 switch

pmulhrsw  isn't a floating point instruction.

0 Kudos
andysem
New Contributor III
694 Views

I believe, what the operation does is equivalent to mathematical rounding (i.e. to the closest nearest integer), not to the positive infinity as you say. Whether or not this behavior is "better" than truncation or another rounding mode is debatable and depends on the tasks you have.

0 Kudos
YZhan69
Beginner
694 Views

A most serious mistake. Thanks for correcting me, andysem.

I was mistaken into thinking it was biased because the formula from MSDN, https://msdn.microsoft.com/en-us/library/bb513995.aspx

was (x * y + 16384) >> 15. This looked very similar to the int(x + 0.5) method for rounding, which I know is biased for negative #s and cringe at.  But I didn't realize right shift for negative numbers isn't the same as dividing and casting to an int.

Plus. it wasn't matching my non-SIMD reference implementation, which turns out to be biased since I was calculating int(sum / 9.0f), which rounds towards zero.

I should've had more doubt before questioning the behavior of something implemented in hardware, which would've been vigorously vetted, since int(x + 0.5) would be a very expensive mistake.

_mm_mulhrs_epi16() still has some bias of always rounding x.5 towards + infinity. But that's not a big deal for my application.

 

0 Kudos
YZhan69
Beginner
694 Views

delete me

0 Kudos
jimdempseyatthecove
Honored Contributor III
694 Views

You are mistaken again. According to Intel Intrinsics guide

Synopsis
__m128i _mm_mulhrs_epi16 (__m128i a, __m128i b)
#include "tmmintrin.h"
Instruction: pmulhrsw
CPUID Feature Flag: SSSE3
Description
Multiply packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst.
Operation
FOR j := 0 to 7
    i := j*16
    tmp[31:0] := ((a[i+15:i] * b[i+15:i]) >> 14) + 1
    dst[i+15:i] := tmp[16:1]
ENDFOR

This is equivalent to (x*y + 16384) >> 16

Or equivalent to int(x + 0.25) method for rounding.

Jim Dempsey

0 Kudos
YZhan69
Beginner
694 Views

No, the rounding is to the nearest integer (x * y + 16384) >> 15. If you're not convinced, consider this example:

_mm_mulhrs_epi16(1, 16383) = 0          // 1 * 0.49999999

_mm_mulhrs_epi16(1, 16384) = 1          // 1 * 0.5

I've verified the behavior for all possible x & y, and the behavior is expected except when it overflows at x = -32768 and y = -32768.

0 Kudos
bronxzv
New Contributor II
694 Views

jimdempseyatthecove wrote:

FOR j := 0 to 7
    i := j*16
    tmp[31:0] := ((a[i+15:i] * b[i+15:i]) >> 14) + 1
    dst[i+15:i] := tmp[16:1]
ENDFOR

when i == 0 we have:

  dst = ((a*b >> 14) + 1) >> 1          (1)

so this is indeed a round to nearest as per the Intrinsics Guide

btw this can be simplified as follows:

  dst = a*b+16384 >> 14 >> 1          (1.1)

  dst = a*b+16384 >> 15          (1.2)

0 Kudos
Bernard
Valued Contributor I
694 Views

>>>pmulhrsw  isn't a floating point instruction.>>>

Rounding algorithm may involve operations on floating point values.

0 Kudos
Reply