Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software Development Technologies
- Intel® ISA Extensions
- why does _mm_mulhrs_epi16() always do biased rounding to positive infinity?

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Highlighted
##

Does anyone know why the pmulhrsw instruction or
_mm_mulhrs_epi16(x) := RoundDown((x * y + 16384) / 32768)
always rounds towards positive infinity? To me, this is terribly biased for negative numbers, because then a sequence like -0.6, 0.6, -0.6, 0.6, ... won't add up to 0 on average.
Is this behavior intentional or unintentional? If it's intentional, what could be the use? Is there an easy way to make it less biased?
Lucky for me, I can just change the order of my operations to get a less biased result (my function is a signed geometric mean):
__m128i ChooseSign(x, sign)
{
return _mm_sign_epi16(x, sign)
}
signsDifferent = _mm_srai_epi16(_mm_xor_si128(a, b), 15) // (a ^ b) >> 15
sign = _mm_andnot_si128(signsDifferent, a) // !signsDifferent & a
//result = ChooseSign(sqrt(a * b), sign) * fraction // biased
result = ChooseSign(sqrt(a * b) * fraction, sign)

YZhan69

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

01-30-2015
07:25 PM

68 Views

why does _mm_mulhrs_epi16() always do biased rounding to positive infinity?

9 Replies

Highlighted
##

Bernard

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

01-31-2015
12:56 PM

68 Views

I am not sure how much your result will be different after trying to apply /Qpc64 switch.

Another possibility could be related to implementation at hardware/microcode level.

Highlighted
##

YZhan69

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-01-2015
04:51 PM

68 Views

apply /Qpc64 switch

pmulhrsw isn't a floating point instruction.

Highlighted
##

I believe, what the operation does is equivalent to mathematical rounding (i.e. to the closest nearest integer), not to the positive infinity as you say. Whether or not this behavior is "better" than truncation or another rounding mode is debatable and depends on the tasks you have.

andysem

New Contributor III

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-04-2015
07:20 AM

68 Views

Highlighted
##

YZhan69

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-04-2015
12:18 PM

68 Views

A most serious mistake. Thanks for correcting me, andysem.

I was mistaken into thinking it was biased because the formula from MSDN, https://msdn.microsoft.com/en-us/library/bb513995.aspx

was (x * y + 16384) >> 15. This looked very similar to the int(x + 0.5) method for rounding, which I know is biased for negative #s and cringe at. But I didn't realize right shift for negative numbers isn't the same as dividing and casting to an int.

Plus. it wasn't matching my non-SIMD reference implementation, which turns out to be biased since I was calculating int(sum / 9.0f), which rounds towards zero.

I should've had more doubt before questioning the behavior of something implemented in hardware, which would've been vigorously vetted, since int(x + 0.5) would be a very expensive mistake.

_mm_mulhrs_epi16() still has some bias of always rounding x.5 towards + infinity. But that's not a big deal for my application.

Highlighted
##

YZhan69

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-04-2015
12:18 PM

68 Views

delete me

Highlighted
##

jimdempseyatthecove

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-05-2015
03:30 PM

68 Views

You are mistaken again. According to Intel Intrinsics guide

*Synopsis
__m128i _mm_mulhrs_epi16 (__m128i a, __m128i b)
#include "tmmintrin.h"
Instruction: pmulhrsw
CPUID Feature Flag: SSSE3
Description
Multiply packed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst.
Operation
FOR j := 0 to 7
i := j*16
tmp[31:0] := ((a[i+15:i] * b[i+15:i]) >> 14) + 1
dst[i+15:i] := tmp[16:1]
ENDFOR*

This is equivalent to (x*y + 16384) >> **16**

Or equivalent to int(x + 0.25) method for rounding.

Jim Dempsey

Highlighted
##

YZhan69

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-05-2015
05:14 PM

68 Views

No, the rounding is to the nearest integer (x * y + 16384) >> 15. If you're not convinced, consider this example:

_mm_mulhrs_epi16(1, 16383) = 0 // 1 * 0.49999999

_mm_mulhrs_epi16(1, 16384) = 1 // 1 * 0.5

I've verified the behavior for all possible x & y, and the behavior is expected except when it overflows at x = -32768 and y = -32768.

Highlighted
##

bronxzv

New Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-05-2015
11:46 PM

68 Views

jimdempseyatthecove wrote:

FOR j := 0 to 7

i := j*16

tmp[31:0] := ((a[i+15:i] * b[i+15:i]) >> 14) + 1

dst[i+15:i] := tmp[16:1]

ENDFOR

when i == 0 we have:

dst = ((a*b >> 14) + 1) >> 1 (1)

so this is indeed a round to nearest as per the Intrinsics Guide

btw this can be simplified as follows:

dst = a*b+16384 >> 14 >> 1 (1.1)

dst = a*b+16384 >> 15 (1.2)

Highlighted
##

Bernard

Black Belt

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

02-06-2015
12:07 PM

68 Views

>>>pmulhrsw isn't a floating point instruction.>>>

Rounding algorithm may involve operations on floating point values.

For more complete information about compiler optimizations, see our Optimization Notice.