There are saturating-add instructions for shorts and bytes (paddsw, paddsb) but not for 32-bit integers (no paddsd)!
Hello Prashanth,
You will need to implement that sequence manually.
One way is to use the available 32-bit addition and check the result for underflow and overflow.
Overflow can occur only if both inputs are positive but the result ends up negative, while underflow can occur only if both inputs are negative but the result ends up non-negative. These two checks produce the following algorithm in C:
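#include <limits.h>
int res = (int)((unsigned)a + (unsigned)b); /* wrap-around add; a plain signed a + b is undefined on overflow */
int tmp = (res & ~(a | b)) < 0 ? INT_MAX : res; /* sign set in res, clear in both a and b: overflow */
int c = (~res & (a & b)) < 0 ? INT_MIN : tmp; /* sign clear in res, set in both a and b: underflow */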
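For reference, the two sign checks above can be wrapped into a self-contained function. This is only a sketch: the name `sat_add_i32` is made up for illustration, and the addition is done in unsigned arithmetic on the assumption that you want to avoid signed overflow, which is undefined behavior in C.

```c
#include <stdint.h>

/* Hypothetical helper (name not from the original post):
   saturating addition of two signed 32-bit integers. */
static int32_t sat_add_i32(int32_t a, int32_t b)
{
    /* Add as unsigned so that wrap-around is well defined. */
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t ures = ua + ub;

    /* Sign bit set in the result but clear in both inputs: overflow. */
    uint32_t over = ures & ~(ua | ub);
    /* Sign bit clear in the result but set in both inputs: underflow. */
    uint32_t under = ~ures & (ua & ub);

    if (over & 0x80000000u)
        return INT32_MAX;
    if (under & 0x80000000u)
        return INT32_MIN;
    return a + b; /* no overflow on this path, so the signed add is safe */
}
```

For example, `sat_add_i32(2147483632, 16)` clamps to `INT32_MAX`, and `sat_add_i32(INT32_MIN, -1)` clamps to `INT32_MIN`.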
Using the SSE4.1 instruction BLENDVPS (or its AVX form, VBLENDVPS) gives a ~4X speedup (~5X with AVX) over the scalar code above, as measured on data within the L1 cache on the Sandy Bridge microarchitecture:
[bash]
#include <smmintrin.h>

// requires SSE4_1 (or AVX) support for BLENDVPS (or VBLENDVPS)
__m128i __inline __mm_adds_epi32( __m128i a, __m128i b )
{
    __m128i int_min = _mm_set1_epi32( 0x80000000 );
    __m128i int_max = _mm_set1_epi32( 0x7FFFFFFF );
    __m128i res = _mm_add_epi32( a, b );
    __m128i sign_and = _mm_and_si128( a, b );
    __m128i sign_or = _mm_or_si128( a, b );
    // sign bit set where both inputs are negative but the result is not
    __m128i min_sat_mask = _mm_andnot_si128( res, sign_and );
    // sign bit set where neither input is negative but the result is
    __m128i max_sat_mask = _mm_andnot_si128( sign_or, res );
    // BLENDVPS selects per lane on the sign bit of the mask
    __m128 res_temp = _mm_blendv_ps( _mm_castsi128_ps( res ),
                                     _mm_castsi128_ps( int_min ),
                                     _mm_castsi128_ps( min_sat_mask ) );
    return _mm_castps_si128( _mm_blendv_ps( res_temp,
                                            _mm_castsi128_ps( int_max ),
                                            _mm_castsi128_ps( max_sat_mask ) ) );
}
[/bash]
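If SSE4.1 is not available, the same masks can in principle be applied with SSE2 only. The sketch below is a variation, not the routine measured above: the name `adds_epi32_sse2` is hypothetical, and BLENDVPS is swapped for an arithmetic shift that broadcasts the sign bit, followed by an and/andnot/or select. No performance claim is made for it.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <limits.h>

/* Hypothetical SSE2-only variant (name not from the original post). */
static inline __m128i adds_epi32_sse2(__m128i a, __m128i b)
{
    const __m128i int_min = _mm_set1_epi32(INT_MIN);
    const __m128i int_max = _mm_set1_epi32(INT_MAX);
    __m128i res = _mm_add_epi32(a, b);

    /* Same sign tests as the BLENDVPS version; shifting right by 31
       turns each lane's sign bit into an all-ones or all-zeros mask. */
    __m128i min_mask = _mm_srai_epi32(
        _mm_andnot_si128(res, _mm_and_si128(a, b)), 31);
    __m128i max_mask = _mm_srai_epi32(
        _mm_andnot_si128(_mm_or_si128(a, b), res), 31);

    /* Select int_min where min_mask is set, otherwise keep res. */
    res = _mm_or_si128(_mm_and_si128(min_mask, int_min),
                       _mm_andnot_si128(min_mask, res));
    /* Select int_max where max_mask is set, otherwise keep res. */
    res = _mm_or_si128(_mm_and_si128(max_mask, int_max),
                       _mm_andnot_si128(max_mask, res));
    return res;
}
```

Both masks are computed from the unmodified sum before any lane is replaced, and they can never be set in the same lane, so the order of the two selects does not matter.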
The following are some functional test results generated with the implementation above:
[bash]
2147483632 + 14 = 2147483646 (7ffffff0 + e = 7ffffffe)
2147483632 + 15 = 2147483647 (7ffffff0 + f = 7fffffff)
2147483632 + 16 = 2147483647 (7ffffff0 + 10 = 7fffffff)
2147483632 + -2147483648 = -16 (7ffffff0 + 80000000 = fffffff0)
-2147483648 + -2147483648 = -2147483648 (80000000 + 80000000 = 80000000)
-2147483648 + 0 = -2147483648 (80000000 + 0 = 80000000)
-2147483648 + 1 = -2147483647 (80000000 + 1 = 80000001)
-2147483648 + -1 = -2147483648 (80000000 + ffffffff = 80000000)
[/bash]
Hope this helps.
-Max
P.S. adding tag: _mm_adds_epi32 for search engines.