I need help: _mm_mullo_pi16() cannot multiply large numbers. Any suggestions on what I should do? How do I multiply two 1-D arrays that hold large values (positive and negative)? x0, b, b1, s0, s1 are vectors (arrays), and _f, Adjust[_qm], _scale are scalar integers.
// C++ code
int x0 = s0 + s1;
if(x0 < 0)
short b
else
short b
// MMX intrinsic code
__m64*b1 = (__m64*)b;
__m64 s0,s1,s2,s3,x0;
j=0;
__m64 r0,r1,t0,t1,t2,p0,p1;
r0 =_mm_set_pi16(Adjust[_qm],Adjust[_qm],Adjust[_qm],Adjust[_qm]);
r1 =_mm_set_pi16(_f,_f,_f,_f);
x0 =_mm_add_pi16(s0,s1);
t1 =_mm_cmpgt_pi16(_mm_set1_pi16(0),x0);
t2 = _mm_mullo_pi16((_mm_sub_pi16(_mm_setzero_si64(),x0)),r0);
t0 = _mm_mullo_pi16(x0,r0);
p0 =_mm_srai_pi16(,_scale );
p1 =_mm_srai_pi16(_mm_add_pi16(t2,r1),_scale );
b1
There is also a mulhi (_mm_mulhi_pi16) that returns the upper 16 bits of each product.
r0 = {11912, 11912, 11912, 11912}
What I want is to multiply x0 and r0 using the __m64 data type, i.e. using _mm_mullo_pi16() and _mm_mulhi_pi16().
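To see why the low half alone is not enough with r0 = 11912, here is a minimal scalar sketch of what each 16-bit lane keeps. The helper names mul_lo16, mul_hi16 and mul_full are hypothetical and only model the per-lane behaviour of the mullo/mulhi pair. For example, 100 * 11912 = 1191200 does not fit in 16 bits: the low half is 11552 and the high half is 18.

```c
#include <assert.h>
#include <stdint.h>

/* Low 16 bits of the signed 32-bit product (what a mullo lane keeps). */
static int16_t mul_lo16(int16_t a, int16_t b) {
    int32_t full = (int32_t)a * (int32_t)b;
    return (int16_t)(uint16_t)((uint32_t)full & 0xFFFFu);
}

/* High 16 bits of the signed 32-bit product (what a mulhi lane keeps). */
static int16_t mul_hi16(int16_t a, int16_t b) {
    int32_t full = (int32_t)a * (int32_t)b;
    return (int16_t)(uint16_t)(((uint32_t)full >> 16) & 0xFFFFu);
}

/* Recombine both halves into the full signed 32-bit product. */
static int32_t mul_full(int16_t a, int16_t b) {
    uint32_t lo = (uint16_t)mul_lo16(a, b);
    uint32_t hi = (uint16_t)mul_hi16(a, b);
    int32_t result = (int32_t)((hi << 16) | lo);
    assert(result == (int32_t)a * (int32_t)b); /* sanity check of the recombination */
    return result;
}
```

So neither half is the answer on its own; the two have to be put back together to recover the 32-bit product.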
__m128i x1 = _mm_loadl_epi64((const __m128i*)&x0); // loads only the 8 bytes of x0; the upper half is zeroed
__m128i r1 = _mm_loadl_epi64((const __m128i*)&r0); // loads only the 8 bytes of r0; the upper half is zeroed
__m128i x_sse = _mm_cvtepi16_epi32(x1); // SSE4.1: sign-extend the lower four 16-bit values to 32 bits
__m128i r_sse = _mm_cvtepi16_epi32(r1); // SSE4.1: sign-extend the lower four 16-bit values to 32 bits
__m128i res_sse = _mm_mullo_epi32(x_sse, r_sse); // SSE4.1: multiply four signed 32-bit values
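Note that _mm_cvtepi16_epi32 and _mm_mullo_epi32 are SSE4.1 intrinsics. If only SSE2 is available, one possible alternative (a sketch, not necessarily what the poster above had in mind) is to interleave each 16-bit value with zero and use _mm_madd_epi16, which then yields the full 32-bit product per lane. The helper name mul_round_shift4 is hypothetical, and the rounding add and shift are assumed to follow the `(x * r + f) >> scale` pattern from the question:

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: out[i] = (x[i] * r[i] + f) >> scale for four lanes,
   with the product and the rounding add carried out in 32 bits. */
static void mul_round_shift4(const int16_t x[4], const int16_t r[4],
                             int32_t f, int scale, int32_t out[4]) {
    assert(scale >= 0 && scale < 32);
    const __m128i zero = _mm_setzero_si128();
    /* Load exactly four 16-bit values (8 bytes) from each input. */
    __m128i vx = _mm_loadl_epi64((const __m128i *)x);
    __m128i vr = _mm_loadl_epi64((const __m128i *)r);
    /* Interleave with zero: x0,0,x1,0,... so that madd computes x_i*r_i + 0*0. */
    vx = _mm_unpacklo_epi16(vx, zero);
    vr = _mm_unpacklo_epi16(vr, zero);
    __m128i prod = _mm_madd_epi16(vx, vr);               /* four signed 32-bit products */
    __m128i sum = _mm_add_epi32(prod, _mm_set1_epi32(f)); /* add the rounding term */
    /* Arithmetic right shift by a run-time count. */
    __m128i res = _mm_sra_epi32(sum, _mm_cvtsi32_si128(scale));
    _mm_storeu_si128((__m128i *)out, res);
}
```

The intermediate sum stays in 32 bits, so values such as 32767 * 11912 no longer wrap the way they do in a 16-bit lane.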
You can compute the lower 16 bits and the upper 16 bits of the 32-bit results separately. Afterwards, you will need to interleave them in order to get the full 32-bit results. Something like this should work:
__m128i hi = _mm_mulhi_epi16(a, b);
__m128i lo = _mm_mullo_epi16(a, b);
__m128i r0 = _mm_unpacklo_epi16(lo, hi);
__m128i r1 = _mm_unpackhi_epi16(lo, hi);
a and b each contain the eight signed 16-bit values that you would like to multiply. r0 contains the first four 32-bit results; r1 contains the remaining four. These intrinsics are part of the SSE2 instruction set.
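As a minimal sketch of how this could be applied to whole arrays, the hypothetical helper mul_i16_to_i32 below processes eight elements per iteration. It assumes the length is a multiple of 8 and uses unaligned loads and stores, so the buffers need no particular alignment:

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = (int32_t)a[i] * b[i] for n elements.
   Assumes n is a multiple of 8. */
static void mul_i16_to_i32(const int16_t *a, const int16_t *b,
                           int32_t *out, size_t n) {
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i lo = _mm_mullo_epi16(va, vb); /* low 16 bits of each product */
        __m128i hi = _mm_mulhi_epi16(va, vb); /* high 16 bits (signed) of each product */
        /* Interleave lo/hi so each 32-bit lane holds one full product. */
        _mm_storeu_si128((__m128i *)(out + i),     _mm_unpacklo_epi16(lo, hi));
        _mm_storeu_si128((__m128i *)(out + i + 4), _mm_unpackhi_epi16(lo, hi));
    }
}
```

A remainder loop in plain C would be needed for lengths that are not a multiple of 8.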