
I have the following piece of code and would like to rewrite this part with SSE intrinsics, but I have run into a problem mixing the __m128 and __m128i types. As you can see from the code, "level" has to be computed as a __m128i so that the shift can be done (there is no shift instruction for __m128; please correct me if I am wrong). But the next line combines the float "reQuantDownScaleFactor" with the int "level" and converts the result back to an integer. Is there any way to do this kind of calculation with SSE intrinsics? Thanks for any answers or comments!
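On the shift question: SSE indeed has no shifts for __m128, but SSE2 provides arithmetic shifts on __m128i (_mm_srai_epi32 with an immediate count, _mm_sra_epi32 with the count in a register), so the integer part of the loop can stay in __m128i. Below is a minimal SSE2 sketch of one row of `(iabs(coff) * ScaleComp + OffsetComp) >> q_bits`. It assumes the four ScaleComp and OffsetComp values for the row have already been gathered into vectors, and the helper names quant_level, mullo_epi32_sse2 and abs_epi32_sse2 are hypothetical. Because _mm_mullo_epi32 (SSE4.1) and _mm_abs_epi32 (SSSE3) are not part of SSE2, both are emulated here.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* SSE2 has no 32-bit low multiply (_mm_mullo_epi32 is SSE4.1), so emulate it
   with two 32x32->64-bit multiplies on the even and odd lanes. The low 32 bits
   of a product are the same for signed and unsigned inputs. */
static __m128i mullo_epi32_sse2(__m128i a, __m128i b)
{
    __m128i even = _mm_mul_epu32(a, b);                     /* lanes 0 and 2 */
    __m128i odd  = _mm_mul_epu32(_mm_srli_si128(a, 4),
                                 _mm_srli_si128(b, 4));     /* lanes 1 and 3 */
    /* Pick the low 32 bits of each 64-bit product and interleave them back */
    return _mm_unpacklo_epi32(_mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0)),
                              _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0)));
}

/* |x| per lane (_mm_abs_epi32 needs SSSE3): (x ^ s) - s with s = x >> 31. */
static __m128i abs_epi32_sse2(__m128i x)
{
    __m128i s = _mm_srai_epi32(x, 31);                      /* 0 or -1 per lane */
    return _mm_sub_epi32(_mm_xor_si128(x, s), s);
}

/* Four lanes of: level = (iabs(coef) * ScaleComp + OffsetComp) >> q_bits.
   _mm_sra_epi32 reads its shift count from a register, so q_bits can be a
   run-time value. */
static __m128i quant_level(__m128i coef, __m128i scale, __m128i offset, int q_bits)
{
    __m128i scaled = mullo_epi32_sse2(abs_epi32_sse2(coef), scale);
    return _mm_sra_epi32(_mm_add_epi32(scaled, offset), _mm_cvtsi32_si128(q_bits));
}
```

For row 0, passing an offset vector whose lane 0 holds OffsetComp << 1 covers the DC case without a branch. On SSE4.1 hardware the two helpers can simply be replaced by _mm_mullo_epi32 and _mm_abs_epi32.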

------------------------------------------------------------------

static void Quantize(VideoParameters *p_Vid, Slice *currEncSlice, Macroblock *currEncMB, Slice *currDecSlice, int blkChroma, int intra, int coff[4][4], short reQuantQp)
{
    int i, j;
    int scaled_coeff, q_bits, level;
    int qp_per;
    QuantParameters *p_Quant = p_Vid->p_Quant;
    LevelQuantParams **q_params_4x4;
    LevelQuantParams *q_params = NULL;
    float reQuantDownScaleFactor[4][4] = {
        {4.00f, 3.20f, 4.00f, 3.20f},
        {3.20f, 2.56f, 3.20f, 2.56f},
        {4.00f, 3.20f, 4.00f, 3.20f},
        {3.20f, 2.56f, 3.20f, 2.56f}
    };

    /* qp_per, q_bits and q_params_4x4 must be set up before this loop;
       that part is not shown in this excerpt. */

    for (i = 0; i < 4; i++)
    {
        for (j = 0; j < 4; j++)
        {
            if (coff[i][j] != 0)
            {
                q_params = &q_params_4x4[i][j];
                scaled_coeff = iabs(coff[i][j]) * q_params->ScaleComp;

                /* The DC coefficient uses twice the rounding offset */
                if ((i == 0) && (j == 0))
                {
                    level = (scaled_coeff + (q_params->OffsetComp << 1)) >> q_bits;
                }
                else
                {
                    level = (scaled_coeff + q_params->OffsetComp) >> q_bits;
                }

                /* Re-quantize: level / factor, rounded half up (level >= 0 here) */
                level = (int) ((2.0 * level + reQuantDownScaleFactor[i][j]) / (2 * reQuantDownScaleFactor[i][j]));

                /* Restore the sign of the original coefficient */
                coff[i][j] = isignab(level, coff[i][j]);
            }
        }
    }
}

------------------------------------------------------------------

Best,

Ivan


Reply:


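Use _mm_cvtepi32_ps(__m128i) to convert the integer levels to float, do the arithmetic on __m128, and convert back with _mm_cvtps_epi32(__m128 _A). Note that _mm_cvtps_epi32 rounds according to the current rounding mode (round-to-nearest by default), while the (int) cast in your code truncates; _mm_cvttps_epi32 truncates and matches the cast.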
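A minimal SSE2 sketch of the re-quantization line using that int/float round trip, assuming the scalar code runs on x86-64 (so its double arithmetic is plain IEEE double rather than x87 extended precision). The scalar expression is evaluated in double because 2.0 is a double literal, so this sketch converts to __m128d two lanes at a time and truncates with _mm_cvttpd_epi32 to match the (int) cast. The names requant_level and requant_pair are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics (also pulls in SSE's xmmintrin.h) */

/* Two lanes of (2.0 * level + f) / (2 * f), evaluated in double like the
   scalar code. Doubling is exact, so f + f equals the scalar 2 * f. */
static __m128d requant_pair(__m128d lvl, __m128d f)
{
    __m128d num = _mm_add_pd(_mm_add_pd(lvl, lvl), f);   /* 2.0 * level + f */
    __m128d den = _mm_add_pd(f, f);                      /* 2 * f           */
    return _mm_div_pd(num, den);
}

/* Four lanes of: level = (int)((2.0 * level + f) / (2 * f)) */
static __m128i requant_level(__m128i level, __m128 factor)
{
    /* Low two lanes: int -> double and float -> double conversions */
    __m128d lo = requant_pair(_mm_cvtepi32_pd(level), _mm_cvtps_pd(factor));
    /* High two lanes, first moved down into the low half */
    __m128d hi = requant_pair(
        _mm_cvtepi32_pd(_mm_shuffle_epi32(level, _MM_SHUFFLE(1, 0, 3, 2))),
        _mm_cvtps_pd(_mm_movehl_ps(factor, factor)));
    /* _mm_cvttpd_epi32 truncates toward zero (like the cast) and leaves the
       upper two int lanes zero; unpacklo_epi64 joins the two halves. */
    return _mm_unpacklo_epi64(_mm_cvttpd_epi32(lo), _mm_cvttpd_epi32(hi));
}
```

A single-precision version (_mm_cvtepi32_ps, _mm_add_ps, _mm_div_ps, _mm_cvttps_epi32) handles all four lanes at once, but it is not guaranteed to reproduce the double-precision result when level / f + 0.5 is at or very near an integer, for example level = 8 with factor 3.20f. If the output must match the scalar code exactly, the double version is the safer choice.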

Also, I would suggest moving the i == 0 && j == 0 case out of the loop when you vectorize the code. Then the loop body has no branches, which makes it fast.
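A scalar sketch of that restructuring: the DC special case is folded into a per-position offset table built once before the loop, so every position runs the same instructions, which is the shape SIMD code needs. QuantParamsSketch is a hypothetical stand-in for LevelQuantParams with only the two fields the loop uses, and sign_of is a hypothetical stand-in for isignab. The sketch also drops the `coff != 0` test, which is only safe under the assumption that 2 * OffsetComp < (1 << q_bits), so a zero coefficient still quantizes to zero.

```c
#include <stdlib.h> /* abs */

/* Hypothetical stand-in for LevelQuantParams: only the fields used here. */
typedef struct { int ScaleComp; int OffsetComp; } QuantParamsSketch;

/* Hypothetical stand-in for isignab: magnitude of a with the sign of b. */
static int sign_of(int a, int b) { return b < 0 ? -abs(a) : abs(a); }

/* Assumes 2 * OffsetComp < (1 << q_bits), so zero inputs give zero outputs
   and the per-coefficient "!= 0" branch is not needed. */
static void quantize_branchless(int coff[4][4], QuantParamsSketch q[4][4],
                                float factor[4][4], int q_bits)
{
    int offset[4][4];

    /* Hoisted special case: build the offsets once, doubling only the DC
       entry, so the main loop has no (i == 0 && j == 0) test. */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            offset[i][j] = q[i][j].OffsetComp;
    offset[0][0] <<= 1;

    /* Uniform loop body: identical work for every position */
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            int scaled = abs(coff[i][j]) * q[i][j].ScaleComp;
            int level = (scaled + offset[i][j]) >> q_bits;
            level = (int)((2.0 * level + factor[i][j]) / (2 * factor[i][j]));
            coff[i][j] = sign_of(level, coff[i][j]);
        }
    }
}
```

The same table idea carries over to the vector code: each row loads its offsets as one __m128i, and only row 0's vector differs.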
