Intel® ISA Extensions

MMX intrinsics performed badly

Smart_Lubobya
Beginner

I could not understand why the MMX code was slower than the C++ code. The C++ version took 0.000180 ms and the MMX intrinsics version took 0.000280 ms. Any explanation? I thought parallel addition was faster than serial addition!
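#include "stdafx.h"
#include <windows.h>   // QueryPerformanceCounter, QueryPerformanceFrequency, UINT64
#include <tchar.h>     // _tmain, _TCHAR
#include <stdio.h>     // printf
#include <mmintrin.h>  // MMX intrinsics: __m64, _mm_add_pi16, _mm_sub_pi16, ...

int _tmain(int argc, _TCHAR* argv[])
{
    UINT64 startCount, endCount, diffCount, freq;

    // 4x4 block of 16-bit values; each row (4 shorts = 64 bits) fits in one __m64.
    __declspec(align(8)) short block[4][4] = {1,1,1,1, 2,2,2,2, 3,3,3,3, 4,4,4,4};
    int j;

    // Start the timer immediately before the work being measured.
    QueryPerformanceCounter((LARGE_INTEGER*)&startCount);

    // C++ code: column transform, one column per iteration
    /*
    for (j = 0; j < 4; j++)
    {
        int s0 = block[0][j] + block[3][j];
        int s3 = block[0][j] - block[3][j];
        int s1 = block[1][j] + block[2][j];
        int s2 = block[1][j] - block[2][j];
        block[0][j] = s0 + s1;
        block[2][j] = s0 - s1;
        block[1][j] = s2 + (s3 << 1);
        block[3][j] = s3 - (s2 << 1);
    }
    */

    // MMX code: block2[k] is row k, so each intrinsic handles all four columns at once
    __m64* block2 = (__m64*)block;
    __m64 s0, s1, s2, s3;
    j = 0;
    s0 = _mm_add_pi16(block2[j], block2[3+j]);
    s3 = _mm_sub_pi16(block2[j], block2[3+j]);
    s1 = _mm_add_pi16(block2[1+j], block2[2+j]);
    s2 = _mm_sub_pi16(block2[1+j], block2[2+j]);
    block2[j]   = _mm_add_pi16(s0, s1);
    block2[2+j] = _mm_sub_pi16(s0, s1);
    block2[1+j] = _mm_add_pi16(s2, _mm_slli_pi16(s3, 1));
    block2[3+j] = _mm_sub_pi16(s3, _mm_slli_pi16(s2, 1));
    _mm_empty();  // EMMS: clear the MMX state before any floating-point code runs

    // Stop the timer after the work; reading it earlier would time an empty interval.
    QueryPerformanceCounter((LARGE_INTEGER*)&endCount);

    diffCount = endCount - startCount;
    QueryPerformanceFrequency((LARGE_INTEGER*)&freq);
    double exeTime_in_ms = (double)diffCount * 1000.0 / freq;
    printf("Executing time : %fms\n", exeTime_in_ms);

    return 0;
}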
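The "parallel addition" in the MMX version is lane-wise arithmetic: each `__m64` holds one row of four `short`s, so a single `_mm_add_pi16(block2[0], block2[3])` yields `s0` for all four columns at once, where the scalar loop needs four iterations. The portable sketch below models that idea in plain C++ so it can run on any compiler. `Row`, `add_pi16`, `sub_pi16`, `slli_pi16` and `transform_columns` are hypothetical stand-ins, not the real intrinsics, and the wraparound behaviour assumes 16-bit lanes as in MMX.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of one __m64 register: four 16-bit lanes (one row of block).
using Row = std::array<std::int16_t, 4>;

// Models _mm_add_pi16: adds the four lanes independently. Narrowing back to
// 16 bits wraps on overflow (guaranteed since C++20, implementation-defined before).
Row add_pi16(const Row& a, const Row& b) {
    Row r{};
    for (int lane = 0; lane < 4; ++lane)
        r[lane] = static_cast<std::int16_t>(a[lane] + b[lane]);
    return r;
}

// Models _mm_sub_pi16: subtracts the four lanes independently, wrapping the same way.
Row sub_pi16(const Row& a, const Row& b) {
    Row r{};
    for (int lane = 0; lane < 4; ++lane)
        r[lane] = static_cast<std::int16_t>(a[lane] - b[lane]);
    return r;
}

// Models _mm_slli_pi16: shifts every lane left by `count`; counts above 15
// clear all lanes, as the PSLLW instruction does.
Row slli_pi16(const Row& a, int count) {
    assert(count >= 0);  // negative counts are outside this simple model
    Row r{};
    if (count > 15) return r;
    for (int lane = 0; lane < 4; ++lane)
        r[lane] = static_cast<std::int16_t>(static_cast<std::uint16_t>(a[lane]) << count);
    return r;
}

// Same steps as the MMX code in the question: every call works on a whole row,
// i.e. on all four columns of the 4x4 block simultaneously.
void transform_columns(std::array<Row, 4>& block) {
    const Row s0 = add_pi16(block[0], block[3]);
    const Row s3 = sub_pi16(block[0], block[3]);
    const Row s1 = add_pi16(block[1], block[2]);
    const Row s2 = sub_pi16(block[1], block[2]);
    block[0] = add_pi16(s0, s1);
    block[2] = sub_pi16(s0, s1);
    block[1] = add_pi16(s2, slli_pi16(s3, 1));
    block[3] = sub_pi16(s3, slli_pi16(s2, 1));
}
```

Applied to the block from the question ({1,1,1,1}, {2,2,2,2}, {3,3,3,3}, {4,4,4,4}), this gives rows {10,10,10,10}, {-7,-7,-7,-7}, {0,0,0,0} and {-1,-1,-1,-1}, which is also what the scalar column loop produces. With only four lanes and a single 4x4 block, the whole transform is just a few instructions either way.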

Thomas_W_Intel
Employee
It is really hard to measure such a short time precisely. I suggest that you put a loop around your code and execute it 1000 times.

Furthermore, I suggest that you have a look at the generated assembly code to verify that the compiler generates the code that you expect.
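A minimal sketch of the loop-based measurement suggested above, assuming a portable build: `std::chrono::steady_clock` stands in for `QueryPerformanceCounter`, and `Block`, `transform_scalar` and `time_per_call_ns` are hypothetical names. The transform is applied to the same block repeatedly, so each call consumes the previous result and the loop cannot simply be replaced by a single call.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical 4x4 block of 16-bit values, stored row by row like block[4][4].
using Block = std::array<std::array<std::int16_t, 4>, 4>;

// Scalar column transform matching the C++ loop in the question.
// Intermediates are int, so nothing overflows; 2 * s3 replaces s3 << 1 to avoid
// shifting negative values. Narrowing back to 16 bits wraps like the MMX lanes
// (guaranteed since C++20, implementation-defined before).
void transform_scalar(Block& b) {
    for (int j = 0; j < 4; ++j) {
        const int s0 = b[0][j] + b[3][j];
        const int s3 = b[0][j] - b[3][j];
        const int s1 = b[1][j] + b[2][j];
        const int s2 = b[1][j] - b[2][j];
        b[0][j] = static_cast<std::int16_t>(s0 + s1);
        b[2][j] = static_cast<std::int16_t>(s0 - s1);
        b[1][j] = static_cast<std::int16_t>(s2 + 2 * s3);
        b[3][j] = static_cast<std::int16_t>(s3 - 2 * s2);
    }
}

// Hypothetical harness: applies `transform` to `b` in place `iterations` times
// and returns the average time per call in nanoseconds.
template <typename Transform>
double time_per_call_ns(Transform transform, Block& b, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        transform(b);  // each call depends on the previous result
    }
    const auto stop = Clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Calling `time_per_call_ns(transform_scalar, b, 1000000)`, and doing the same with a SIMD version of the transform, gives per-call averages that are far less dominated by timer overhead than a single run. For the assembly check, compiling with `g++ -O2 -S` (GCC) or `cl /O2 /FA` (MSVC) writes a listing in which the generated loop can be inspected.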