IPP is slower than IPL on Pentium 4

alexfm · ‎11-23-2004

I am testing the IPP evaluation version downloaded from here:

http://www.intel.com/software/products/ipp/downloads/ippwin_ev.htm

File w_ipp_ia32_itanium_em64t_eval_4_1_ev05.exe

It is supposed to be optimized for the Pentium 4 processor.

I wrote a sample program that convolves a 1024*1024 floating-point image with a 12*1 convolution matrix. There are two builds of the same code:

1) Using IPL

2) Using IPP, linked to the ipl.dll from the Image-Processing-IPL sample.
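For reference, the operation being timed can be written as a plain C++ loop. This is only a sketch of a 12*1 (one column, twelve rows) convolution, not the IPL or IPP implementation. The function name convolveColumn is hypothetical, and two choices are assumptions that may differ from what the libraries do: pixels outside the image are treated as zero, and the kernel is applied in flipped (true convolution) order around the anchor row.

```cpp
#include <cstddef>
#include <vector>

// Naive vertical convolution of a row-major float image with an N x 1 kernel.
// Assumptions (not taken from the IPL/IPP documentation):
//   - pixels outside the image contribute zero
//   - the kernel is flipped around `anchor` (true convolution, not correlation)
std::vector<float> convolveColumn(const std::vector<float>& src,
                                  int width, int height,
                                  const std::vector<float>& kernel,
                                  int anchor)
{
    const int kRows = static_cast<int>(kernel.size());
    std::vector<float> dst(src.size(), 0.0f);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int k = 0; k < kRows; ++k) {
                // Tap k reads the source row mirrored about the anchor row.
                const int sy = y + anchor - k;
                if (sy >= 0 && sy < height) {
                    sum += kernel[k] * src[static_cast<std::size_t>(sy) * width + x];
                }
            }
            dst[static_cast<std::size_t>(y) * width + x] = sum;
        }
    }
    return dst;
}
```

With a 1024*1024 image and 12 taps this is about 12.6 million multiply-adds per call, so a loop like this can serve as a sanity baseline when comparing library timings.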

Test results on a Pentium 3 processor:

IPL 105 ms

IPP 76 ms

Test results on a Pentium 4 processor:

IPL 43 ms

IPP 48 ms

IPL performs better on the Pentium 4! What is happening? Maybe I am using the wrong IPP version?


Vladimir_Dudnik · ‎11-23-2004

Hi Alex,

Could you please post the example code here? I mean the code that uses IPP. It would be interesting to see what we can improve.

Regards,

Vladimir

alexfm · ‎11-23-2004

Hi Vladimir.

My code is the same in both tests, since I am using the IPL-to-IPP library.

#define IMAGE_WIDTH 1024
#define IMAGE_HEIGHT 1024

void CIPPTestDlg::OnButtonTest()
{
    int i;

    // Allocate ipl headers
    int nChannels;
    int nAlphaChannel;
    int depth;
    char colorModel[20];
    char channelSeq[20];
    int dataOrder;
    int origin;
    int align;

    nChannels = 1;
    nAlphaChannel = 0;
    depth = IPL_DEPTH_32F;
    strcpy(colorModel, "GRAY");
    strcpy(channelSeq, "GRAY");
    dataOrder = IPL_DATA_ORDER_PIXEL;
    origin = IPL_ORIGIN_TL;
    align = IPL_ALIGN_DWORD;

    IplImage* pSource = iplCreateImageHeader(
        nChannels,
        nAlphaChannel,
        depth,
        colorModel,
        channelSeq,
        dataOrder,
        origin,
        align,
        IMAGE_WIDTH,
        IMAGE_HEIGHT,
        NULL,
        NULL,
        NULL,
        NULL);

    IplImage* pDest = iplCreateImageHeader(
        nChannels,
        nAlphaChannel,
        depth,
        colorModel,
        channelSeq,
        dataOrder,
        origin,
        align,
        IMAGE_WIDTH,
        IMAGE_HEIGHT,
        NULL,
        NULL,
        NULL,
        NULL);

    // Pixel buffers are allocated separately and attached to the headers below
    float* pfSource = new float[IMAGE_WIDTH*IMAGE_HEIGHT];
    float* pfDest = new float[IMAGE_WIDTH*IMAGE_HEIGHT];

    // Fill the source image with a simple ramp
    for ( i = 0; i < IMAGE_WIDTH*IMAGE_HEIGHT; i++ )
    {
        pfSource[i] = (float)i + 0.5f;
    }

    pSource->imageData = (char*)pfSource;
    pDest->imageData = (char*)pfDest;

    // 12-tap kernel: six +1 taps followed by six -1 taps
    float fTable[12];

    for ( i = 0; i < 12; i++ )
    {
        fTable[i] = ( i <= 5 ) ? 1.0f : -1.0f;
    }

    // 1 column, 12 rows, anchor at (0, 6)
    IplConvKernelFP* pKernel = iplCreateConvKernelFP(1,
        12,
        0,
        6,
        fTable);

    TIME_START(t, _T("Filter"));

    iplConvolve2DFP(pSource,
        pDest,
        &pKernel,
        1,
        IPL_SUM);

    TIME_END(t);

    iplDeleteConvKernelFP(pKernel);

    iplDeallocate(pSource, IPL_IMAGE_HEADER);
    iplDeallocate(pDest, IPL_IMAGE_HEADER);

    delete[] pfSource;
    delete[] pfDest;
}

The TIME_START and TIME_END macros measure the time between them using QueryPerformanceFrequency and QueryPerformanceCounter. The result is shown in the debug output. There is a significant performance boost on the Pentium 3, but on the Pentium 4 IPL works better.
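Since the listing times a single call, the measurement may also include one-time costs such as first-touch page faults on the untouched destination buffer. One possible way to reduce that noise is to repeat the call and keep the fastest run. The sketch below is an assumption, not the TIME_START/TIME_END macros from the post: bestTimeMs is a hypothetical helper, and it uses the portable std::chrono::steady_clock as a stand-in for QueryPerformanceCounter.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper: calls `work` `runs` times and returns the fastest
// single-run time in milliseconds. Taking the minimum discards runs that
// were slowed down by cold caches, page faults, or other processes.
template <typename Work>
double bestTimeMs(Work work, int runs)
{
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();

    for (int r = 0; r < runs; ++r) {
        const Clock::time_point start = Clock::now();
        work();
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

In the test above, the callable passed as work would wrap the iplConvolve2DFP call, for example with runs set to 10, so that both the IPL and IPP builds are compared on their best time rather than on a single first call.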