Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

IPP is slower than IPL on Pentium 4

alexfm
Beginner
I am testing the IPP evaluation version downloaded from here:
File w_ipp_ia32_itanium_em64t_eval_4_1_ev05.exe
It is supposed to be optimized for the Pentium 4 processor.
I wrote a sample program that convolves a 1024*1024 floating-point image with a 12*1 convolution matrix. There are two builds of the same code:
1) Using IPL
2) Using IPP, linked to ipl.dll from Image-Processing-IPL sample.
Test results on a Pentium 3 processor:
IPL: 105 ms
IPP: 76 ms
Test results on a Pentium 4 processor:
IPL: 43 ms
IPP: 48 ms
IPL performs better on Pentium 4! What is happening? Maybe I am using the wrong IPP version?
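
For reference, the operation being timed can be written as a plain C++ loop. This is only a minimal sketch, not code from either library. The name filterColumns is hypothetical, the kernel is applied without flipping (correlation form), and samples outside the image are treated as zero, which may not match the border handling of IPL or IPP. It might serve as a baseline for checking that both builds produce the same interior pixels.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference filter (not an IPL or IPP function).
// Applies a kernel.size() x 1 kernel down each column of a row-major
// float image. Source rows outside the image are skipped, i.e. treated
// as zero; this border rule is an assumption of the sketch.
std::vector<float> filterColumns(const std::vector<float>& src,
                                 int width, int height,
                                 const std::vector<float>& kernel,
                                 int anchorY)
{
    const int kRows = static_cast<int>(kernel.size());
    std::vector<float> dst(src.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int k = 0; k < kRows; ++k) {
                const int sy = y + k - anchorY;  // source row under tap k
                if (sy < 0 || sy >= height) continue;
                sum += kernel[k] * src[static_cast<std::size_t>(sy) * width + x];
            }
            dst[static_cast<std::size_t>(y) * width + x] = sum;
        }
    }
    return dst;
}
```

For the setup described in this post, the call would use width = 1024, height = 1024, a 12-tap kernel and an anchorY chosen to match the kernel anchor used with the library.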

Message Edited by AlexFM on 11-23-2004 05:49 AM

Vladimir_Dudnik
Employee
Hi Alex,
could you please post an example of this code, I mean the code that uses IPP? It would be interesting to see what we can improve here.
Regards,
Vladimir
alexfm
Beginner
Hi Vladimir.
My code is the same in both tests since I am using the IPL-to-IPP library.
#define IMAGE_WIDTH 1024
#define IMAGE_HEIGHT 1024
void CIPPTestDlg::OnButtonTest()
{
    int i;
    // Allocate ipl headers
    int nChannels;
    int nAlphaChannel;
    int depth;
    char colorModel[20];
    char channelSeq[20];
    int dataOrder;
    int origin;
    int align;
    nChannels = 1;
    nAlphaChannel = 0;
    depth = IPL_DEPTH_32F;
    strcpy(colorModel, "GRAY");
    strcpy(channelSeq, "GRAY");
    dataOrder = IPL_DATA_ORDER_PIXEL;
    origin = IPL_ORIGIN_TL;
    align = IPL_ALIGN_DWORD;

    IplImage* pSource = iplCreateImageHeader(
        nChannels,
        nAlphaChannel,
        depth,
        colorModel,
        channelSeq,
        dataOrder,
        origin,
        align,
        IMAGE_WIDTH,
        IMAGE_HEIGHT,
        NULL,
        NULL,
        NULL,
        NULL);
    IplImage* pDest = iplCreateImageHeader(
        nChannels,
        nAlphaChannel,
        depth,
        colorModel,
        channelSeq,
        dataOrder,
        origin,
        align,
        IMAGE_WIDTH,
        IMAGE_HEIGHT,
        NULL,
        NULL,
        NULL,
        NULL);
    // Pixel buffers are allocated here; the headers only point at them
    float* pfSource = new float[IMAGE_WIDTH*IMAGE_HEIGHT];
    float* pfDest = new float[IMAGE_WIDTH*IMAGE_HEIGHT];
    // Fill the source image with a simple ramp
    for ( i = 0; i < IMAGE_WIDTH*IMAGE_HEIGHT; i++ )
    {
        pfSource[i] = (float)i + 0.5f;
    }
    pSource->imageData = (char*)pfSource;
    pDest->imageData = (char*)pfDest;
    // Kernel values: first six taps are +1, last six are -1
    float fTable[12];
    for ( i = 0; i < 12; i++ )
    {
        fTable[i] = ( i <= 5 ) ? 1.0f : -1.0f;
    }
    IplConvKernelFP* pKernel = iplCreateConvKernelFP(1,
        12,
        0,
        6,
        fTable);
    // Only the convolution call itself is timed
    TIME_START(t, _T("Filter"));
    iplConvolve2DFP(pSource,
        pDest,
        &pKernel,
        1,
        IPL_SUM);
    TIME_END(t);

    // Release the kernel and headers, then the pixel buffers
    iplDeleteConvKernelFP(pKernel);
    iplDeallocate(pSource, IPL_IMAGE_HEADER);
    iplDeallocate(pDest, IPL_IMAGE_HEADER);
    delete[] pfSource;
    delete[] pfDest;
}
The TIME_START and TIME_END macros measure the time elapsed between them using QueryPerformanceFrequency and QueryPerformanceCounter. The result is shown in the debug output. There is a significant performance boost on Pentium 3, but on Pentium 4 IPL works better.
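
A single timed call may include one-time costs (for example library loading or cold caches), so a gap of a few milliseconds can be hard to interpret. One possible way to make the comparison less noisy is to run an untimed warm-up call and then keep the fastest of several repetitions. The sketch below is an assumption-laden illustration, not the TIME_START/TIME_END macros used above: the helper name bestOfMs is hypothetical, and std::chrono::steady_clock is used as a portable stand-in for QueryPerformanceCounter.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper (not part of IPL, IPP, or the macros in this thread).
// Calls `work` once untimed as a warm-up, then times `runs` further calls
// and returns the fastest one in milliseconds. If `runs` is zero or
// negative, no timed call is made and the largest double is returned.
template <typename Work>
double bestOfMs(Work work, int runs)
{
    typedef std::chrono::steady_clock Clock;
    work();  // warm-up call, excluded from the measurement
    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < runs; ++r) {
        const Clock::time_point t0 = Clock::now();
        work();
        const Clock::time_point t1 = Clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        best = std::min(best, ms);
    }
    return best;
}
```

With a compiler that supports lambdas, the convolution call could be wrapped as bestOfMs([&]{ iplConvolve2DFP(pSource, pDest, &pKernel, 1, IPL_SUM); }, 20) and the returned value compared between the IPL and IPP builds.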