Intel® Integrated Performance Primitives
Community support and discussions relating to developing high-performance vision, signal, security, and storage applications.

IPP is slower than IPL on Pentium 4

alexfm
Beginner
I am testing the IPP evaluation version (file w_ipp_ia32_itanium_em64t_eval_4_1_ev05.exe).
It is supposed to be optimized for the Pentium 4 processor.
I made a sample program that convolves a 1024*1024 floating-point image with a 12*1 convolution matrix. There are two builds of the same code:
1) Using IPL
2) Using IPP, linked to ipl.dll from the Image-Processing-IPL sample.
Test results on a Pentium 3 processor:
IPL 105 ms
IPP 76 ms
Test results on a Pentium 4 processor:
IPL 43 ms
IPP 48 ms
IPL performs better on the Pentium 4! What is happening? Maybe I am using the wrong IPP version?
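
Before comparing timings, it can help to confirm that both builds produce the same output. The following is a minimal sketch of a plain C++ reference for a one-column convolution. The helper name `convolveColumn`, the zero-padded border, and the flipped-kernel indexing are all assumptions made for illustration; the library's own border handling and anchor semantics may differ, so results near the image edges should be compared with care.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not part of IPL or IPP).
// Convolves a single-channel, row-major float image with a one-column
// kernel of kernel.size() taps. Pixels outside the image are treated as
// zero; this border rule is an assumption, not the library's documented one.
std::vector<float> convolveColumn(const std::vector<float>& src,
                                  int width, int height,
                                  const std::vector<float>& kernel,
                                  int anchor)
{
    assert(src.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> dst(src.size(), 0.0f);
    const int taps = static_cast<int>(kernel.size());
    for (int y = 0; y < height; ++y) {
        for (int k = 0; k < taps; ++k) {
            const int sy = y + anchor - k;  // source row used by tap k
            if (sy < 0 || sy >= height) continue;  // zero outside the image
            const float w = kernel[k];
            const float* in = &src[static_cast<std::size_t>(sy) * width];
            float* out = &dst[static_cast<std::size_t>(y) * width];
            // Walk along the row so memory is read sequentially.
            for (int x = 0; x < width; ++x) out[x] += w * in[x];
        }
    }
    return dst;
}
```

Comparing the interior pixels of this result against the output of each build, with a small floating-point tolerance, would show whether the two libraries are doing the same work.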

Message Edited by AlexFM on 11-23-2004 05:49 AM

Vladimir_Dudnik
Employee
Hi Alex,
Could you please post the code that uses IPP? It would be interesting to see what we can improve here.
Regards,
Vladimir
alexfm
Beginner
Hi Vladimir.
My code is the same in both tests, since I am using the IPL-to-IPP library.
#define IMAGE_WIDTH 1024
#define IMAGE_HEIGHT 1024
void CIPPTestDlg::OnButtonTest()
{
int i;
// Allocate ipl headers
int nChannels;
int nAlphaChannel;
int depth;
char colorModel[20];
char channelSeq[20];
int dataOrder;
int origin;
int align;
nChannels = 1;
nAlphaChannel = 0;
depth = IPL_DEPTH_32F;
strcpy(colorModel, "GRAY");
strcpy(channelSeq, "GRAY");
dataOrder = IPL_DATA_ORDER_PIXEL;
origin = IPL_ORIGIN_TL;
align = IPL_ALIGN_DWORD;

IplImage* pSource = iplCreateImageHeader(
nChannels,
nAlphaChannel,
depth,
colorModel,
channelSeq,
dataOrder,
origin,
align,
IMAGE_WIDTH,
IMAGE_HEIGHT,
NULL,
NULL,
NULL,
NULL);
IplImage* pDest = iplCreateImageHeader(
nChannels,
nAlphaChannel,
depth,
colorModel,
channelSeq,
dataOrder,
origin,
align,
IMAGE_WIDTH,
IMAGE_HEIGHT,
NULL,
NULL,
NULL,
NULL);
float* pfSource = new float[IMAGE_WIDTH*IMAGE_HEIGHT];
float* pfDest = new float[IMAGE_WIDTH*IMAGE_HEIGHT];
for ( i = 0; i < IMAGE_WIDTH*IMAGE_HEIGHT; i++ )
{
pfSource[i] = (float)i + 0.5f;
}
pSource->imageData = (char*)pfSource;
pDest->imageData = (char*)pfDest;
float fTable[12];
for ( i = 0; i < 12; i++ )
{
fTable[i] = ( i <= 5 ) ? 1.0f : -1.0f;
}
IplConvKernelFP* pKernel = iplCreateConvKernelFP(1,
12,
0,
6,
fTable);
TIME_START(t, _T("Filter"));
iplConvolve2DFP(pSource,
pDest,
&pKernel,
1,
IPL_SUM);
TIME_END(t);

iplDeleteConvKernelFP(pKernel);
iplDeallocate(pSource, IPL_IMAGE_HEADER);
iplDeallocate(pDest, IPL_IMAGE_HEADER);
delete[] pfSource;
delete[] pfDest;
}
The TIME_START and TIME_END macros measure the time between them using QueryPerformanceFrequency and QueryPerformanceCounter. The result is shown in the debug output. There is a significant performance boost on the Pentium 3, but on the Pentium 4 IPL works better.
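
The TIME_START and TIME_END macros themselves are not shown in the post. A single timed call may also include one-time costs such as library loading and cold caches, which can matter when the gap is only a few milliseconds. The following is a minimal portable sketch of a best-of-N measurement with a warm-up call. It uses `std::chrono::steady_clock` in place of QueryPerformanceCounter, and the helper name `bestTimeMs` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>

// Hypothetical helper: calls `work` once as a warm-up, then `runs` more
// times, and returns the fastest timed run in milliseconds.
// If runs <= 0, nothing is timed and the largest double is returned.
double bestTimeMs(const std::function<void()>& work, int runs)
{
    using Clock = std::chrono::steady_clock;
    work();  // warm-up call, not timed
    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < runs; ++r) {
        const Clock::time_point start = Clock::now();
        work();
        const std::chrono::duration<double, std::milli> elapsed =
            Clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

In the test above, the lambda passed as `work` would wrap the iplConvolve2DFP call, for example `bestTimeMs([&]{ iplConvolve2DFP(pSource, pDest, &pKernel, 1, IPL_SUM); }, 10)`.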