Hi,
I am running C++ code built on IPP that applies an FIR filter 640*480 times, each time to a series of 512 samples.
When I run it on a dual-socket quad-core Xeon machine (8 cores in total), CPU usage is only 12%.
Only one core is doing any work.
What am I doing wrong?
My system:
CPU: 2 x quad-core Xeon E5320 (8 cores)
OS: Windows Server 2003 R2 x86
IPP: 5.3.3, dynamically linked (DLL)
Compiler: Visual Studio 6.0 + Intel Compiler 10.1
int nTotalPixel = m_ImageSizeX * m_ImageSizeY;
int i;
int k;
int len = m_nImage;
IppsFIRState_32f* pState;
IppStatus st;
// ippsMalloc_* takes an element count, not a byte count
Ipp64f* taps = ippsMalloc_64f(tapslen);
Ipp32f* taps_32f = ippsMalloc_32f(tapslen);
Ipp32f* pSrc = ippsMalloc_32f(len);
Ipp32f* FIRDst = ippsMalloc_32f(len);
Ipp32f* pDL = ippsMalloc_32f(tapslen);
ippsZero_32f(pDL, tapslen);
// Compute tapslen coefficients for the bandpass FIR filter
ippsFIRGenBandpass_64f(LowFreq, HighFreq, taps, tapslen, ippWinHamming, ippTrue);
ippsConvert_64f32f(taps, taps_32f, tapslen);
// Initialize the FIR filter state with a zeroed delay line
ippsFIRInitAlloc_32f(&pState, taps_32f, tapslen, pDL);
BYTE *pImageBuffer;
for (i = 0; i < nTotalPixel; ++i)
{
    pImageBuffer = &m_pImageBuffer[i * m_nBuffer];
    // Generate the source vector (8-bit samples to float)
    ippsConvert_8u32f(pImageBuffer, pSrc, len);
    // Filter the input vector
    ippsFIR_32f(pSrc, FIRDst, len, pState);
    // Shift by 128 and write back as 8-bit, overwriting the input
    ippsAddC_32f_I(128.0f, FIRDst, len);
    ippsConvert_32f8u_Sfs(FIRDst, pImageBuffer, len, ippRndNear, 0);
}
ippsFIRFree_32f(pState);
ippsFree(pSrc);
ippsFree(FIRDst);
ippsFree(taps);
ippsFree(taps_32f);
ippsFree(pDL);
IPP internally estimates the size of the data to be processed and turns on threading only when doing so gives a performance gain.
Vladimir
Do you expect the library to launch 8 threads that each process only 64 elements of data?
Are you aware of the cost of launching a thread? It takes between 2000 and 3000 processor clocks.
Vladimir
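Since a single 512-sample call is too small for the library to thread internally, one common alternative is to thread the outer loop over pixels in the application instead, so that each thread handles many complete series. The following is only a minimal sketch of that idea, not code from this thread: firFilterSeries is a hypothetical plain C++ stand-in for the per-pixel IPP calls, the contiguous float layout of data is an assumption, and each series starts from a zeroed delay line (the code in the question instead carries one filter state across pixels). It uses OpenMP, which has to be enabled in the compiler; without it the pragmas are ignored and the loop runs serially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-pixel IPP calls: a direct-form FIR
// over one series, starting from a zeroed delay line. Not an IPP function.
static void firFilterSeries(const std::vector<float>& taps,
                            const float* src, float* dst, int len)
{
    const int tapsLen = static_cast<int>(taps.size());
    for (int n = 0; n < len; ++n) {
        float acc = 0.0f;
        for (int k = 0; k < tapsLen && k <= n; ++k)
            acc += taps[k] * src[n - k];
        dst[n] = acc;
    }
}

// Filters every pixel's series in place. `data` is assumed to hold
// nPixels consecutive series of `len` samples each.
void filterAllPixels(const std::vector<float>& taps,
                     std::vector<float>& data, int nPixels, int len)
{
    #pragma omp parallel
    {
        // One scratch buffer per thread, so threads share no mutable state.
        std::vector<float> out(len);

        // Pixels are independent, so the loop iterations can be divided
        // among the threads of the enclosing parallel region.
        #pragma omp for
        for (int i = 0; i < nPixels; ++i) {
            float* series = &data[static_cast<std::size_t>(i) * len];
            firFilterSeries(taps, series, &out[0], len);
            for (int n = 0; n < len; ++n)
                series[n] = out[n];
        }
    }
}
```

With the real IPP calls, the same structure would presumably need one IppsFIRState_32f and one pair of work buffers per thread, because the filter state holds a delay line that is modified on every call.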