Time performance - ippiCrossCorrNorm_32f_C1R

Bogdan_B_1 · 09-03-2018

Hi,

I have compared the run time of normalized cross-correlation (correlation coefficient) in IPP 7.0 and IPP 2018. In my test, the old version is 2x to 3x faster. I have disabled hyper-threading in the BIOS, and I have tried the library variants for different processors (y8, e9, I9).

VTune (trial) shows that the IPP 2018 version of cross-correlation runs on a single thread, whereas the IPP 7.0 version is multi-threaded.

Is there something I'm missing? The only difference between the 2 tests is the call for cross-correlation.

My processor is an Intel Core i7 4790.

---------------------------------------

Test #1: Ipp 2018

Lib info:

targetCpu: I9

Name: "ippCV AVX2 (I9)"

Version: "2018.0.3 (r58644)"

BuildDate: "Apr 7 2018"

Code:

CGenericImage imInput; // 2048 x 2048, 8-bit image loaded from hdd

CGenericImage imInput_32f; // input image converted to Ipp32f

CGenericImage imTemplate_32f; // generated gaussian template, 9x9

CGenericImage imOutput_32f; // score image

IppiSize szImage = { imInput.m_nWidth, imInput.m_nHeight };

IppiSize szTemplate = { imTemplate_32f.m_nWidth, imTemplate_32f.m_nHeight };

st |= ippiConvert_8u32f_C1R((Ipp8u*)imInput.m_pData, imInput.m_nStep, (Ipp32f*)imInput_32f.m_pData, imInput_32f.m_nStep, szImage);

// algType must be declared before it is passed to the buffer-size query

IppEnum algType = (IppEnum)(ippAlgAuto | ippiROISame | ippiNormCoefficient);

int nBufferSize = 0;

st |= ippiCrossCorrNormGetBufferSize(szImage, szTemplate, algType, &nBufferSize);

Ipp8u* pBuffer = ippsMalloc_8u(nBufferSize); // work buffer for ippiCrossCorrNorm_32f_C1R

st |= ippiCrossCorrNorm_32f_C1R((Ipp32f*)imInput_32f.m_pData, imInput_32f.m_nStep, szImage

, (Ipp32f*)imTemplate_32f.m_pData, imTemplate_32f.m_nStep, szTemplate

, (Ipp32f*)imOutput_32f.m_pData, imOutput_32f.m_nStep, algType, pBuffer);

---------------------------------------

Test #2: Ipp 7.0

Lib info:

targetCpu: e9

Name: "ippcve9-7.0.dll"

Version: "7.0 build 250.85"

BuildDate: "Nov 27 2011"

Code:

CGenericImage imInput; // 2048 x 2048, 8-bit image loaded from hdd

CGenericImage imInput_32f; // input image converted to Ipp32f

CGenericImage imTemplate_32f; // generated gaussian template, 9x9

CGenericImage imOutput_32f; // score image

IppiSize szImage = { imInput.m_nWidth, imInput.m_nHeight };

IppiSize szTemplate = { imTemplate_32f.m_nWidth, imTemplate_32f.m_nHeight };

st = ippiConvert_8u32f_C1R((Ipp8u*)imInput.m_pData, imInput.m_nStep, (Ipp32f*)imInput_32f.m_pData, imInput_32f.m_nStep, szImage);

st = ippiCrossCorrSame_NormLevel_32f_C1R((Ipp32f*)imInput_32f.m_pData, imInput_32f.m_nStep, szImage

, (Ipp32f*)imTemplate_32f.m_pData, imTemplate_32f.m_nStep, szTemplate

, (Ipp32f*)imOutput_32f.m_pData, imOutput_32f.m_nStep);

---------------------------------------

Jing_Xu · 09-04-2018

Hi,

May I ask how you linked your program against IPP?

Did you link it against the multi-threaded version of IPP 2018?

Bogdan_B_1 · 09-04-2018

Hi,

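I have solved the "mystery". The default IPP 2018 libraries are single-threaded, while the IPP 7.0 DLL I was using is threaded internally. From the Intel documentation: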

"Intel IPP 8.0 continues the process of deprecating threading inside Intel IPP functions that was started in version 7.1. Though not installed by default, the threaded libraries can be installed so code written with these libraries will still work as before. However, moving to external threading is recommended."
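For anyone landing here later, a minimal sketch of what "external threading" can look like: split the output rows into bands and let the application run one thread per band. The names RowBand, splitRows and forEachBandParallel are hypothetical helpers, not part of IPP. In a real program the callback would presumably call ippiCrossCorrNorm_32f_C1R on its own band with its own work buffer; that part, and the handling of the rows where the template overlaps a neighbouring band, is an assumption and is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Half-open range of output rows [begin, end) assigned to one worker.
struct RowBand {
    int begin;
    int end;
};

// Split `height` rows into at most `numBands` contiguous bands of near-equal size.
std::vector<RowBand> splitRows(int height, int numBands) {
    assert(height >= 0 && numBands > 0);
    std::vector<RowBand> bands;
    const int count = std::min(height, numBands);
    for (int i = 0; i < count; ++i) {
        // Integer arithmetic spreads the remainder rows across the bands.
        const int begin = static_cast<int>(static_cast<long long>(height) * i / count);
        const int end = static_cast<int>(static_cast<long long>(height) * (i + 1) / count);
        bands.push_back({begin, end});
    }
    return bands;
}

// Run processBand(band) on one std::thread per band and wait for all of them.
// processBand must only write to rows inside its own band.
void forEachBandParallel(int height, int numThreads,
                         const std::function<void(const RowBand&)>& processBand) {
    std::vector<std::thread> workers;
    for (const RowBand& band : splitRows(height, numThreads)) {
        workers.emplace_back(processBand, band);
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
}
```

With the 2048 x 2048 image from the first post and 4 threads, forEachBandParallel(2048, 4, callback) would hand each thread a band of 512 rows. Whether this recovers the IPP 7.0 timings would have to be measured.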

It's funny how I was only able to find this information after creating this topic :)

Jing_Xu · 09-05-2018

Hi,

Bravo.

Good to hear that.