Solved: OpenCV 2.2 built with IPP v7.0 is not showing much improvement

starseedsoft · ‎01-26-2011

I have a project in which most of the time is spent inside the OpenCV function cvCalcPCA, followed by cvProjectPCA.

I am seeing only a 3% improvement using the OpenCV libraries built with IPP compared with OpenCV libraries built without IPP.

Does this make sense? I would have thought there would be greater speed up.

/K

Vladimir_Dudnik · ‎01-26-2011

Hi,

In the latest OpenCV SVN trunk we have added calls to IPP for the Sobel and Scharr filters, and we see a significant speedup, up to 6X on the Sandy Bridge platform, compared with the original version of OpenCV.

I do not recall whether any IPP function is used in the cvCalcPCA operation. It is something you may find useful to look at, and you could add IPP calls if appropriate functions exist. (In other words, we have not analyzed the cvCalcPCA operation yet :) ).

Regards,
Vladimir

starseedsoft · ‎01-27-2011

Forgive my ignorance, but can you tell me what is involved in "add IPP call if ..." ?

Do you have a whitepaper or some kind of reference as to how to do so?

Vladimir_Dudnik · ‎01-28-2011

Hi,

We do not have a whitepaper on this yet, but the idea is a good one; we will need to think about developing one.

However, you should understand that there is no formal template for substituting a piece of an application with IPP functions; it always has to be considered case by case. The complexity may vary from very simple cases to quite complicated ones. Let's consider, for example, a dot product operation implemented in C/C++. You may find it useful to substitute the C++ loop that implements the dot product with a call to the IPP DotProd function. The benefit is that the IPP API will remain the same for every new generation of Intel hardware, so even five years from now, all you will need to do to take advantage of the latest processor technologies (whatever they turn out to be) is relink your application with the latest version of the IPP libraries. There is no need to study low-level architecture details to get the best performance, and you can focus your energy on improving the application itself rather than on optimizing every performance-critical loop.
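A minimal sketch of that substitution for a 1D float array is shown here. It assumes the build defines HAVE_IPP when IPP is available (the function name dot_product is hypothetical); without that macro the plain C++ loop is used, so the sketch compiles either way.

```cpp
#include <cassert>

#ifdef HAVE_IPP
#include <ipps.h>   // IPP signal-processing functions, including ippsDotProd_32f
#endif

// Dot product of two float arrays of equal length.
// With HAVE_IPP defined, the hand-written loop is replaced by one IPP call.
float dot_product(const float* a, const float* b, int len)
{
    assert(a != 0 && b != 0 && len >= 0);
#ifdef HAVE_IPP
    Ipp32f result = 0;
    if (ippsDotProd_32f(a, b, len, &result) == ippStsNoErr)
        return result;
    // If IPP reports an error, fall through to the plain loop below.
#endif
    float sum = 0.0f;
    for (int i = 0; i < len; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

The caller sees the same function in both builds, which is the point of the approach: only the body changes when IPP is linked in.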

For a real example, please take a look at the function dotprod_ in the file opencv\modules\core\src\matmul.cpp:

[bash]/****************************************************************************************
 * Dot Product * 
****************************************************************************************/
template<typename T, typename WT, typename ST> static double
dotprod_( const Mat& srcmat1, const Mat& srcmat2 )
{
    const T* src1 = (const T*)srcmat1.data;
    const T* src2 = (const T*)srcmat2.data;
    size_t step1 = srcmat1.step/sizeof(src1[0]);
    size_t step2 = srcmat2.step/sizeof(src2[0]);
    ST sum = 0;

    Size size = getContinuousSize( srcmat1, srcmat2, srcmat1.channels() );

    if( size.width == 1 )
    {
        WT t = 0;
        for( ; size.height--; src1 += step1, src2 += step2 )
            t += (WT)src1[0]*src2[0];
        sum += t;
    }
    else
    {
        for( ; size.height--; src1 += step1, src2 += step2 )
        {
            int i;
            WT t = 0;
            for( i = 0; i <= size.width - 4; i += 4 )
            {
                sum += (WT)src1[i]*src2[i] +
                            (WT)src1[i+1]*src2[i+1] +
                            (WT)src1[i+2]*src2[i+2] +
                            (WT)src1[i+3]*src2[i+3];
            }
            for( ; i < size.width; i++ )
                t += (WT)src1[i]*src2[i];
            sum += t;
        }
    }
    return (double)sum;
} [/bash]

Once we recognize that the operation implemented in plain C++ is a 2D dot product, we can find an appropriate IPP function to substitute in the ippi.h file (and in the IPP reference manual section on image dot product operations):

[bash]/* /////////////////////////////////////////////////////////////////////////////
// Dot product of two images
///////////////////////////////////////////////////////////////////////////// */
/* /////////////////////////////////////////////////////////////////////////////
// Name: ippiDotProd
// Purpose: Computes the dot product of two images
// Context:
// Returns: IppStatus
// ippStsNoErr OK
// ippStsNullPtrErr One of the pointers is NULL
// ippStsStepErr One of the step values is equal to zero
// Parameters:
// pSrc1 Pointer to the first source image.
// src1Step Step in bytes through the first source image
// pSrc2 Pointer to the second source image.
// src2Step Step in bytes through the source image
// roiSize Size of the source image ROI.
// pDp Pointer to the result (one-channel data) or array (multi-channel data) containing computed dot products
//       of channel values of pixels in the source images.
// hint Option to select the algorithmic implementation of the function
// Notes:
*/

 IPPAPI(IppStatus, ippiDotProd_8u64f_C1R,(const Ipp8u* pSrc1, int src1Step, const Ipp8u* pSrc2, int src2Step, IppiSize roiSize, Ipp64f *pDp))
 IPPAPI(IppStatus, ippiDotProd_8s64f_C1R,(const Ipp8s* pSrc1, int src1Step, const Ipp8s* pSrc2, int src2Step, IppiSize roiSize, Ipp64f *pDp))
 IPPAPI(IppStatus, ippiDotProd_16u64f_C1R,(const Ipp16u* pSrc1, int src1Step, const Ipp16u* pSrc2, int src2Step, IppiSize roiSize, Ipp64f *pDp))
 IPPAPI(IppStatus, ippiDotProd_16s64f_C1R,(const Ipp16s* pSrc1, int src1Step, const Ipp16s* pSrc2, int src2Step, IppiSize roiSize, Ipp64f *pDp))
 IPPAPI(IppStatus, ippiDotProd_32u64f_C1R,(const Ipp32u* pSrc1, int src1Step, const Ipp32u* pSrc2, int src2Step, IppiSize roiSize, Ipp64f *pDp))
 IPPAPI(IppStatus, ippiDotProd_32s64f_C1R,(const Ipp32s* pSrc1, int src1Step, const Ipp32s* pSrc2, int src2Step, IppiSize roiSize, Ipp64f *pDp)) 
 IPPAPI(IppStatus, ippiDotProd_32f64f_C1R,(const Ipp32f* pSrc1, int src1Step, const Ipp32f* pSrc2, int src2Step, IppiSize roiSize, Ipp64f *pDp, IppHintAlgorithm hint)) 
 IPPAPI(IppStatus, ippiDotProd_8u64f_C3R,(const Ipp8u* pSrc1, int src1Step, const Ipp8u* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3]))
 IPPAPI(IppStatus, ippiDotProd_8s64f_C3R,(const Ipp8s* pSrc1, int src1Step, const Ipp8s* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3]))
 IPPAPI(IppStatus, ippiDotProd_16u64f_C3R,(const Ipp16u* pSrc1, int src1Step, const Ipp16u* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3]))
 IPPAPI(IppStatus, ippiDotProd_16s64f_C3R,(const Ipp16s* pSrc1, int src1Step, const Ipp16s* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3])) 
 IPPAPI(IppStatus, ippiDotProd_32u64f_C3R,(const Ipp32u* pSrc1, int src1Step, const Ipp32u* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3]))
 IPPAPI(IppStatus, ippiDotProd_32s64f_C3R,(const Ipp32s* pSrc1, int src1Step, const Ipp32s* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3])) 
 IPPAPI(IppStatus, ippiDotProd_32f64f_C3R,(const Ipp32f* pSrc1, int src1Step, const Ipp32f* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3], IppHintAlgorithm hint))
 IPPAPI(IppStatus, ippiDotProd_8u64f_C4R,(const Ipp8u* pSrc1, int src1Step, const Ipp8u* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[4]))
 IPPAPI(IppStatus, ippiDotProd_8s64f_C4R,(const Ipp8s* pSrc1, int src1Step, const Ipp8s* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[4]))
 IPPAPI(IppStatus, ippiDotProd_16u64f_C4R,(const Ipp16u* pSrc1, int src1Step, const Ipp16u* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[4]))
 IPPAPI(IppStatus, ippiDotProd_16s64f_C4R,(const Ipp16s* pSrc1, int src1Step, const Ipp16s* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[4]))
 IPPAPI(IppStatus, ippiDotProd_32u64f_C4R,(const Ipp32u* pSrc1, int src1Step, const Ipp32u* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[4]))
 IPPAPI(IppStatus, ippiDotProd_32s64f_C4R,(const Ipp32s* pSrc1, int src1Step, const Ipp32s* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[4]))
 IPPAPI(IppStatus, ippiDotProd_32f64f_C4R,(const Ipp32f* pSrc1, int src1Step, const Ipp32f* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[4], IppHintAlgorithm hint))
 IPPAPI(IppStatus, ippiDotProd_8u64f_AC4R,(const Ipp8u* pSrc1, int src1Step, const Ipp8u* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3]))
 IPPAPI(IppStatus, ippiDotProd_8s64f_AC4R,(const Ipp8s* pSrc1, int src1Step, const Ipp8s* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3])) 
 IPPAPI(IppStatus, ippiDotProd_16u64f_AC4R,(const Ipp16u* pSrc1, int src1Step, const Ipp16u* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3]))
 IPPAPI(IppStatus, ippiDotProd_16s64f_AC4R,(const Ipp16s* pSrc1, int src1Step, const Ipp16s* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3]))
 IPPAPI(IppStatus, ippiDotProd_32u64f_AC4R,(const Ipp32u* pSrc1, int src1Step, const Ipp32u* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3]))
 IPPAPI(IppStatus, ippiDotProd_32s64f_AC4R,(const Ipp32s* pSrc1, int src1Step, const Ipp32s* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3]))
 IPPAPI(IppStatus, ippiDotProd_32f64f_AC4R,(const Ipp32f* pSrc1, int src1Step, const Ipp32f* pSrc2, int src2Step, IppiSize roiSize, Ipp64f pDp[3], IppHintAlgorithm hint)) [/bash]

So the whole C++ loop might be easily substituted by a single call to one of the IPP functions. Admittedly, the C++ implementation supports computation in several data types (from unsigned char up to double). To cover this with IPP functions, one needs to switch on the data type of the source Mat and call the appropriate IPP function. I will leave this unfinished for now, but you may find the implementation quite straightforward and easy to finish on your side.
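As a rough illustration of that switch, here is a sketch that is independent of OpenCV. The names ElemType, dot2d and dot2d_fallback are hypothetical stand-ins (ElemType plays the role of the Mat depth), and HAVE_IPP is assumed to be defined only when IPP is available. The IPP calls use the single-channel signatures from the ippi.h listing above; that listing has no variant taking 64-bit float input, so double data stays on the plain C++ path.

```cpp
#include <cassert>

#ifdef HAVE_IPP
#include <ippi.h>   // IPP image-processing functions, including ippiDotProd_*
#endif

// Hypothetical element-type tag standing in for the depth of the source Mat.
enum ElemType { ELEM_8U, ELEM_32F, ELEM_64F };

// Plain C++ fallback: 2D dot product with row steps given in bytes.
template<typename T>
static double dot2d_fallback(const void* p1, int step1,
                             const void* p2, int step2,
                             int width, int height)
{
    const unsigned char* row1 = static_cast<const unsigned char*>(p1);
    const unsigned char* row2 = static_cast<const unsigned char*>(p2);
    double sum = 0;
    for (int y = 0; y < height; ++y, row1 += step1, row2 += step2)
    {
        const T* a = reinterpret_cast<const T*>(row1);
        const T* b = reinterpret_cast<const T*>(row2);
        for (int x = 0; x < width; ++x)
            sum += static_cast<double>(a[x]) * b[x];
    }
    return sum;
}

// Switch on the element type and call the matching IPP function if there is one.
double dot2d(ElemType type, const void* p1, int step1,
             const void* p2, int step2, int width, int height)
{
    assert(p1 != 0 && p2 != 0 && width >= 0 && height >= 0);
#ifdef HAVE_IPP
    IppiSize roi = { width, height };
    Ipp64f result = 0;
    IppStatus status = ippStsErr;
    switch (type)
    {
    case ELEM_8U:
        status = ippiDotProd_8u64f_C1R((const Ipp8u*)p1, step1,
                                       (const Ipp8u*)p2, step2, roi, &result);
        break;
    case ELEM_32F:
        status = ippiDotProd_32f64f_C1R((const Ipp32f*)p1, step1,
                                        (const Ipp32f*)p2, step2, roi, &result,
                                        ippAlgHintAccurate);
        break;
    default:
        break;  // no matching IPP variant in the listing; use the fallback
    }
    if (status == ippStsNoErr)
        return result;
#endif
    switch (type)
    {
    case ELEM_8U:  return dot2d_fallback<unsigned char>(p1, step1, p2, step2, width, height);
    case ELEM_32F: return dot2d_fallback<float>(p1, step1, p2, step2, width, height);
    case ELEM_64F: return dot2d_fallback<double>(p1, step1, p2, step2, width, height);
    }
    return 0.0;
}
```

Because the OpenCV function above already folds the channel count into size.width, passing that width to a single-channel (C1R) variant should yield one overall sum, which matches what dotprod_ returns.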

Regards,
Vladimir

Ying_H_Intel · ‎01-29-2011

Hello,

Here is a technical note, Using Intel IPP with OpenCV 2.1, for your reference.

I am not sure whether OpenCV 2.2 has improved the IPP integration.
Quoting from the paper:

2.2 Using the OpenCV library with built-in Intel IPP.

As we mentioned in IPP Support Model Changed in OpenCV 2.1

OpenCV 2.1.0 integrates Intel IPP functions when the macro "HAVE_IPP" is defined while building the cxcore and cv libraries. The built-in IPP functions include:

- Color conversion in cv => cvcolor.cpp
- Haar classifier training functions in cv => cvhaar.cpp
- ippsDFT function in cxcore => cxdft.cpp

(Please note that earlier versions included far more IPP functions than 2.1.0; please check the list in __cvipp.h.)

So the IPP functions in OpenCV mainly focus on DFT/DCT transforms and face recognition, which includes Feature Detection and the Haar classifier. cvCalcPCA may not call any IPP routines directly.

To add IPP routines to the OpenCV code, you may follow Vladimir's suggestion and replace the equivalent functionality in the OpenCV code with IPP functions.

The article Using Intel IPP with OpenCV 2.1 also shows one way to call IPP functions directly in an OpenCV application.

Regards,
Ying

starseedsoft · ‎02-10-2011

Thank you Vladimir, this is very good information.

Looking at your example, I realize that I know nothing about how this works. The code you show for ippiDotProd appears to be one giant macro, declaring all possible function invocation types. Is this correct?

starseedsoft · ‎02-19-2011

Ying,

Thank you very much for the information; it is very helpful.

Unfortunately, I did not understand how to use the 'star rating system' and inadvertently gave you only a 'one star' rating. I meant to give you a five-star rating, but the system will not let me change it. Is there any way to reset the star rating?

PaulF_IntelCorp · ‎02-25-2011

Not sure if there's a way to override the rating, but if you get a few of your friends to give it a five star it will move the results up. :-) It's a voting system.
