Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

ippiConvFull_32f_C1R error in IPP7.0

teng_w_
Beginner

Hi, I tested ippiConvFull_32f_C1R under VS2010 with IPP 7.0 on my computer. The CPU is an i5-3470 @ 3.20GHz.

I find that when the kernel size is larger than 10*10, the result is not correct. The code is as follows:

    int   nWidth = 81;
    int   nHeight = 80;
    float *pfsrc = new float[nWidth*nHeight];

    for(int i = 0; i < nWidth*nHeight; i++)
        pfsrc[i] = (float)i;   // pixel i holds the value i


    int nKWidth = 11;
    int nKHeight = 11;
    float psKernel[200];

    for(int i = 0; i < nKWidth*nKHeight;i++)
    {
        psKernel[i] = (float)(i%20);   // kernel values cycle through 0..19
    }

    int nDstW = nWidth + nKWidth - 1;
    int nDstH = nHeight + nKHeight - 1;

    float *psDst = new float[(nDstW)*(nDstH)];
    float *psDst2 = new float[(nDstW)*(nDstH)];

    IppiSize srcSize = {nWidth, nHeight};
    IppiSize kernelSize = {nKWidth, nKHeight};
    ippiConvFull_32f_C1R(pfsrc, nWidth*4, srcSize, psKernel, nKWidth*4, kernelSize, psDst, nDstW*4);

Is there a problem with my code or with ippiConvFull_32f_C1R? By the way, I found some informational text about this function in the include file:

//  Purpose: Performs the VALID 2-D convolution of matrices (images).
//           If IppiSize's of matrices (images) are Wa*Ha and Wb*Hb
//           correspondingly, then the IppiSize of the resulting matrix
//           (image) will be (|Wa-Wb|+1)*(|Ha-Hb|+1).
//           If the smallest image IppiSize > CRITERION, then convolution
//           is done using 2D FFT.

Maybe the problem is related to 2D FFT? 

Thank you!

 

6 Replies
Igor_A_Intel
Employee

Hi Teng,

Your report is not complete.

1) "is not correct" - could you provide the result you expect, so that we can compare it with the output of the IPP function and see the difference?

2) You should also provide the output of the IPPAPI( const IppLibraryVersion*, ippiGetLibVersion, (void) ) function, so that we can tell which CPU-specific code is running and whether the library is ia32 or x64, static or dynamic, single- or multi-threaded.

3) IPP 9.0 is available as a free download - do you see the same issue with the latest IPP version?

regards, Igor
 

teng_w_
Beginner

Hi Igor

1) For the first 10 values, the expected result is 0.000000, 0.000000, 1.000000, 4.000000, 10.000000, 20.000000, 35.000000, 56.000000, 84.000000, 120.000000, but the result of ippiConvFull_32f_C1R is 0.108215, 0.105957, 1.107361, 4.114563, 10.125793, 20.130188, 35.132202, 56.139587, 84.155640, 120.171509, so the result is different. The inputs are integer-valued, so the result should not have a fractional part. I guess it is related to the 2D FFT.

2) IppLibraryVersion returned by ippiGetLibVersion as follows:

    major: 7
    minor: 0
    majorBuild: 205
    build: 1063
    targetCpu: y8
    Name: ippiy8-7.0.dll+
    Version: 7.0 build 205.68
    BuildDate: Jul 20 2011

3) Sorry, I have not tried IPP 9.0 yet.
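
The expected values in item 1) can be reproduced without IPP by a plain direct (non-FFT) full convolution. The following is only a minimal reference sketch: `convFullDirect` and `tengReference` are hypothetical helper names, not IPP functions, and the code assumes row-major images without row padding, filled as in the original test (source pixel i = i, kernel element i = i%20).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct full 2-D convolution of a srcW x srcH image with a kW x kH kernel.
// Both inputs are row-major without padding.
// The output is (srcW + kW - 1) x (srcH + kH - 1), also row-major.
std::vector<float> convFullDirect(const std::vector<float>& src, int srcW, int srcH,
                                  const std::vector<float>& ker, int kW, int kH)
{
    assert(srcW > 0 && srcH > 0 && kW > 0 && kH > 0);
    assert(src.size() == static_cast<std::size_t>(srcW) * srcH);
    assert(ker.size() == static_cast<std::size_t>(kW) * kH);

    const int dstW = srcW + kW - 1;
    const int dstH = srcH + kH - 1;
    std::vector<double> acc(static_cast<std::size_t>(dstW) * dstH, 0.0);

    // Scatter form: every source pixel adds a scaled copy of the kernel
    // into the output, shifted to that pixel's position.
    for (int sy = 0; sy < srcH; ++sy)
        for (int sx = 0; sx < srcW; ++sx) {
            const double s = src[static_cast<std::size_t>(sy) * srcW + sx];
            for (int ky = 0; ky < kH; ++ky)
                for (int kx = 0; kx < kW; ++kx)
                    acc[static_cast<std::size_t>(sy + ky) * dstW + (sx + kx)] +=
                        s * ker[static_cast<std::size_t>(ky) * kW + kx];
        }

    // Accumulate in double, then round once to float.
    std::vector<float> dst(acc.size());
    for (std::size_t i = 0; i < acc.size(); ++i)
        dst[i] = static_cast<float>(acc[i]);
    return dst;
}

// Rebuilds the 81x80 source and 11x11 kernel from the original test
// and returns the 91x90 reference result.
std::vector<float> tengReference()
{
    const int w = 81, h = 80, kW = 11, kH = 11;
    std::vector<float> src(static_cast<std::size_t>(w) * h);
    std::vector<float> ker(static_cast<std::size_t>(kW) * kH);
    for (std::size_t i = 0; i < src.size(); ++i) src[i] = static_cast<float>(i);
    for (std::size_t i = 0; i < ker.size(); ++i) ker[i] = static_cast<float>(i % 20);
    return convFullDirect(src, w, h, ker, kW, kH);
}
```

Comparing the first elements of `tengReference()` with the output of the IPP call is one way to quantify the difference element by element.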

 

Igor_A_Intel
Employee

Hi Teng,

This is not a bug. The accuracy of the ippiConv function when it is computed via 2D FFT is ~5e-5 relative to the input data range, which is ~6400 in your case. Yes, this old API has an internal criterion for switching from the direct method to the 2D FFT method. Please try the new 2D convolution API from IPP 9.0 (available for free with the community license - see the sticky items on the main forum page). It lets you choose the direct, 2D FFT, or auto method manually:

/* ////////////////////////////////////////////////////////////////////////////
//   Names: ippiConv_32f_C1R, ippiConv_32f_C3R, ippiConv_32f_C4R
//          ippiConv_16s_C1R, ippiConv_16s_C3R, ippiConv_16s_C4R
//          ippiConv_8u_C1R,  ippiConv_8u_C3R,  ippiConv_8u_C4R
//  Purpose: Performs full or valid 2-D convolution of two images.
//           The result image size depends on operation shape selected in algType mask as follows:
//             (Wa+Wb-1)*(Ha+Hb-1) for ippiROIFull mask
//             (Wa-Wb+1)*(Ha-Hb+1) for ippiROIValid mask,
//           where Wa*Ha and Wb*Hb are the sizes of the image and template, respectively.
//          If the IppAlgMask value in algType is equal to ippAlgAuto, the optimal algorithm is selected
//          automatically. For big data size, the function uses 2D FFT algorithm.
//  Parameters:
//    pSrc1, pSrc2       - Pointers to the source images ROI.
//    src1Step, src2Step - Distances, in bytes, between the starting points of consecutive lines in the source images.
//    src1Size, src2Size - Size, in pixels, of the source images.
//    pDst               - Pointer to the destination image ROI.
//    dstStep            - Distance, in bytes, between the starting points of consecutive lines in the destination image.
//    divisor            - The integer value by which the computed result is divided (for operations on integer data only).
//    algType            - Bit-field mask for the algorithm type definition. Possible values are the results of composition of the IppAlgType and IppiROIShape values.
//                          Usage example: algType=(ippiROIFull|ippAlgFFT); - full-shaped convolution will be calculated using 2D FFT.
//    pBuffer            - Pointer to the buffer for internal calculations.
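
To put the ~5e-5 figure in context: relative to an input range of ~6400 it corresponds to an absolute error of about 5e-5 * 6400 = 0.32, and the differences reported earlier in the thread (roughly 0.11 to 0.17) fall below that. A small sketch of such a check follows; `maxAbsDiff` and `absoluteTolerance` are hypothetical helper names, and treating the tolerance as simply relative accuracy times input range is an assumption based on the statement above, not a documented IPP guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute element-wise difference between two result arrays.
double maxAbsDiff(const std::vector<float>& a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::fabs(static_cast<double>(a[i]) -
                                          static_cast<double>(b[i])));
    return worst;
}

// Absolute tolerance implied by an accuracy given relative to the
// input data range (assumption: tolerance = relAccuracy * inputRange).
double absoluteTolerance(double relAccuracy, double inputRange)
{
    return relAccuracy * inputRange;
}
```

With the ten expected and observed values posted above, `maxAbsDiff` gives about 0.17, which is below `absoluteTolerance(5e-5, 6400.0)`.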
 

regards, Igor

teng_w_
Beginner

Hi Igor

Thanks for your reply.

By the way, you said that the old API has an internal criterion for switching from the direct method to the 2D FFT method. So what is the criterion?

      

Igor_A_Intel
Employee

Hi Teng,

The internal criterion is rather complex and depends on the CPU and architecture. For example, for your particular case (the y8 library) it is as follows:

/* if Wa*Ha/(Wb*Hb) > _CONVFULL_BLOCK_ALG, then blocking FFT alg. is used */
#define _CONVFULL_BLOCK_ALG 7

/* if  Wb*Hb < _CONVFULL_MAX_DIR_KERNEL, then direct alg. is used */
#define _CONVFULL_MAX_DIR_KERNEL 11 * 11

/* below this order the double FFT size is more effective */
#define _DOUBLE_SIZE 8

#define _CONVFULL_32F_START_FFTUSE 63 * 47
#define _CONVFULL_8u_START_FFTUSE  83 * 59
#define _CONVFULL_16s_START_FFTUSE 75 * 59

Therefore, for 32f data the 2D FFT-based algorithm is used if srcSize > 63*47 and the kernel size >= 11*11.
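
As a rough illustration only, the summary above can be written as a predicate. `wouldUseFft32f` is a hypothetical name, the two constants mirror the quoted y8 thresholds, and the sketch deliberately ignores the blocking and double-size rules, so it is a simplification rather than the library's actual dispatch logic.

```cpp
#include <cassert>

// Thresholds quoted for the y8 library, as areas in pixels.
constexpr long long kMaxDirectKernelArea = 11 * 11;  // _CONVFULL_MAX_DIR_KERNEL
constexpr long long kFftStartArea32f     = 63 * 47;  // _CONVFULL_32F_START_FFTUSE

// Simplified reading of the summary: for 32f data the FFT path is taken
// when the source area exceeds 63*47 and the kernel area is at least 11*11.
// Assumption: only these two thresholds are considered.
bool wouldUseFft32f(int srcW, int srcH, int kW, int kH)
{
    assert(srcW > 0 && srcH > 0 && kW > 0 && kH > 0);
    const long long srcArea = static_cast<long long>(srcW) * srcH;
    const long long kerArea = static_cast<long long>(kW) * kH;
    return srcArea > kFftStartArea32f && kerArea >= kMaxDirectKernelArea;
}
```

Under this reading, an 81x80 source with an 11x11 kernel takes the FFT path, while the same source with a 10x10 kernel stays on the direct path, which is consistent with the results changing once the kernel grows beyond 10*10.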

regards, Igor

teng_w_
Beginner

Hi Igor

Thanks for your explanation. Many thanks.

 
