Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

vslsConvExecX performance

Beckett__Tony
Beginner
858 Views

vslsConvExecX runs 10x slower than the IPP function IppFilter. Does this seem correct?

 

Ying_H_Intel
Employee

Hi Tony,

Thank you for reporting the problem.
If possible, could you give us some background, such as your test CPU type, the vector size, and how you link MKL and IPP? A small reproducer would be helpful. If the information is private, please submit it to the Intel Online Service Center: http://supporttickets.intel.com/

Thanks
Ying

Beckett__Tony
Beginner
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 142
model name	: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
stepping	: 9
cpu MHz		: 2904.004
cache size	: 4096 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch avx2 rdseed clflushopt
bogomips	: 5808.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual

#define IPP_VERSION_STR "2018.0.3"

#define INTEL_MKL_VERSION 20180002

    libmkl_intel_lp64.so => /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so (0x00007f986c843000)
    libmkl_gnu_thread.so => /opt/intel/mkl/lib/intel64/libmkl_gnu_thread.so (0x00007f986b130000)
    libmkl_core.so => /opt/intel/mkl/lib/intel64/libmkl_core.so (0x00007f9867126000)

    libippcore.so => /opt/intel/ipp/lib/intel64/libippcore.so (0x00007f529092b000)
    libippcc.so => /opt/intel/ipp/lib/intel64/libippcc.so (0x00007f5290710000)
    libippch.so => /opt/intel/ipp/lib/intel64/libippch.so (0x00007f529050a000)
    libippcv.so => /opt/intel/ipp/lib/intel64/libippcv.so (0x00007f52902e4000)
    libippdc.so => /opt/intel/ipp/lib/intel64/libippdc.so (0x00007f52900dc000)
    libippi.so => /opt/intel/ipp/lib/intel64/libippi.so (0x00007f528fe2a000)
    libipps.so => /opt/intel/ipp/lib/intel64/libipps.so (0x00007f528fbe0000)
    libippvm.so => /opt/intel/ipp/lib/intel64/libippvm.so (0x00007f528f9c9000)

 

 

partial code
 
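    const int x_stride[2] = { 256, 1 };
    const int y_stride[2] = {   8, 1 };
    const int z_stride[2] = { 256, 1 };

    /* Create a 2D single-precision convolution task with x fixed.
       The post also names VSL_CONV_MODE_DIRECT for the mode argument;
       the condition that selects between the two modes is not shown. */
    status = vslsConvNewTaskX(&task,
                              VSL_CONV_MODE_AUTO,
                              2,          /* number of dimensions */
                              x_shape,
                              y_shape,
                              z_shape,
                              x,
                              x_stride);

    /* Start at the kernel center unless an explicit anchor is given */
    const int conv_start[2] = { (anchor.y == -1) ? (y_shape[0] - 1) / 2 : anchor.y,
                                (anchor.x == -1) ? (y_shape[1] - 1) / 2 : anchor.x };

    status = vslConvSetStart(task, conv_start);

    /* Convolve the fixed x with y and write the result to z */
    status = vslsConvExecX(task,
                           y,
                           y_stride,
                           z,
                           z_stride);

    status = vslConvDeleteTask(&task);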
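
To make the role of conv_start concrete, here is a minimal plain-C sketch in one dimension. It assumes, as the MKL documentation describes the start parameter, that start is the index of the first element of the full convolution result that gets stored in z. The names conv_start_1d and conv1d_window are hypothetical helpers for illustration only; they are not MKL functions and say nothing about how MKL computes the result internally.

```c
/* Start offset along one dimension: the kernel center when anchor is -1,
   otherwise the caller's anchor (mirrors the conv_start expression in the post). */
int conv_start_1d(int kernel_len, int anchor)
{
    return (anchor == -1) ? (kernel_len - 1) / 2 : anchor;
}

/* Writes nz samples of the full linear convolution of x (nx samples) and
   y (ny samples) into z, beginning at index `start` of the full result.
   Assumption: this windowing is what the start parameter selects. */
void conv1d_window(const float *x, int nx, const float *y, int ny,
                   float *z, int nz, int start)
{
    for (int k = 0; k < nz; ++k) {
        int full_k = k + start;   /* index into the nx + ny - 1 full result */
        float acc = 0.0f;
        for (int j = 0; j < ny; ++j) {
            int i = full_k - j;   /* matching sample of x, if it exists */
            if (i >= 0 && i < nx)
                acc += x[i] * y[j];
        }
        z[k] = acc;
    }
}
```

With an 8-tap kernel and anchor -1, conv_start_1d returns 3, so a z with the same length as x receives the window of the full result that is centered on the kernel.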

 

 

 

 

 

Ying_H_Intel
Employee

Hi Tony, 

What is your input, and what parameters did you pass to the IPP filter?

Best Regards,

Ying 

Ying_H_Intel
Employee

Hi Tony,

We discussed the issue internally. As you saw, there are two convolution implementations, one in MKL and one in IPP, and the IPP one performs better than vslsConvExecX. We also have a popular library for convolution, MKL-DNN: https://github.com/intel/mkl-dnn. We are interested in what kind of application you are working on. Could you give us some background?

Best Regards,
Ying

Beckett__Tony
Beginner

We are doing image analysis, currently on Linux. We can compile using either OpenCV or MKL/IPP. In this case, for the 2D filter function, OpenCV is 30% faster. We expected the Intel libraries to be faster, so we are confused.

Are you saying that for an 8x8 kernel on a 1024x1024 image, IPP should be faster?
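
For a sense of scale, the sketch below counts the multiply-adds in a direct (non-FFT) full 2D convolution and gives a straightforward reference implementation in plain C. It is only a baseline for comparing results and reasoning about cost; it is not how MKL, IPP, or OpenCV implement convolution, and the names direct_conv_macs and conv2d_full are hypothetical.

```c
#include <string.h>

/* Multiply-add count of a direct full 2D convolution: every input sample
   meets every kernel tap exactly once. */
long long direct_conv_macs(int xh, int xw, int kh, int kw)
{
    return (long long)xh * xw * kh * kw;
}

/* Reference direct full 2D convolution on dense row-major arrays.
   z must hold (xh + kh - 1) * (xw + kw - 1) floats. */
void conv2d_full(const float *x, int xh, int xw,
                 const float *k, int kh, int kw, float *z)
{
    int zh = xh + kh - 1;
    int zw = xw + kw - 1;
    memset(z, 0, sizeof(float) * (size_t)zh * (size_t)zw);
    for (int i = 0; i < xh; ++i)
        for (int j = 0; j < xw; ++j)
            for (int a = 0; a < kh; ++a)
                for (int b = 0; b < kw; ++b)
                    z[(i + a) * zw + (j + b)] += x[i * xw + j] * k[a * kw + b];
}
```

For a 1024x1024 input and an 8x8 kernel, direct_conv_macs gives 67,108,864 multiply-adds per call, and the full result is 1031x1031.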

Ying_H_Intel
Employee

Hi Tony, 

Yes, IPP convolution is faster than vslsConvExecX. What do you mean by OpenCV being 30% faster? I would have expected OpenCV to be optimized with IPP by default. Could you please provide us with a small test case?

I have attached the one we used for our IPP test.

Best Regards,
Ying 

 

#include <stdio.h>
#include <time.h>
#include <ipp.h>

/* check_sts is not defined in this post; this definition is an assumption
   modeled on the macro used in the IPP documentation examples. */
#define check_sts(st) if ((st) != ippStsNoErr) goto exitLine;

int main(void)
{
    double elapsed;
    clock_t t;
    IppStatus status = ippStsNoErr;
    Ipp32f *pSrc1 = NULL, *pSrc2 = NULL, *pDst = NULL; /* Pointers to source/destination images */
    int srcStep1 = 0, srcStep2 = 0, dstStep = 0;       /* Steps, in bytes, through the source/destination images */
    IppiSize dstSize  = { 1031, 1031 };     /* Size of destination ROI in pixels (1024 + 8 - 1) */
    IppiSize src1Size = { 1024, 1024 };     /* Size of the source image in pixels */
    IppiSize src2Size = { 8, 8 };           /* Size of the kernel in pixels */
    Ipp8u *pBuffer = NULL;  /* Pointer to the work buffer */
    int iTmpBufSize = 0;    /* Common work buffer size */
    int numChannels = 1;
    IppEnum funCfgFull = (IppEnum)(ippAlgAuto | ippiROIFull | ippiNormNone);

    pSrc2 = ippiMalloc_32f_C1(src2Size.width, src2Size.height, &srcStep2);
    pSrc1 = ippiMalloc_32f_C1(src1Size.width, src1Size.height, &srcStep1);
    pDst  = ippiMalloc_32f_C1(dstSize.width, dstSize.height, &dstStep);

    check_sts( status = ippiConvGetBufferSize(src1Size, src2Size, ipp32f, numChannels, funCfgFull, &iTmpBufSize) )

    pBuffer = ippsMalloc_8u(iTmpBufSize);

    /* Fill both inputs with ones, row by row, because each step may include padding */
    for (int r = 0; r < src1Size.height; ++r) {
        Ipp32f *row = (Ipp32f *)((Ipp8u *)pSrc1 + r * srcStep1);
        for (int c = 0; c < src1Size.width; ++c) row[c] = 1;
    }
    for (int r = 0; r < src2Size.height; ++r) {
        Ipp32f *row = (Ipp32f *)((Ipp8u *)pSrc2 + r * srcStep2);
        for (int c = 0; c < src2Size.width; ++c) row[c] = 1;
    }

    /* Time 100 convolutions; clock() reports CPU time, not wall-clock time */
    t = clock();
    for (int j = 0; j < 100; ++j) {
        check_sts( status = ippiConv_32f_C1R(pSrc1, srcStep1, src1Size, pSrc2, srcStep2, src2Size, pDst, dstStep, funCfgFull, pBuffer) )
    }
    t = clock() - t;
    elapsed = (double)t / CLOCKS_PER_SEC;
    printf("%f \n", elapsed);

exitLine:
    ippsFree(pBuffer);
    ippiFree(pSrc1);
    ippiFree(pSrc2);
    ippiFree(pDst);
    return (int)status;
}
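
One caveat when comparing numbers between libraries: clock() reports processor time used by the process, which on Linux accumulates across threads. With a threaded library (the MKL link line earlier in the thread uses libmkl_gnu_thread), that can differ from elapsed time. The sketch below is one possible wall-clock harness using POSIX clock_gettime; the helper names and the count_call placeholder workload are hypothetical and stand in for the MKL or IPP call being measured.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime under strict C modes */
#endif
#include <time.h>

/* Difference between two timespec values, in seconds. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) * 1e-9;
}

/* Average wall-clock seconds per call of fn(ctx) over reps calls. */
double wall_seconds_per_call(void (*fn)(void *), void *ctx, int reps)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; ++r)
        fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_seconds(t0, t1) / reps;
}

/* Placeholder workload: increments the int that ctx points to.
   Replace with a wrapper around the convolution call under test. */
void count_call(void *ctx)
{
    ++*(int *)ctx;
}
```

Running both libraries through the same harness, with the same input size, kernel size, and repetition count, keeps the comparison like for like.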
