Using the MKL function vslsConvExecX versus the IPP function IppFilter, the performance is 10x slower. Does this seem correct?
Hi Tony,
Thank you for reporting the problem.
If possible, could you please give us some background: your test CPU type, the vector sizes, and how you link MKL and IPP? A small reproducer case would be helpful. If the details are private, please submit them to the Intel Online Service Center: http://supporttickets.intel.com/
Thanks
Ying
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 142
model name : Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
stepping : 9
cpu MHz : 2904.004
cache size : 4096 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch avx2 rdseed clflushopt
bogomips : 5808.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
#define IPP_VERSION_STR "2018.0.3"
#define INTEL_MKL_VERSION 20180002
libmkl_intel_lp64.so => /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so (0x00007f986c843000)
libmkl_gnu_thread.so => /opt/intel/mkl/lib/intel64/libmkl_gnu_thread.so (0x00007f986b130000)
libmkl_core.so => /opt/intel/mkl/lib/intel64/libmkl_core.so (0x00007f9867126000)
libippcore.so => /opt/intel/ipp/lib/intel64/libippcore.so (0x00007f529092b000)
libippcc.so => /opt/intel/ipp/lib/intel64/libippcc.so (0x00007f5290710000)
libippch.so => /opt/intel/ipp/lib/intel64/libippch.so (0x00007f529050a000)
libippcv.so => /opt/intel/ipp/lib/intel64/libippcv.so (0x00007f52902e4000)
libippdc.so => /opt/intel/ipp/lib/intel64/libippdc.so (0x00007f52900dc000)
libippi.so => /opt/intel/ipp/lib/intel64/libippi.so (0x00007f528fe2a000)
libipps.so => /opt/intel/ipp/lib/intel64/libipps.so (0x00007f528fbe0000)
libippvm.so => /opt/intel/ipp/lib/intel64/libippvm.so (0x00007f528f9c9000)
Partial code:
const int x_stride[2] = { 256, 1 };
const int y_stride[2] = { 8, 1 };
const int z_stride[2] = { 256, 1 };
status = vslsConvNewTaskX(&task, VSL_CONV_MODE_AUTO /* or VSL_CONV_MODE_DIRECT */, 2,
                          x_shape, y_shape, z_shape, x, x_stride);
const int conv_start[2] = { (anchor.y == -1) ? (y_shape[0] - 1) / 2 : anchor.y,
                            (anchor.x == -1) ? (y_shape[1] - 1) / 2 : anchor.x };
status = vslConvSetStart(task, conv_start);
status = vslsConvExecX(task, y, y_stride, z, z_stride);
status = vslConvDeleteTask(&task);
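For context, a minimal self-contained sketch of the same VSL task lifecycle follows. The 256x256 image, the 8x8 kernel of ones, and the check() helper are illustrative assumptions (the anchor logic is simplified to a centered kernel); this is not the actual application code:

#include <cstdio>
#include <cstdlib>
#include <vector>
#include "mkl_vsl.h"

// Abort on any VSL error; a stand-in for real error handling.
static void check(int status, const char *where)
{
    if (status != VSL_STATUS_OK) {
        std::fprintf(stderr, "VSL error %d in %s\n", status, where);
        std::exit(EXIT_FAILURE);
    }
}

int main()
{
    // Assumed sizes: 256x256 input/output (matching the strides above), 8x8 kernel.
    const MKL_INT x_shape[2] = { 256, 256 };
    const MKL_INT y_shape[2] = { 8, 8 };
    const MKL_INT z_shape[2] = { 256, 256 };
    const MKL_INT x_stride[2] = { 256, 1 };
    const MKL_INT y_stride[2] = { 8, 1 };
    const MKL_INT z_stride[2] = { 256, 1 };

    std::vector<float> x(256 * 256, 1.0f), y(8 * 8, 1.0f), z(256 * 256, 0.0f);

    VSLConvTaskPtr task;
    check(vslsConvNewTaskX(&task, VSL_CONV_MODE_AUTO, 2,
                           x_shape, y_shape, z_shape, x.data(), x_stride),
          "vslsConvNewTaskX");

    // Center the kernel, mirroring the default branch of the anchor logic above.
    const MKL_INT conv_start[2] = { (y_shape[0] - 1) / 2, (y_shape[1] - 1) / 2 };
    check(vslConvSetStart(task, conv_start), "vslConvSetStart");

    check(vslsConvExecX(task, y.data(), y_stride, z.data(), z_stride), "vslsConvExecX");
    check(vslConvDeleteTask(&task), "vslConvDeleteTask");
    return 0;
}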
Hi Tony,
What is your input, and what are your IPP filter parameters?
Best Regards,
Ying
Hi Tony,
We discussed the issue internally. As you saw, there are two convolution implementations, one in MKL and one in IPP, and IPP has better performance than vslsConvExecX. We also have a popular library for convolution, MKL-DNN: https://github.com/intel/mkl-dnn. We are interested in what kind of application you are working on; could you give us some background?
Best Regards,
Ying
We are doing image analysis. Currently we are using Linux as the OS. We can compile using either OpenCV or MKL/IPP. In this case, for the 2D filter function, OpenCV is 30% faster, and we thought the Intel libraries should be faster, so we are confused.
You are saying that for an 8x8 kernel on a 1024x1024 image, IPP should be faster?
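For reference, a minimal sketch of what the OpenCV side of such a comparison could look like; the all-ones 8x8 kernel, default border handling, and 100-iteration timing loop are assumptions, not the actual benchmark:

#include <cstdio>
#include <cstdint>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

int main()
{
    // 1024x1024 single-channel float image and an 8x8 kernel, both filled with ones.
    cv::Mat src(1024, 1024, CV_32FC1, cv::Scalar(1.0f));
    cv::Mat kernel(8, 8, CV_32FC1, cv::Scalar(1.0f));
    cv::Mat dst;

    // One warm-up call so lazy initialization does not skew the measurement.
    cv::filter2D(src, dst, -1, kernel);

    int64_t t0 = cv::getTickCount();
    for (int i = 0; i < 100; ++i)
        cv::filter2D(src, dst, -1, kernel); // note: correlation, not a flipped-kernel convolution
    double seconds = double(cv::getTickCount() - t0) / cv::getTickFrequency();
    std::printf("100 filter2D calls: %f s\n", seconds);
    return 0;
}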
Hi Tony,
Yes, IPP convolution is faster than vslsConvExecX. What do you mean by OpenCV being 30% faster? I assumed OpenCV is optimized with IPP by default. Could you please provide us a small test case?
I have attached the one we used for the IPP test.
Best Regards,
Ying
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "ipp.h"

/* Simplified stand-in for the check_sts macro from the IPP examples. */
#define check_sts(st) { if ((st) != ippStsNoErr) { printf("IPP error %d\n", (int)(st)); return -1; } }

int main(void)
{
    double time;
    clock_t t;
    IppStatus status = ippStsNoErr;
    Ipp32f *pSrc1 = NULL, *pSrc2 = NULL, *pDst = NULL; /* Pointers to source/destination images */
    int srcStep1 = 0, srcStep2 = 0, dstStep = 0; /* Steps, in bytes, through the source/destination images */
    IppiSize dstSize = { 1031, 1031 };  /* Size of destination ROI in pixels (1024 + 8 - 1 for a full convolution) */
    IppiSize src1Size = { 1024, 1024 }; /* Size of source image ROI in pixels */
    IppiSize src2Size = { 8, 8 };       /* Size of kernel ROI in pixels */
    int divisor = 2; /* The integer value by which the computed result is divided (unused by the 32f variant) */
    Ipp8u *pBuffer = NULL; /* Pointer to the work buffer */
    int iTmpBufSize = 0;   /* Common work buffer size */
    int numChannels = 1;
    IppEnum funCfgFull = (IppEnum)(ippAlgAuto | ippiROIFull | ippiNormNone);
    pSrc2 = ippiMalloc_32f_C1(src2Size.width, src2Size.height, &srcStep2);
    pSrc1 = ippiMalloc_32f_C1(src1Size.width, src1Size.height, &srcStep1);
    pDst = ippiMalloc_32f_C1(dstSize.width, dstSize.height, &dstStep);
    check_sts( status = ippiConvGetBufferSize(src1Size, src2Size, ipp32f, numChannels, funCfgFull, &iTmpBufSize) )
    pBuffer = ippsMalloc_8u(iTmpBufSize);
    /* Fill the source image and the kernel with ones, row by row,
       respecting the row steps returned by ippiMalloc. */
    for (int r = 0; r < src1Size.height; ++r) {
        Ipp32f *row = (Ipp32f *)((Ipp8u *)pSrc1 + r * srcStep1);
        for (int c = 0; c < src1Size.width; ++c) row[c] = 1.0f;
    }
    for (int r = 0; r < src2Size.height; ++r) {
        Ipp32f *row = (Ipp32f *)((Ipp8u *)pSrc2 + r * srcStep2);
        for (int c = 0; c < src2Size.width; ++c) row[c] = 1.0f;
    }
    t = clock();
    for (int j = 0; j < 100; ++j) {
        check_sts( status = ippiConv_32f_C1R(pSrc1, srcStep1, src1Size, pSrc2, srcStep2, src2Size, pDst, dstStep, funCfgFull, pBuffer) )
    }
    t = clock() - t;
    time = (double)t / CLOCKS_PER_SEC;
    printf("%f \n", time);
    ippsFree(pBuffer);
    ippiFree(pSrc1);
    ippiFree(pSrc2);
    ippiFree(pDst);
    system("pause");
    return 0;
}
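For completeness: on Linux this should build with something along the lines of gcc conv_test.c -I/opt/intel/ipp/include -L/opt/intel/ipp/lib/intel64 -lippi -lipps -lippcore (the paths are assumptions based on the ldd output above), and the system("pause") call is Windows-specific, so it can be dropped there.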