Hello,
We currently use IPP 5.2 in our application, and I am trying to replace it with IPP 2019 via the NuGet package. I don't understand the performance difference in cubic resize between IPP 5.2 and IPP 2019.
In the resize test, the destination image size is (240, 217), and one part of the source image is zoomed to the destination size.
When a (60 x 54) image is zoomed 4 times, the cubic resize function of IPP 5.2 runs faster than that of IPP 2019.
When a (30 x 27) image is zoomed 8 times, the cubic resize function of IPP 5.2 is still faster than that of IPP 2019. In this case IPP 2019 is also slower than it was when zooming 4 times.
My questions are:
Why is IPP 2019 slower than IPP 5.2?
Why is zooming 8 times with IPP 2019 slower than zooming 4 times, when the source region processed at 8 times zoom is only a quarter of the size of the one processed at 4 times zoom?
Thank you in advance.
Ning
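A note on the second question: the sketch below assumes a simple cost model in which an interpolating resize does a fixed amount of work per destination pixel (16 source taps for a 4x4 cubic footprint). This is a common way to reason about resize cost, not a statement about IPP internals, and the helper names are hypothetical. Under that assumption the 4x and 8x cases do the same amount of interpolation work, because the destination is (240, 217) in both, even though the 8x source has a quarter of the pixels.

```c
#include <stddef.h>

/* Number of pixels in a w x h image. */
size_t pixel_count(int w, int h)
{
    return (size_t)w * (size_t)h;
}

/* Source samples read under the assumed model: one 4x4 footprint
   (16 taps) per destination pixel, independent of the source size. */
size_t cubic_taps(int dst_w, int dst_h)
{
    return pixel_count(dst_w, dst_h) * 16u;
}
```

With these helpers, pixel_count(30, 27) is a quarter of pixel_count(60, 54), while cubic_taps(240, 217) is identical for both zoom factors. So a smaller source alone would not be expected to make the call faster; any slowdown at 8x has to come from something else.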
Hi.
I am a bit confused. I've modified the reproducer. On my AVX2 systems I see the following numbers:
Xeon Silver 4116 2.10 GHz
ippIP AVX2 (h9), 2019.0.5 (r0xc95fdf5f)
( 30, 27) -> ( 60, 54), 42903.60, 9273.60
( 30, 27) -> (120, 108), 137363.20, 20176.80
( 90, 81) -> ( 60, 54), 30096.00, 18941.60
( 90, 81) -> (120, 108), 78498.00, 42430.40
Core i5-7300U 2.7 GHz
ippIP AVX2 (h9), 2019.0.5 (r0xc95fdf5f)
( 30, 27) -> ( 60, 54), 82597.00, 19693.80
( 30, 27) -> (120, 108), 474363.80, 57542.40
( 90, 81) -> ( 60, 54), 62416.60, 56303.40
( 90, 81) -> (120, 108), 106889.40, 50775.00
Ning, could you please build this reproducer as a separate application and send me its output?
Thanks.
What CPU type are you running on?
Could you print the ippiGetLibVersion() output?
const IppLibraryVersion* lib = ippiGetLibVersion();
printf("\t\t version of IPP is: %s %s %d.%d.%d.%d\n", lib->Name, lib->Version, lib->major, lib->minor, lib->majorBuild, lib->build);
And which exact ippiResizeCubic_<mod> function do you use?
Gennady F. (Blackbelt) wrote: And which exact ippiResizeCubic_<mod> function do you use?
Hello Gennady,
Thank you for your reply. My CPU is an Intel Core i7-8700K, and my project uses ippiResizeCubic_16u_C1R.
The result of "ippiGetLibVersion" is:
name : 0x3b834220 "ippIP AVX2 (h9)"
Version : 0x3b834230 "2019.0.4 (r62443)"
major : 2019
minor : 0
majorBuild : 4
build : 62443
Thank you again and looking forward to your reply.
Kind regards,
Ning
Hello Ning,
thanks, we will investigate it. It will take some time.
Pavel
Ning, could you give us the same output when linking with version 5.2?
Pavel Berdnikov (Intel) wrote: Hello Ning,
thanks, we will investigate it. It will take some time.
Pavel
Thank you Pavel
Gennady F. (Blackbelt) wrote: Ning, could you give us the same output when linking with version 5.2?
Hello Gennady,
The version information is
Name : 0x3BBBF2A8 "ippip8-6.0.dll+"
Version : 0x3BBBF280 "6.0 Update 2 build 167.41"
major : 6
minor: 0
majorBuild : 167
build : 692
targetCpu: p8
Furthermore, the target CPU of IPP 2019 is h9.
Regards,
Ning
Ning, we could not reproduce the problem on our side. Could you give us a reproducer that we can build and run?
Gennady F. (Blackbelt) wrote: Ning, we could not reproduce the problem on our side. Could you give us a reproducer that we can build and run?
Hi Gennady, IPP is integrated into our application, so it is a little hard to extract a simple reproducer.
In our application, the IPL project, which is also from Intel, is still used as a bridge between IPP and our application. IPL works only with earlier IPP versions (such as IPP 5.2), so for resize I must replace the old IPP function with a new implementation.
Old IPP resize function in the IPL project:
ippiResize_16u_C1R((Ipp16u*)pSrc, srcSize, src->widthStep, srcRoi,
(Ipp16u*)pDst, dst->widthStep, dstRoiSize, xFactor, yFactor, interpolation);
New implementation with cubic interpolation:
IppiResizeSpec_32f* pSpec = 0;
int specSize = 0, initSize = 0, bufSize = 0;
Ipp16u borderValue = 0;
Ipp8u* pBuffer = 0;
Ipp8u* pInitBuf = 0;
IppiPoint dstOffset = { 0, 0 };
Ipp8u *pSrc, *pDst;            /* set by the surrounding IPL code */
IppiSize srcSize, dstRoiSize;  /* set by the surrounding IPL code */
Ipp32f CubicParameterB = 0.15f;
Ipp32f CubicParameterC = 0.5f;
/* Query the spec and init-buffer sizes, then build the cubic resize spec */
ippiResizeGetSize_16u(srcSize, dstRoiSize, ippCubic, 0, &specSize, &initSize);
pInitBuf = ippsMalloc_8u(initSize);
pSpec = (IppiResizeSpec_32f*)ippsMalloc_8u(specSize);
ippiResizeCubicInit_16u(srcSize, dstRoiSize, CubicParameterB, CubicParameterC, pSpec, pInitBuf);
/* The work buffer size must be queried with the 16u variant to match the 16u spec */
ippiResizeGetBufferSize_16u(pSpec, dstRoiSize, 1, &bufSize);
pBuffer = ippsMalloc_8u(bufSize);
/* The border value is passed by pointer */
ippiResizeCubic_16u_C1R((Ipp16u*)pSrc, src->widthStep, (Ipp16u*)pDst, dst->widthStep,
    dstOffset, dstRoiSize, ippBorderRepl, &borderValue, pSpec, pBuffer);
/* Memory from ippsMalloc_8u is released with ippsFree */
ippsFree(pInitBuf);
ippsFree(pSpec);
ippsFree(pBuffer);
Is there anything wrong in my implementation? This is the only difference in my performance test. Furthermore, have you also tested the performance of cubic resize between IPP 2019 and earlier IPP versions (before the resize change in IPP 7.1)?
Thank you very much!
Kind regards,
Ning
Hello Gennady and Pavel,
I've done another comparison test for cubic resizing. This time I kept the source image size the same and changed the resize factor. The test still uses ippiResizeCubic_16u_C1R with 1000 repetitions. I attached three test results.
When the source image size is (30, 27), a small image, IPP 5 performs better than IPP 2019.
When the source image size is (150, 136), IPP 5 performs almost the same as IPP 2019.
When the source image is larger than (150, 136), like the third image with size (480, 517), IPP 2019 is faster than IPP 5.
From these results, IPP 2019 is faster on larger images but slower when resizing smaller images. Is this because a different cubic algorithm is used in IPP 2019?
For resizing the smaller image (30, 27), is the quality of the image resized with IPP 2019 better than with IPP 5?
Thank you for your help, any suggestions are appreciated!
Kind regards,
Ning
Gennady F. (Blackbelt) wrote: Ning, we could not reproduce the problem on our side. Could you give us a reproducer that we can build and run?
Hi Gennady,
Sorry to disturb you. May I ask whether you received my email with the modified test code?
Thank you and kind regards,
Ning
Hi Ning, yes, the issue is confirmed for small input sizes (<= ~100). For medium and large input sizes, IPP 2019 outperforms IPP 6.0. We checked this on AVX, AVX2, and AVX-512 based systems. The issue is escalated, and we will keep this thread updated.
--Gennady
Gennady F. (Blackbelt) wrote: Hi Ning, yes, the issue is confirmed for small input sizes (<= ~100). For medium and large input sizes, IPP 2019 outperforms IPP 6.0. We checked this on AVX, AVX2, and AVX-512 based systems. The issue is escalated, and we will keep this thread updated.
--Gennady
Hello Gennady,
Thank you for your reply. Is this because different cubic interpolation methods are applied? Could you please explain the differences between the two cubic interpolation methods (I couldn't find much information in the IPP manual)? Does the new cubic interpolation method have better performance? Is it possible to use the old cubic interpolation method when the image is small?
Thank you again.
Kind regards,
Ning
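For readers wondering what the two numeric parameters passed to ippiResizeCubicInit_16u mean: the sketch below evaluates the standard two-parameter (B, C) cubic filter family, of which Catmull-Rom (B = 0, C = 0.5) and the cubic B-spline (B = 1, C = 0) are members. It assumes IPP's valueB and valueC follow this convention. It is a plain C illustration with hypothetical function names, not IPP's implementation, and it says nothing about how IPP 5.2 and IPP 2019 differ internally.

```c
/* Weight of the two-parameter (B, C) cubic filter at distance x
   from the sample position. The filter has support |x| < 2. */
double cubic_bc_weight(double x, double B, double C)
{
    double ax = x < 0.0 ? -x : x;
    if (ax < 1.0) {
        return ((12.0 - 9.0 * B - 6.0 * C) * ax * ax * ax
              + (-18.0 + 12.0 * B + 6.0 * C) * ax * ax
              + (6.0 - 2.0 * B)) / 6.0;
    }
    if (ax < 2.0) {
        return ((-B - 6.0 * C) * ax * ax * ax
              + (6.0 * B + 30.0 * C) * ax * ax
              + (-12.0 * B - 48.0 * C) * ax
              + (8.0 * B + 24.0 * C)) / 6.0;
    }
    return 0.0;
}

/* Weights of the four neighbouring samples for a destination position
   that lies a fraction t (0 <= t < 1) past the second sample. */
void cubic_bc_taps(double t, double B, double C, double w[4])
{
    w[0] = cubic_bc_weight(t + 1.0, B, C);
    w[1] = cubic_bc_weight(t, B, C);
    w[2] = cubic_bc_weight(1.0 - t, B, C);
    w[3] = cubic_bc_weight(2.0 - t, B, C);
}
```

The four tap weights sum to 1 for any B and C, so flat image regions stay flat. The choice of B and C trades blur against ringing, which affects image quality rather than speed.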
Hello Ning,
The performance degradation happens because we use the wider CPU registers of AVX2: this benefits sufficiently large data, but it hurts small data. Since this is important for you, we will tune the optimization for small data in upcoming IPP releases. I'm sorry for this.
Could you provide some additional information from your side: why is processing such small images important for you? What are your workloads? Is the resize operation critical in your pipeline (what percentage of the whole pipeline does it take)?
Pavel
Pavel Berdnikov (Intel) wrote: Hello Ning,
The performance degradation happens because we use the wider CPU registers of AVX2: this benefits sufficiently large data, but it hurts small data. Since this is important for you, we will tune the optimization for small data in upcoming IPP releases. I'm sorry for this.
Could you provide some additional information from your side: why is processing such small images important for you? What are your workloads? Is the resize operation critical in your pipeline (what percentage of the whole pipeline does it take)?
Pavel
Hello Pavel,
Thank you very much for your reply. Our product is medical image diagnostic software, and our customers are mostly doctors. One daily use of our software is zooming in on series of small CT images to diagnose disease. Zoom performance is therefore very important for our customers and for us.
To provide an excellent zooming experience, our product has to guarantee that a series of CT images is zoomed together and smoothly when the mouse wheel is moved. The number of zoom operations per mouse wheel movement can be up to 2000.
Although it may not make much difference on a recent CPU, some of our customers still use old PCs with relatively slow performance.
It would be really great if the performance of resizing small data were improved in upcoming IPP releases.
Thank you very much for your help!
Kind regards,
Ning
Hello Ning,
I understand your case, thanks. If you have any performance expectations for IPP and data sets for performance measurement that you can share with us, it would be very helpful. We can add these cases to our regular test cycle for better validation.
In any case, I will contact you as soon as we have new results.
Pavel
Pavel Berdnikov (Intel) wrote: Hello Ning,
I understand your case, thanks. If you have any performance expectations for IPP and data sets for performance measurement that you can share with us, it would be very helpful. We can add these cases to our regular test cycle for better validation.
In any case, I will contact you as soon as we have new results.
Pavel
Hello Pavel,
Last month Gennady sent me a test benchmark example. I changed it to compare the performance of IPP 2019 and IPP 6 and sent it back to Gennady. The result of this benchmark shows behavior similar to what we see in our application with medical image data. Would it be helpful?
I'm looking forward to your new results. Thank you for all your and Gennady's help!
Kind regards,
Ning
OK, thank you. We will use this benchmark.
Pavel
Hello, Ning.
When the input image is small, the processing of border pixels affects the performance of ippiResizeCubic_16u more than that of the previous function. Is it possible in your application to allocate an additional buffer and duplicate the border pixels in it? IPP has the necessary API, and I am attaching such a workaround. I see some speedup on my AVX2 system. Could you please test it on your side too?
Andrey.
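To make the idea of "duplicating border pixels in an additional buffer" concrete, here is a minimal plain C sketch of a replicate-border copy for a 16-bit single-channel image. It only illustrates the concept: the function name and parameter layout are hypothetical, steps are given in elements rather than bytes, and in an IPP pipeline the equivalent step would be done with the library's own border-copy call rather than this loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp v to the range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Copy a src_w x src_h image into a larger buffer, filling the added
   margins by repeating the nearest edge pixel. Steps are in elements.
   dst must hold (top + src_h + bottom) rows of at least
   (left + src_w + right) elements each. */
void copy_replicate_border_u16(const uint16_t *src, int src_step,
                               int src_w, int src_h,
                               uint16_t *dst, int dst_step,
                               int left, int top, int right, int bottom)
{
    int dst_w = left + src_w + right;
    int dst_h = top + src_h + bottom;
    for (int y = 0; y < dst_h; y++) {
        int sy = clamp_int(y - top, 0, src_h - 1);
        for (int x = 0; x < dst_w; x++) {
            int sx = clamp_int(x - left, 0, src_w - 1);
            dst[(size_t)y * dst_step + x] = src[(size_t)sy * src_step + sx];
        }
    }
}
```

Once the padded copy exists, the resize can be pointed at the interior of that buffer (offset by the top and left margins) and told that border pixels are already in memory, so it does not have to synthesize them on every call.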
Andrey Bakshaev (Intel) wrote: Hello, Ning.
When the input image is small, the processing of border pixels affects the performance of ippiResizeCubic_16u more than that of the previous function. Is it possible in your application to allocate an additional buffer and duplicate the border pixels in it? IPP has the necessary API, and I am attaching such a workaround. I see some speedup on my AVX2 system. Could you please test it on your side too?
Andrey.
Hello Andrey,
Thank you very much for your solution. I've integrated your sample code into my test benchmark, using the same measurement method as before; the code is shown below. I have also tested it on my system (AVX2), and the performance is similar to before; the result is in the attachment. Have I changed something incorrectly? The new result is shown in the image with the label "IPP 2019 with border InMem ms".
Thank you and kind regards,
Ning
double resize_bench_additional_buffer(int srcw, int srch, int dstw, int dsth)
{
    IppStatus status;
    IppiResizeSpec_32f* pSpec = 0;
    IppiInterpolationType interpolation = ippCubic;
    Ipp8u *pInitBuf, *pBuffer;
    int specSize, initSize, bufSize, srcStep, dstStep, srcStep1, i, j, n;
    IppiPoint dstOffset = { 0, 0 };
    Ipp32f valueB = 0.15f;   /* cubic filter parameters are floating point; */
    Ipp32f valueC = 0.5f;    /* an Ipp16u would truncate both to 0 */
    Ipp16u *pSrc, *pDst, *pSrc1, *pSrcRoi;
    Ipp16u borderVal[3] = { 0, 0, 0 };
    IppiSize srcSize, dstSize, srcSize1;
    IppiBorderSize borderSize;
    Ipp64u t1, t2;
    double execTime;
    int Mhz = 0;

    srcSize.width = srcw;  srcSize.height = srch;
    dstSize.width = dstw;  dstSize.height = dsth;

    /* Allocate the images and fill the source with a simple gradient */
    pSrc = ippiMalloc_16u_C1(srcSize.width, srcSize.height, &srcStep);
    pDst = ippiMalloc_16u_C1(dstSize.width, dstSize.height, &dstStep);
    for (i = 0; i < srcSize.height; i++) {
        for (j = 0; j < srcSize.width; j++) {
            pSrc[(srcStep >> 1) * i + j] = (Ipp16u)(i + j);
        }
    }

    /* Build the cubic resize spec and the work buffer once, outside the timed loop */
    status = ippiResizeGetSize_16u(srcSize, dstSize, interpolation, 0, &specSize, &initSize);
    pInitBuf = ippsMalloc_8u(initSize);
    pSpec = (IppiResizeSpec_32f*)ippsMalloc_8u(specSize);
    status = ippiResizeCubicInit_16u(srcSize, dstSize, valueB, valueC, pSpec, pInitBuf);
    status = ippiResizeGetBufferSize_16u(pSpec, dstSize, 1, &bufSize);
    pBuffer = ippsMalloc_8u(bufSize);

    /* Copy the source into a padded buffer with replicated border pixels */
    status = ippiResizeGetBorderSize_16u(pSpec, &borderSize);
    srcSize1.width = borderSize.borderLeft + srcSize.width + borderSize.borderRight;
    srcSize1.height = borderSize.borderTop + srcSize.height + borderSize.borderBottom;
    pSrc1 = ippiMalloc_16u_C1(srcSize1.width, srcSize1.height, &srcStep1);
    ippiCopyReplicateBorder_16u_C1R(pSrc, srcStep, srcSize, pSrc1, srcStep1, srcSize1,
        borderSize.borderTop, borderSize.borderLeft);
    pSrcRoi = pSrc1 + (srcStep1 >> 1) * borderSize.borderTop + borderSize.borderLeft;

    /* Warm-up call, then the timed loop (NIMAGES is defined elsewhere in the benchmark) */
    status = ippiResizeCubic_16u_C1R(pSrcRoi, srcStep1, pDst, dstStep, dstOffset, dstSize,
        ippBorderInMem, borderVal, pSpec, pBuffer);
    t1 = ippGetCpuClocks();
    for (n = 0; n < NIMAGES; n++) {
        status = ippiResizeCubic_16u_C1R(pSrcRoi, srcStep1, pDst, dstStep, dstOffset, dstSize,
            ippBorderInMem, borderVal, pSpec, pBuffer);
    }
    t2 = ippGetCpuClocks();

    /* Convert CPU clocks to seconds using the reported CPU frequency */
    execTime = (double)(t2 - t1);
    ippGetCpuFreqMhz(&Mhz);
    execTime = execTime / (1.e6 * (double)Mhz);
    printf("... IPP2019 ippiResizeCubic_16u_C1R with border InMem ExecTime == %lf sec, Src Image %d x %d, Dst Image %d x %d ... \n\n",
        execTime, srcSize.width, srcSize.height, dstSize.width, dstSize.height);

    ippsFree(pInitBuf); ippsFree(pSpec); ippsFree(pBuffer);
    ippiFree(pSrc); ippiFree(pSrc1); ippiFree(pDst);
    return execTime;
}