Re: Resize by Super interpolation by ippi 2019 and 2020 is slower than 5.2 (ippu8-5.2.dll)

D12 · ‎10-10-2020

Hi,

I have a 1.5 GB image that I would like to scale down with Super interpolation on an Intel(R) Xeon(R) CPU E5-2460 v3 @ 2.66 GHz (2 processors, 32 cores) with 128 GB of memory.

With Intel 2019 and 2020, the more threads I use, the longer the Super interpolation scale-down takes. When I test with ippiu8-5.2.dll (I don't know which Intel release it belongs to...), performance improves as I add threads. The problem doesn't exist for Cubic interpolation, which works as expected.

The timings below are for scaling down the 1.5 GB image with Super interpolation by a factor of 0.27, using different numbers of threads. Each case was run 3 times:

Using Intel 2019 and 2020:

threads = 4: 842 ms, 670 ms, 655 ms
threads = 8: 718 ms, 718 ms, 749 ms
threads = 16: 967 ms, 920 ms, 921 ms
threads = 24: 1201 ms, 1092 ms, 1170 ms

Using the old version of Intel (ippu8-5.2.dll):

threads = 4: 1092 ms, 1123 ms, 1092 ms
threads = 8: 577 ms, 562 ms, 562 ms
threads = 16: 375 ms, 375 ms, 374 ms
threads = 24: 249 ms, 249 ms, 265 ms

Is there any way to get better results when using more threads with Intel 2019 and 2020 for Resize with Super interpolation?
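To make the two sets of timings easier to compare, here is a minimal sketch that takes the median of the three runs for each thread count and expresses it as a speedup relative to the 4-thread case. The numbers are the ones posted above; the names `Sample`, `median3`, `speedups`, `ipp2019_2020` and `ipp52` are made up for this illustration and are not part of IPP.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// One benchmark row: thread count and the three measured times in ms.
struct Sample {
    int threads;
    std::array<int, 3> ms;
};

// Median of three measurements (sort a copy, take the middle value).
int median3(std::array<int, 3> v) {
    std::sort(v.begin(), v.end());
    return v[1];
}

// Speedup of every row relative to the first row, based on medians.
// A value below 1.0 means that row is slower than the first one.
std::vector<double> speedups(const std::vector<Sample>& rows) {
    assert(!rows.empty());
    const double base = median3(rows.front().ms);
    std::vector<double> out;
    for (const Sample& r : rows) {
        out.push_back(base / median3(r.ms));
    }
    return out;
}

// Timings reported for Intel 2019 and 2020.
std::vector<Sample> ipp2019_2020() {
    return {{4, {842, 670, 655}},
            {8, {718, 718, 749}},
            {16, {967, 920, 921}},
            {24, {1201, 1092, 1170}}};
}

// Timings reported for the old 5.2 DLL.
std::vector<Sample> ipp52() {
    return {{4, {1092, 1123, 1092}},
            {8, {577, 562, 562}},
            {16, {375, 375, 374}},
            {24, {249, 249, 265}}};
}
```

Calling `speedups(ipp2019_2020())` and `speedups(ipp52())` gives, for 24 threads, roughly 0.57x on 2019/2020 (slower than 4 threads) versus roughly 4.4x on 5.2.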

D12 · ‎02-09-2021

Hi Vlad,

I tested your modification (the one that splits the image into parts) on different image sizes and found something weird.

The table below shows the processing time in milliseconds for 1, 4, 8, 16, and 24 threads, running on two images with different numbers of rows.

I expected the smaller image to be processed faster than the bigger one, but the results are the opposite.

What could be the reason, and how can it be solved?

threads	1	4	8	16	24
image size 1.62GB (30720x18924)	656	188	94	93	109
image size 1.09GB (30720x12682)	500	234	188	218	360

Vladislav_V_Intel · ‎02-10-2021

Hi Dudi,

I'm looking into this case. Will notify you when results are available.

Best regards,
Vlad V.

D12 · ‎02-17-2021

Hi Vlad,

Did you manage to reproduce the problem?

Vladislav_V_Intel · ‎02-17-2021

Hi Dudi,

Unfortunately, I haven't been able to reproduce your issue yet. In my environment, the smaller image is processed slightly faster than the bigger one with the new parallelization scheme. However, I did see the same behavior (bigger is faster) with the initial scheme. Could you please double-check that both images were processed with the same parameters using the new scheme?

Best regards,
Vlad V.

D12 · ‎02-17-2021

Hi Vlad,

All tests ran your CImage::ResizeMod(...) code. I always get faster performance for the bigger image. Is there any way for you to connect to my computer via TeamViewer so I can show you the problem? Or where can I upload the images so that you can test them?

Gennady_F_Intel · ‎02-17-2021

Dudi,

If you would like official support and you have a valid license, you may submit the issue to the official Intel Support Center:

https://supporttickets.intel.com/servicecenter?lang=en-US

There you can open tickets and upload all your images. If I am not mistaken, this system has a 2 GB size limit.

Alternatively, you may upload the images to an external resource (Dropbox, for example) and share the link.

D12 · ‎02-24-2021

Hi Vlad,

Here is a shared link to download the 2 images that I tested:

https://danaher-my.sharepoint.com/:u:/r/personal/dudi_avramov_esko_com/Documents/Shared%20with%20Everyone/samples.7z?csf=1&web=1&e=pi0RmC

The scale-down of the bigger one is faster than that of the smaller one. I expected it to be the other way around...

Vladislav_V_Intel · ‎02-24-2021

Hi Dudi,

Unfortunately, the link you shared points to a private SharePoint site that requires authentication, and we have no access to it.

Best regards,
Vlad V.

D12 · ‎02-24-2021

Vlad,

Could you please give me your e-mail address so that I can send the images via WeTransfer?

Vladislav_V_Intel · ‎02-25-2021

Dudi,
It is highly recommended to transfer data through the Intel Support Center (https://supporttickets.intel.com/servicecenter?lang=en-US) or through open channels that don't require sharing confidential information.

Best regards,
Vlad V.

D12 · ‎02-25-2021

Even after signing in, I don't understand how to transfer data through the Intel Support Center...

Vladislav_V_Intel · ‎02-25-2021

Request Support -> check "A product or service I already own or use" and "Search for a product or service by name" -> type "Integrated" and choose Integrated Performance Primitives -> answer the questions and briefly describe the issue, then press the "Next: Details" button -> fill in the form and click "Submit Request". After submitting the request you'll be able to upload large files, up to 2 GB.

D12 · ‎03-01-2021

Hi Vlad,

I uploaded the 2 images as you advised.

You can find them under support request number 04986331.

Gennady_F_Intel · ‎03-01-2021

Dudi,

Could you please try to upload the images once again? We see no images uploaded to that Online Service Center thread.

D12 · ‎03-02-2021

Hi Vlad,

I uploaded the images again. You can find them under support request number 04986331.

Vladislav_V_Intel · ‎03-03-2021

Hi Dudi,

I was able to reproduce the behavior where the big image is processed faster than the small one, using your images of sizes 30720x18284 and 30720x12682. I'm now looking into what can cause this.

Best regards,
Vlad V.

D12 · ‎03-03-2021

Hi Vlad,

Thanks for the information. Reproducing the problem is 50% of the solution.

Regards,

Dudi

D12 · ‎03-15-2021

Hi Vlad,

I hope you have been able to find the reason for the slowdown.

Do you have any update?

Vladislav_V_Intel · ‎03-15-2021

Hi Dudi,

Unfortunately, the reason is still not clear. In your example there are many more L3 cache misses for the smaller picture within the 'parallel_for' loop, and I'm looking into what can cause this.

Best regards,
Vlad V.

D12 · ‎03-24-2021

Hi Vlad,

Do you have any solution in mind for this issue?

Vladislav_V_Intel · ‎03-24-2021

Hi Dudi,

Yes, the problem is again the size of the additional buffer. Because of how the number of rows to process at a time is calculated, the smaller image requires more memory, and with a larger number of threads the threads compete for the L3 cache. One possible solution is to split the image not only by rows but also by columns, which reduces the memory load for processing each piece of the picture. I'm now testing this on your example code and will let you know the results when they're ready.

Best regards,
Vlad V.
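For readers following along, the idea of splitting by rows and by columns can be sketched in plain C++ as below. This is only an illustration under stated assumptions, not Vlad's code and not the IPP API: the names `Tile`, `splitRange`, `makeTiles` and `srcRowBytes` are hypothetical, the source rectangle is approximated from the scale factor alone, and the pixel format is assumed to be 3 channels of 8 bits. In a real IPP program the source offset and the work buffer size would be queried from the library (for example with `ippiResizeGetSrcOffset_8u` and `ippiResizeGetBufferSize_8u`) rather than computed by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One piece of the destination image and the source region that
// approximately feeds it. All names here are illustrative only.
struct Tile {
    int dstX, dstY, dstW, dstH;  // destination rectangle
    int srcX, srcY, srcW, srcH;  // approximate source rectangle
};

// Split [0, total) into `parts` nearly equal ranges.
// Returns parts + 1 edges, the last one equal to `total`.
std::vector<int> splitRange(int total, int parts) {
    assert(total > 0 && parts > 0 && parts <= total);
    std::vector<int> edges(parts + 1);
    for (int i = 0; i <= parts; ++i) {
        edges[i] = static_cast<int>(static_cast<long long>(total) * i / parts);
    }
    return edges;
}

// Build a cols x rows grid of tiles over the destination image.
// cols == 1 gives the rows-only split; cols > 1 also splits columns.
std::vector<Tile> makeTiles(int srcW, int srcH, int dstW, int dstH,
                            int cols, int rows) {
    const double sx = static_cast<double>(dstW) / srcW;  // horizontal scale
    const double sy = static_cast<double>(dstH) / srcH;  // vertical scale
    const std::vector<int> xs = splitRange(dstW, cols);
    const std::vector<int> ys = splitRange(dstH, rows);
    std::vector<Tile> tiles;
    tiles.reserve(static_cast<std::size_t>(cols) * rows);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            Tile t{};
            t.dstX = xs[c];
            t.dstY = ys[r];
            t.dstW = xs[c + 1] - xs[c];
            t.dstH = ys[r + 1] - ys[r];
            // Source range: floor of the start, ceil of the end, clamped.
            t.srcX = static_cast<int>(std::floor(xs[c] / sx));
            t.srcY = static_cast<int>(std::floor(ys[r] / sy));
            const int endX = std::min(srcW, static_cast<int>(std::ceil(xs[c + 1] / sx)));
            const int endY = std::min(srcH, static_cast<int>(std::ceil(ys[r + 1] / sy)));
            t.srcW = endX - t.srcX;
            t.srcH = endY - t.srcY;
            tiles.push_back(t);
        }
    }
    return tiles;
}

// Bytes in one source row of a tile, for 8-bit samples.
std::size_t srcRowBytes(const Tile& t, int channels) {
    return static_cast<std::size_t>(t.srcW) * channels;
}
```

For the 30720x12682 image, assuming the 0.27 factor from the first post (destination about 8294x3424), `makeTiles(30720, 12682, 8294, 3424, 1, 24)` yields 24 full-width bands, while `makeTiles(30720, 12682, 8294, 3424, 4, 6)` yields 24 tiles whose source rows are about a quarter as long. With the same number of pieces each piece covers roughly the same source area, so what this layout changes is the row length each thread touches at a time. Whether that actually relieves the L3 pressure described above would have to be measured.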