Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Resize by Super interpolation by ippi 2019 and 2020 is slower than 5.2 (ippu8-5.2.dll)

D12
Beginner
4,350 Views

Hi,

I have a 1.5 GB image that I would like to scale down with Super interpolation on an Intel(R) Xeon(R) CPU E5-2460 v3 @ 2.66 GHz (2 processors, 32 cores) with 128 GB of memory.

With Intel IPP 2019 and 2020, the more threads I use, the longer the scale-down with Super interpolation takes. With ippiu8-5.2.dll (I don't know which Intel release it belongs to...), performance improves as I add threads. The problem doesn't exist for Cubic interpolation, which scales as expected.

The figures below show the time taken to scale down the 1.5 GB image with Super interpolation by a factor of 0.27, using different numbers of threads. Each case was tested 3 times:

Using Intel 2019 and 2020:

threads = 4:    842 ms,   670 ms,   655 ms
threads = 8:    718 ms,   718 ms,   749 ms
threads = 16:   967 ms,   920 ms,   921 ms
threads = 24:   1201 ms,  1092 ms,  1170 ms

Using old version of Intel (ippu8-5.2.dll):

threads = 4:    1092 ms,  1123 ms,  1092 ms
threads = 8:    577 ms,   562 ms,   562 ms
threads = 16:   375 ms,   375 ms,   374 ms
threads = 24:   249 ms,   249 ms,   265 ms
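
To make the scaling trend in these numbers easier to compare, here is a minimal C++ sketch that averages the three runs per thread count and reports the speedup relative to the 4-thread case. The type and function names (`Timing`, `meanMs`, `speedup`, `report`) are hypothetical helpers made up for this illustration, and the data is simply copied from the measurements above.

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <vector>

// One line of the measurements above: thread count and three runs in ms.
struct Timing {
    int threads;
    std::array<double, 3> ms;
};

// Timings copied from the post (Intel 2019 and 2020).
const std::vector<Timing> kIpp2019 = {
    {4,  {842.0,  670.0,  655.0}},
    {8,  {718.0,  718.0,  749.0}},
    {16, {967.0,  920.0,  921.0}},
    {24, {1201.0, 1092.0, 1170.0}},
};

// Timings copied from the post (old 5.2 DLL).
const std::vector<Timing> kIpp52 = {
    {4,  {1092.0, 1123.0, 1092.0}},
    {8,  {577.0,  562.0,  562.0}},
    {16, {375.0,  375.0,  374.0}},
    {24, {249.0,  249.0,  265.0}},
};

// Mean of the three runs.
double meanMs(const Timing& t) {
    return (t.ms[0] + t.ms[1] + t.ms[2]) / 3.0;
}

// Speedup of `t` relative to `baseline`; values below 1 mean a slowdown.
double speedup(const Timing& baseline, const Timing& t) {
    return meanMs(baseline) / meanMs(t);
}

// Print mean time and speedup versus the first row for each thread count.
void report(const char* label, const std::vector<Timing>& rows) {
    assert(!rows.empty());
    std::printf("%s\n", label);
    for (const Timing& t : rows) {
        std::printf("  threads=%2d  mean=%7.1f ms  speedup vs %d threads=%.2f\n",
                    t.threads, meanMs(t), rows.front().threads,
                    speedup(rows.front(), t));
    }
}
```

Calling `report("2019/2020", kIpp2019)` and `report("5.2", kIpp52)` prints one line per thread count. With the numbers above, the 24-thread speedup over 4 threads is below 1 for 2019/2020 (a slowdown) and above 4 for the 5.2 DLL.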

 

Is there any way to get better results with more threads in Intel 2019 and 2020 for Resize with Super interpolation?

0 Kudos
48 Replies
D12
Beginner
1,486 Views

Hi Vlad,

I tested your modification (the one that splits the image into parts) on different image sizes and found something weird.

The table below shows the processing time in milliseconds for 1, 4, 8, 16, and 24 threads on two images with different numbers of rows.

I expected the smaller image to be processed faster than the bigger one, but the results are the opposite.

What could be the reason, and how can it be solved?

Threads                           1     4     8     16    24
Image size 1.62GB (30720x18924)   656   188   94    93    109
Image size 1.09GB (30720x12682)   500   234   188   218   360
0 Kudos
Vladislav_V_Intel
1,473 Views

Hi Dudi,

I'm looking into this case. Will notify you when results are available.

Best regards,
Vlad V.

0 Kudos
D12
Beginner
1,441 Views

Hi Vlad,

Did you manage to reproduce the problem?

0 Kudos
Vladislav_V_Intel
1,434 Views

Hi Dudi,

Unfortunately, I haven't reproduced your issue yet. In my environment, the smaller image is processed a little faster than the bigger one with the new parallelization scheme. However, I did see the same behavior (bigger is faster) with the initial scheme. Could you please double-check that both images were processed with the same parameters using the new scheme?

Best regards,
Vlad V.

0 Kudos
D12
Beginner
1,424 Views

Hi Vlad,

All tests ran your CImage::ResizeMod(...) code. I always get faster performance for the bigger image. Is there any way for you to connect to my computer via TeamViewer so I can show you the problem? Or where can I upload the images so that you can test them?

 

0 Kudos
Gennady_F_Intel
Moderator
1,411 Views

Dudi,

If you would like official support and you have a valid license, you may submit the issue to the official Intel Support Center:

https://supporttickets.intel.com/servicecenter?lang=en-US

There you can open tickets and upload all your images. If I am not mistaken, this system has a 2 GB size limit.

Alternatively, you can upload your images to an external service (Dropbox, for example) and share the link.


0 Kudos
D12
Beginner
1,388 Views

Hi Vlad,

Here is a shared link to download the 2 images that I tested:

https://danaher-my.sharepoint.com/:u:/r/personal/dudi_avramov_esko_com/Documents/Shared%20with%20Everyone/samples.7z?csf=1&web=1&e=pi0RmC

The scale-down of the bigger one is faster than that of the smaller one. I expected the opposite...

 

0 Kudos
Vladislav_V_Intel
1,383 Views

Hi Dudi,

Unfortunately, the link you shared points to a private SharePoint site that requires authentication, and we have no access to it.

Best regards,
Vlad V.

0 Kudos
D12
Beginner
1,380 Views

Vlad,

Could you please give me your e-mail address so I can send the images via WeTransfer?

 

0 Kudos
Vladislav_V_Intel
1,363 Views

Dudi,
It is highly recommended to transfer data through Intel Support Center https://supporttickets.intel.com/servicecenter?lang=en-US or through open channels that don't require sharing of confidential information.

Best regards,
Vlad V.

0 Kudos
D12
Beginner
1,354 Views

Even after signing in, I don't understand how to transfer data through the Intel Support Center...

0 Kudos
Vladislav_V_Intel
1,350 Views

Request Support -> check "A product or service I already own or use" and "Search for a product or service by name" -> type "Integrated" and choose Integrated Performance Primitives -> answer the questions, briefly describe the issue, and push the "Next: Details" button -> fill in the form and click "Submit Request". After submitting the request, you'll be able to upload large files of up to 2 GB.


 

0 Kudos
D12
Beginner
1,338 Views

Hi Vlad,

I uploaded the 2 images as you advised.

You can find them by support request number: 04986331

0 Kudos
Gennady_F_Intel
Moderator
1,330 Views

Dudi,

Could you please try to upload these images once again? We see no images uploaded to that Online Service Center request.

 

0 Kudos
D12
Beginner
1,321 Views

Hi Vlad,

I uploaded the images again. You can find them in support request number: 04986331

0 Kudos
Vladislav_V_Intel
1,315 Views

Hi Dudi,

I was able to reproduce the behavior where the big image is processed faster than the small one, using your images of sizes 30720x18284 and 30720x12682. Now I'm looking into what could cause it.

Best regards,
Vlad V.

0 Kudos
D12
Beginner
1,309 Views

Hi Vlad,

Thanks for the information. Reproducing the problem is 50% of the solution.

 

Regards,

Dudi

0 Kudos
D12
Beginner
1,257 Views

Hi Vlad,

I hope you were able to find the reason for the slowdown.

Do you have any update?

0 Kudos
Vladislav_V_Intel
1,251 Views

Hi Dudi,

Unfortunately, the reason is still not clear. In your example there are many more L3 cache misses for the smaller picture within the 'parallel_for' loop, and I'm looking into what could cause this.

Best regards,
Vlad V. 

0 Kudos
D12
Beginner
1,271 Views

Hi Vlad,

Do you have any solution in mind for this issue?

0 Kudos
Vladislav_V_Intel
1,269 Views

Hi Dudi,

Yes, the problem is again the size of the additional buffer. Because of how the number of rows to process at a time is calculated, the smaller image requires more memory, and with a larger number of threads they start competing for the L3 cache. A possible solution is to split the image not only by rows but also by columns, which reduces the overall memory load for processing one piece of the picture. I'm testing this on your example code now and will notify you when the results are ready.
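
As a rough illustration of splitting by rows and columns, here is a minimal C++ sketch of the partitioning step only. It is not the code being tested in this thread: the `Tile` type and the `splitIntoTiles` and `tileBytes` helpers are hypothetical names, the 8-bit-per-channel layout is an assumption, and the per-tile resize call (including mapping each tile to its source region) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A rectangular piece of the image, in pixels (hypothetical helper type).
struct Tile {
    int x;
    int y;
    int width;
    int height;
};

// Split a width x height image into a tilesX x tilesY grid.
// Tile boundaries are computed proportionally, so tile sizes along each
// axis differ by at most one pixel and the tiles cover the image exactly.
std::vector<Tile> splitIntoTiles(int width, int height, int tilesX, int tilesY) {
    assert(width > 0 && height > 0 && tilesX > 0 && tilesY > 0);
    std::vector<Tile> tiles;
    tiles.reserve(static_cast<std::size_t>(tilesX) * static_cast<std::size_t>(tilesY));
    for (int ty = 0; ty < tilesY; ++ty) {
        // 64-bit intermediate avoids overflow for very large images.
        const int y0 = static_cast<int>(static_cast<long long>(height) * ty / tilesY);
        const int y1 = static_cast<int>(static_cast<long long>(height) * (ty + 1) / tilesY);
        for (int tx = 0; tx < tilesX; ++tx) {
            const int x0 = static_cast<int>(static_cast<long long>(width) * tx / tilesX);
            const int x1 = static_cast<int>(static_cast<long long>(width) * (tx + 1) / tilesX);
            tiles.push_back(Tile{x0, y0, x1 - x0, y1 - y0});
        }
    }
    return tiles;
}

// Bytes of pixel data in one tile, assuming 8-bit samples (an assumption
// made for this sketch) and the given number of channels.
std::size_t tileBytes(const Tile& t, int channels) {
    return static_cast<std::size_t>(t.width) * static_cast<std::size_t>(t.height) *
           static_cast<std::size_t>(channels);
}
```

Each tile would then be handed to one iteration of the parallel loop. Compared with full-width row strips of the same height, a tile that is also cut along columns touches proportionally fewer bytes per piece, which is the effect the column split is meant to achieve.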

Best regards,
Vlad V.   

0 Kudos