Beginner

Multi-threading support missing for ippiCrossCorrNorm_32f_C1R (Version 2019 Update 5)

I recently made a very late upgrade from IPP 6.1 to IPP 2019.5.281. I found that the cross correlation API has become significantly slower than in IPP 6.1, going from 2-3 ms to 5-6 ms per run. I checked ThreadedFunctionsList.txt for IPP 2019.5.281, and it appears that the cross correlation API no longer has multi-threading support. This is not a matter of the threaded libraries being missing; I have tested both the single-threaded and the threaded libraries. Threading actually makes the API slower, at 8-9 ms.

Has internal multi-threading support really been removed from the cross correlation API? If so, what is the justification? Cross-correlation is a very widely used function, so it seems like an odd decision to make.

8 Replies
Moderator

Kevin, 

Could you give us the input parameters of ippiCrossCorrNorm_32f_C1R? Specifically, we need to know the typical srcRoiSize, dstRoiSize and algType.

Thanks.

Beginner

Hello Gennady,

The algType used is the following: (IppEnum)(ippAlgAuto | ippiROISame | ippiNormCoefficient);

The srcRoiSize used in this use case is always width 498, height 498.

There is no dstRoiSize parameter for this function, but there is a tplRoiSize, which in this use case is width 15, height 15.

The same parameters are being used for the IPP 6.1 equivalent function, ippiCrossCorrSame_NormLevel_32f_C1R, although in IPP 6.1 there is no algType parameter since that appears to be hardcoded inside the API. 

Please let me know if the above is sufficient information to debug, or if more information is needed. Thanks.

Beginner

Also, to clarify the runtime results I got from testing the cross correlation API in IPP 6.1 and 2019 Update 5:

Using a single thread, IPP 6.1 and 2019 Update 5 run at the same speed, 5-6 ms.

With multi-threading, in this case 4 threads, IPP 6.1 takes 2-3 ms and 2019 Update 5 takes 8-9 ms.
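For anyone trying to reproduce per-run timings like these, a small harness along the following lines may help. It is only a sketch: `best_ms_per_run`, `workload_fn` and `count_calls` are hypothetical names, and the call under test (for example a wrapper around the cross correlation call) would be supplied by the caller.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: one run of the code being timed. */
typedef void (*workload_fn)(void *ctx);

/* Milliseconds elapsed between two timestamps. */
double elapsed_ms(const struct timespec *t0, const struct timespec *t1)
{
    return (double)(t1->tv_sec - t0->tv_sec) * 1000.0
         + (double)(t1->tv_nsec - t0->tv_nsec) / 1.0e6;
}

/* Runs fn a few times untimed to warm caches, then returns the
 * fastest of `runs` timed calls, in milliseconds.
 * Note: TIME_UTC is wall-clock time, so a clock adjustment during
 * the measurement can distort an individual sample. */
double best_ms_per_run(workload_fn fn, void *ctx, int warmup, int runs)
{
    assert(fn && warmup >= 0 && runs > 0);

    for (int i = 0; i < warmup; ++i) fn(ctx);

    double best = -1.0;
    for (int i = 0; i < runs; ++i) {
        struct timespec t0, t1;
        timespec_get(&t0, TIME_UTC);
        fn(ctx);
        timespec_get(&t1, TIME_UTC);
        const double ms = elapsed_ms(&t0, &t1);
        if (best < 0.0 || ms < best) best = ms;
    }
    return best;
}

/* Tiny demo workload: counts how often it was called. */
void count_calls(void *ctx)
{
    ++*(int *)ctx;
}
```

Taking the minimum over several runs reduces noise from scheduling; reporting a range, as above, is equally reasonable when run-to-run variation is what matters.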

Moderator

Thanks, Kevin.

As I have learned from the IPP experts, internal OpenMP threading has been removed from these functions since the IPP 9.0 legacy version. You could therefore try the legacy90packages or submit a feature request to add an ippTL implementation for ippiCrossCorrNorm.

New Contributor I

How is a tiled implementation ever possible for fast normalized cross-correlation?

Regards,

Adriaan van Os
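One way to picture external tiling, as a sketch only: split the output into horizontal bands and give each band the extra source rows (a halo) that the template needs above and below it. The `Band` struct and `make_band` helper below are hypothetical, and the center-anchored template is an assumption. Whether per-band calls reproduce the full-image result also depends on the halo rows being treated as real data rather than as border padding, which is not confirmed here for IPP.

```c
#include <assert.h>

/* Hypothetical description of one horizontal band. */
typedef struct {
    int dstY0, dstY1; /* output rows [dstY0, dstY1) owned by this band */
    int srcY0, srcY1; /* source rows [srcY0, srcY1) the band must read */
} Band;

/* Splits srcH output rows into nBands bands and returns band b.
 * Assumption: the template is anchored at its center row, so output
 * row y reads source rows y - tplH/2 .. y + (tplH - 1 - tplH/2).
 * The source range is clamped to the image; rows beyond it are
 * whatever the border rule of the correlation defines. */
Band make_band(int srcH, int tplH, int nBands, int b)
{
    assert(srcH > 0 && tplH > 0 && nBands > 0 && b >= 0 && b < nBands);

    const int above = tplH / 2;
    const int below = tplH - 1 - above;

    Band r;
    r.dstY0 = (int)((long long)srcH * b / nBands);
    r.dstY1 = (int)((long long)srcH * (b + 1) / nBands);
    r.srcY0 = r.dstY0 - above;
    r.srcY1 = r.dstY1 + below;
    if (r.srcY0 < 0) r.srcY0 = 0;
    if (r.srcY1 > srcH) r.srcY1 = srcH;
    return r;
}
```

With the sizes mentioned in this thread (498 rows, a 15-row template, 4 bands), band 1 owns output rows 124 to 248 and reads source rows 117 to 255. Each band can then be handed to its own thread. The halo rows are read twice by neighbouring bands, and an FFT-based implementation would repeat part of its transform work per band, so the speed-up is not guaranteed to scale with the thread count.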

 

Beginner

Gennady F. (Blackbelt) wrote:

thanks, Kevin.

as I have learned from ipp experts that since 9.0 legacy version of IPP, the internal OpenMP threading has been removed from these functions. Therefore you could try to use legacy90packages or submit the feature request to add ippTL implementation for ippiCrossCorrNorm.

Thank you for the response, Gennady. 

Can you elaborate on the reasoning behind Intel's choice to discontinue OpenMP threading support in the cross correlation function? We have a specific application that requires it to run fast in a linear sequence.

How can I go about submitting a feature request? And is that request able to be added in this version of IPP (2019 update 5), or will it be scoped for a later release?

Moderator

Kevin, please go to the Intel Online Service Center, which is the official support channel, and submit the feature request there. If the feature is re-implemented, it would probably go into a future version of IPP. The latest version is 2020.

New Contributor I

The removal of multi-threading support in IPP is a Never Ending Soap Story. I wonder why Intel sells multi-core processors .....

In Apple's vImage framework, you simply pass kvImageDoNotTile https://developer.apple.com/documentation/accelerate/1578976-processing_flags/kvimagedonottile?langu... as a flag if you don't want internal multi-threading.

Sincerely,

Adriaan van Os

 
