I recently made a very late upgrade from IPP 6.1 to IPP 2019.5.281 and found that the cross-correlation API has become significantly slower than in IPP 6.1: from 2-3 ms to 5-6 ms per run. I checked ThreadedFunctionsList.txt for IPP 2019.5.281, and it appears that the cross-correlation API no longer has multi-threading support. This is not a matter of missing threaded libraries; I have tested both the single-threaded and the threaded libraries. Threading actually makes the API slower, at 8-9 ms per run.
Has internal multi-threading support really been removed from the cross correlation API? If so, what is the justification? Cross-correlation is a very widely used function, so it seems like an odd decision to make.
- Tags:
- Development Tools
- General Support
- Intel® Integrated Performance Primitives
- Parallel Computing
- Vectorization
Kevin,
Could you give us the input parameters of ippiCrossCorrNorm_32f_C1R? Specifically, we need to know the typical srcRoiSize, dstRoiSize and algType.
Thanks.
Hello Gennady,
The algType used is the following: (IppEnum)(ippAlgAuto | ippiROISame | ippiNormCoefficient);
The srcRoiSize used in this use case is always width 498, height 498.
There is no dstRoiSize parameter for this function, but there is a tplRoiSize, which in this use case is width 15, height 15.
The same parameters are being used for the IPP 6.1 equivalent function, ippiCrossCorrSame_NormLevel_32f_C1R, although in IPP 6.1 there is no algType parameter since that appears to be hardcoded inside the API.
Please let me know if the above is sufficient information to debug, or if more information is needed. Thanks.
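To put these parameters in perspective, here is a minimal sketch in plain C (no IPP calls) of the output size for each ROI shape and the number of multiply-adds a direct, spatial-domain evaluation would need. The names `roi_shape`, `size2d`, `corr_dst_size` and `direct_macs` are hypothetical helpers invented for this illustration; they only mirror the ippiROIFull / ippiROISame / ippiROIValid modes and say nothing about which algorithm ippAlgAuto actually selects.

```c
#include <assert.h>

/* Hypothetical stand-ins for the ROI shapes; not IPP types. */
typedef enum { ROI_FULL, ROI_SAME, ROI_VALID } roi_shape;
typedef struct { int width, height; } size2d;

/* Output size of a correlation for the given ROI shape. */
size2d corr_dst_size(size2d src, size2d tpl, roi_shape shape) {
    assert(tpl.width > 0 && tpl.height > 0);
    assert(src.width >= tpl.width && src.height >= tpl.height);
    size2d dst = src;                       /* ROI_SAME: same size as the source */
    if (shape == ROI_FULL) {                /* every position with any overlap */
        dst.width  = src.width  + tpl.width  - 1;
        dst.height = src.height + tpl.height - 1;
    } else if (shape == ROI_VALID) {        /* template fully inside the source */
        dst.width  = src.width  - tpl.width  + 1;
        dst.height = src.height - tpl.height + 1;
    }
    return dst;
}

/* Multiply-adds for a direct evaluation: one per template pixel
   per output pixel. */
long long direct_macs(size2d dst, size2d tpl) {
    return (long long)dst.width * dst.height * tpl.width * tpl.height;
}
```

For a 498x498 source and a 15x15 template with the "same" shape, this gives a 498x498 output and roughly 55.8 million multiply-adds per call if evaluated directly.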
Also, to clarify the runtime results I got from testing the cross-correlation API in IPP 6.1 and 2019 Update 5:
Using a single thread, IPP 6.1 and 2019 Update 5 run at the same speed, 5-6 ms.
With multi-threading (4 threads in this case), IPP 6.1 takes 2-3 ms and 2019 Update 5 takes 8-9 ms.
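For anyone who wants to reproduce this kind of comparison, a minimal timing harness might look like the sketch below. It is not the code used for the numbers above. `median_ms`, `bench_fn` and `spin_work` are hypothetical names, and `spin_work` is only a stand-in for the real correlation call. The sketch uses the C11 wall-clock function `timespec_get` rather than `clock()`, because `clock()` reports CPU time, which can add up across threads and make a threaded run look slower than it is.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

typedef void (*bench_fn)(void *ctx);

/* Milliseconds elapsed between two timestamps. */
static double elapsed_ms(const struct timespec *a, const struct timespec *b) {
    return (double)(b->tv_sec - a->tv_sec) * 1e3 +
           (double)(b->tv_nsec - a->tv_nsec) / 1e6;
}

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median wall-clock time of fn(ctx) in ms over `runs` timed calls,
   after `warmup` untimed calls (first-call setup, cache warming). */
double median_ms(bench_fn fn, void *ctx, int warmup, int runs) {
    assert(fn != NULL && warmup >= 0 && runs > 0);
    double *t = malloc((size_t)runs * sizeof *t);
    assert(t != NULL);
    for (int i = 0; i < warmup; i++) fn(ctx);
    for (int i = 0; i < runs; i++) {
        struct timespec t0, t1;
        timespec_get(&t0, TIME_UTC);
        fn(ctx);
        timespec_get(&t1, TIME_UTC);
        t[i] = elapsed_ms(&t0, &t1);
    }
    qsort(t, (size_t)runs, sizeof *t, cmp_double);
    double med = t[runs / 2];
    free(t);
    return med;
}

/* Stand-in workload; replace the body with the call being measured. */
void spin_work(void *ctx) {
    int n = *(int *)ctx;
    volatile double acc = 0.0;
    for (int i = 0; i < n; i++) acc += i * 0.5;
}
```

Reporting the median rather than a single run makes a 2-3 ms versus 8-9 ms comparison less sensitive to scheduler noise and first-call overhead.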
Thanks, Kevin.
I have learned from the IPP experts that the internal OpenMP threading was removed from these functions as of IPP 9.0. You could therefore try the legacy 9.0 packages (legacy90packages), or submit a feature request to add an ippTL implementation of ippiCrossCorrNorm.
How is a tiled implementation ever possible for fast normalized cross-correlation?
Regards,
Adriaan van Os
Gennady F. (Blackbelt) wrote:
Thanks, Kevin.
I have learned from the IPP experts that the internal OpenMP threading was removed from these functions as of IPP 9.0. You could therefore try the legacy 9.0 packages (legacy90packages), or submit a feature request to add an ippTL implementation of ippiCrossCorrNorm.
Thank you for the response, Gennady.
Can you elaborate on the reasoning behind Intel's choice to discontinue OpenMP threading support in the cross-correlation function? We have a specific application that requires it to run fast in a linear sequence.
How can I go about submitting a feature request? And could that request be added to this version of IPP (2019 Update 5), or would it be scoped for a later release?
Kevin, please go to the Intel Online Service Center, which is the official support channel, and submit the feature request there. If the feature is reimplemented, it would probably appear in a future version of IPP. The latest version is 2020.
The removal of multi-threading support in IPP is a Never Ending Soap Story. I wonder why Intel sells multi-core processors...
In Apple's vImage framework, you simply pass kvImageDoNotTile (https://developer.apple.com/documentation/accelerate/1578976-processing_flags/kvimagedonottile?language=objc) as a flag if you don't want internal multi-threading.
Sincerely,
Adriaan van Os
I also found the threaded IPP call ippiCrossCorrNorm_8u32f_C1R_T to be *sometimes* slower than the non-parallel version.
Based on the observed core load, a colleague thinks that IPP sometimes does not use multiple cores. But when the parallel version runs on only one core, it seems to be slower than the non-parallel version.
One example of very bad parallel performance is an image ROI of 400x400 with a 90x70 pattern (the image ROI encloses the whole pattern area). The non-parallel version takes 5.5 ms and the parallel one 8.7 ms (TBB, 2021.3).
My question to Intel: What is the threshold between multi-core and single-core processing? Is it possible to query it in advance? And when multiple cores are not used, can't you simply route the call to the default (non-parallel) version?
Regards
Stefan
