ippiDilateBorder_16u_C1R performance regression in 2018

adam_s_ — Thu, 08 Feb 2018 19:03:41 GMT

Hello,

The latest community version of IPP has a 4x performance regression in the ippiDilateBorder_16u_C1R function for largish neighborhoods. A 221x221 neighborhood in our use case is affected (with an image size of 7002x8998), though I'm sure it's measurable for smaller neighborhoods as well. I've seen this regression on Windows; I haven't tested it on Linux yet. This is on a Haswell CPU. I'm not sure how much it matters, but the neighborhood mask is set to 1 for all values.

Andrey_B_Intel — Thu, 22 Feb 2018 11:26:08 GMT

Hi Adam.

Could you please send ippcvGetLibVersion output of both versions?

Thanks.

adam_s_ — Thu, 22 Feb 2018 23:05:00 GMT

I currently don't have the original version (my binary was statically linked to it), but I believe the version that didn't have the regression was 2017 (with the latest update). The version that does is 2018 (both the initial version and the update). This was specifically on Windows, though I imagine the regression may exist on other platforms.

adam_s_ — Mon, 12 Aug 2019 14:52:01 GMT

Sorry to necrobump, but I ran into this today and the performance regression exists in versions as late as 2018 (haven't checked anything newer). Watching this in a loop with perf, it would seem that the max filter routines optimized for SSE variants of the architecture (l9_ownFilterMaxRowVH_16u_C1R and l9_ownFilterMaxColumnVH_16u_C1R) are orders of magnitude faster when the kernel size is large enough (a 1706x1706 kernel on a 3709x5527 input).

Any idea what's going on? Is there maybe a way I can use newer IPP but force it to use these older versions to get around this regression?

adam_s_ — Tue, 13 Aug 2019 17:54:06 GMT

So after doing a tiny bit of research and speculation on my part, I'm assuming the "VH" in those function names signifies that the function performs the van Herk algorithm (as in van Herk/Gil-Werman). Also, somewhat surprisingly, the straight MaxFilterBorder calls take a pretty naive approach to computing the max filter instead of using the fast l9_ownFilterMaxRowVH_16u_C1R routines called by dilation in the IPP 9.0.3 of yore. Why did you guys rip out these functions, and why weren't they called in the MaxFilterBorder functions to begin with? Are they patent encumbered? I have half a mind to implement these myself with SIMD intrinsics, but earlier versions of IPP already have them, so it seems like I'd be needlessly reinventing the wheel.
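For anyone following along, here is a rough scalar sketch of the van Herk/Gil-Werman idea in one dimension. It is not IPP's implementation, and the name running_max_vhgw is made up for illustration. It assumes the window is clipped at the array ends (done here by zero padding, which is harmless for a max over unsigned data), which may not match IPP's border modes. The point is that the cost per sample is roughly three comparisons regardless of the window width, which is why it should win for large neighborhoods.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an IPP function): sliding-window maximum of width w
// using the van Herk/Gil-Werman block prefix/suffix scheme.
// Output j is the max of in[j - w/2 .. j - w/2 + w - 1], clipped to the array.
std::vector<std::uint16_t> running_max_vhgw(const std::vector<std::uint16_t>& in,
                                            std::size_t w)
{
    const std::size_t n = in.size();
    if (n == 0 || w <= 1) return in;

    const std::size_t r = w / 2;
    // Pad so every window is full width, then round up to whole blocks of w.
    // Assumption: zero padding, the identity for max on unsigned samples.
    const std::size_t m = ((n + 2 * (w - 1)) / w) * w;
    std::vector<std::uint16_t> p(m, 0), g(m), h(m);
    std::copy(in.begin(), in.end(), p.begin() + static_cast<std::ptrdiff_t>(r));

    // g: running max from the start of each block of w samples.
    for (std::size_t k = 0; k < m; ++k)
        g[k] = (k % w == 0) ? p[k] : std::max(g[k - 1], p[k]);
    // h: running max from the end of each block of w samples.
    for (std::size_t k = m; k-- > 0;)
        h[k] = ((k + 1) % w == 0) ? p[k] : std::max(h[k + 1], p[k]);

    // A window [j, j + w - 1] in padded coordinates spans at most two blocks:
    // h[j] covers the tail of the first, g[j + w - 1] the head of the second.
    std::vector<std::uint16_t> out(n);
    for (std::size_t j = 0; j < n; ++j)
        out[j] = std::max(h[j], g[j + w - 1]);
    return out;
}
```

Since an all-ones rectangular neighborhood is separable, a 2-D dilation could be built from this by running it over every row and then over every column of the row result. A SIMD version would vectorize across rows or columns, which this sketch does not attempt.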