It appears this bug has existed for quite a while. If someone could tell me what the buffer size calculation logic is, I could probably work around it. The recommended function for specifying border behavior for the Dilate filter is ippiDilateBorder_<mod>; I'm doing this on 16u. When calling ippiMorphologyBorderGetSize_16u_C1R, pSpecSize comes back as -883236287 for an ROI of 4068x5948, a mask of 10017x10017, and a borderType of ippBorderRepl.
I noticed there are other morphological buffer size calculation functions, but with no way to specify border behavior (ippiMorphGetSpecSize_L). First, will this function account for my border behavior? Second, it is only available in newer versions of IPP, which have a performance regression in the max filter for larger kernels and ROIs (as described in a post I resurrected earlier today).
If someone can just tell me how this body of code calculates these buffer sizes, I can reproduce this with size_t types and get around this issue entirely.
Hmm, this is likely a degenerate use case anyway, now that I look at it. We're choosing a mask that is larger than the entire ROI. Constraining the mask to the size of the smallest dimension of the ROI seems to make this issue never crop up. Though a filter with a kernel up to twice the image size should be perfectly valid.
IPP morphology was developed on the assumption that the structuring element (mask) is significantly smaller than the image/ROI. Could you explain the purpose of your use case? Why do you use such a huge mask?
Sure - we're using it as a fast max filter. In IPP 9.0.3, you seem to be leveraging a faster algorithm internally (one that scales linearly with mask size) for dilation with a structuring element of all ones; I suspect the van Herk algorithm described in many SIMD-acceleration papers. Using strictly the max filters, no version of IPP, old or new, uses this; they appear to use a naive version (which is perhaps faster for smaller masks?). IPP 2017 and newer seems to have removed this variant, so we're getting 400x worse performance for very large masks.
Our code finds localized peaks in a 2D image. We take the result of this dilation and compare it to the original image, finding the exact pixels that are the peaks of their local neighborhoods.
For us, this data is in geospace, so it's huge. We have a GUI tool that lets the user expand the search radius to multiple kilometers. When scaled down to pixel space, this can be as large as the entire map.
However, it's worth noting that this bug happens with very large masks even when they aren't bigger than the ROI. As a matter of fact, I spent the better half of yesterday reverse engineering your buffer calculation code and found a few things:
- You call l9_ownippiFilterMinGetBufferSize_16u_C1 and l9_ownippiFilterMaxGetBufferSize_16u_C1 and take the max of the two buffer sizes. These two functions are actually identical, so this isn't exactly causing problems, but it seems a bit wasteful.
- You then take that function's result and compare it with the result of l9_ownMorphEllipseGetBufferSize, for some reason. Here's where your bug lies. That code overflows pretty early on: with my stated ROI, it starts overflowing at a radius of just 256, meaning a mask size of 513x513. When this happens, it happens in such a way that the result of l9_ownippiFilterMaxGetBufferSize_16u_C1R is used instead (since the overflow makes the ellipse buffer size calculation come out negative). This still gives a reasonable signed 32-bit result (though probably smaller than your code intended) until a radius of 1426 (mask size of 2853x2853), where the max of the two results (after several overflow wraparounds by this point) itself overflows when adjusted for SIMD vector lengths. It is at this point that pBufferSize starts coming back negative. Eventually, the same thing happens to pSpecBufSize.
So, either you guys are relying on overflow behavior (doubtful), or this bug has been there for a long time and nobody noticed because nobody ever fed it large ROIs plus large kernels.
As I've already mentioned, IPP morphology was designed long ago on the assumption that the SE (mask) is rather small (3x3, 5x5, ..., 11x11). The memory buffer required for this functionality is used for two purposes: (1) separable filtering (the row-column processing pipeline) and (2) border processing. For both of these, the amount of memory required for temporary calculations is highly dependent on the SE size. Yes, it is a bug that this functionality doesn't check the calculated buffer size for overflow. It has already been fixed for the functions with the "_L" suffix and the 64-bit-sizes API. Please use the new morphology functionality from the Platform-Aware section:
IPPAPI(IppStatus, ippiDilate_16u_C1R_L, (const Ipp16u* pSrc, IppSizeL srcStep, Ipp16u* pDst, IppSizeL dstStep, IppiSizeL roiSize, IppiBorderType borderType, const Ipp16u borderValue, const IppiMorphStateL* pMorphSpec, Ipp8u* pBuffer))
IPPAPI(IppStatus, ippiErode_16u_C1R_L, (const Ipp16u* pSrc, IppSizeL srcStep, Ipp16u* pDst, IppSizeL dstStep, IppiSizeL roiSize, IppiBorderType borderType, const Ipp16u borderValue, const IppiMorphStateL* pMorphSpec, Ipp8u* pBuffer))
(ippcv_l.h header file).
If you need Min/Max filters or morphology for huge masks with VH algorithm support, please submit a request via the Intel Premier Support site.
This _L function is the actual dilation, which isn't where my problem lies. My problem is in the buffer size calculation that happens before this (which determines the allocation sizes for pMorphSpec and pBuffer). Is there an _L variant of MorphologyBorderGetSize? Furthermore, when were the _L functions introduced? If they're newer than 9.0.3, then we lose the fast VH filter. Can you elaborate on why you took something that was linear complexity and replaced it with something quadratic?
IPPAPI(IppStatus, ippiDilateGetBufferSize_L, (IppiSizeL roiSize, IppiSizeL maskSize, IppDataType datatype, int numChannels, IppSizeL* pBufferSize))
IPPAPI(IppStatus, ippiErodeGetBufferSize_L, (IppiSizeL roiSize, IppiSizeL maskSize, IppDataType datatype, int numChannels, IppSizeL* pBufferSize))
Does this function take border behavior into account? I noticed it also doesn't calculate the pMorphSpec size, which is the value that overflows first and foremost.
There are separate functions for SpecSize:
/* ///////////////////////////////////////////////////////////////////////////////////////
//  Name:       ippiDilateGetSpecSize_L, ippiErodeGetSpecSize_L
//
//  Purpose:    Gets the size of the internal state or specification structure
//              for morphological operations.
//
//  Return:
//    ippStsNoErr       Ok.
//    ippStsNullPtrErr  One of the pointers is NULL.
//    ippStsSizeErr     Width of the image, or width or height of the structuring
//                      element, is less than or equal to zero.
//
//  Parameters:
//    roiSize     Size of the source and destination image ROI in pixels.
//    maskSize    Size of the structuring element.
//    pSpecSize   Pointer to the specification structure size.
*/
IPPAPI(IppStatus, ippiDilateGetSpecSize_L, (IppiSizeL roiSize, IppiSizeL maskSize, IppSizeL* pSpecSize))
IPPAPI(IppStatus, ippiErodeGetSpecSize_L, (IppiSizeL roiSize, IppiSizeL maskSize, IppSizeL* pSpecSize))
PS: you are right, the VH filter is not implemented in the latest versions. As I've already mentioned, the IPP implementation is intended for rather small masks, and VH makes sense only for rather big masks/SEs. Why is it not supported anymore? For the same reasons as all other deprecated and removed functionality - there were a number of notifications/articles/messages about that.