Hello,
I've been filtering with 8 single-width directional kernels (0, 45, 90, ..., 270 and 315 degrees).
For the horizontal and vertical kernels I use FilterRow and FilterColumn.
But for the diagonal directions there is no single-width filter comparable to FilterRow and FilterColumn.
So I've been using kernels like the one below for the diagonal directions.
0 0 0 0 k4
0 0 0 k3 0
0 0 k2 0 0
0 k1 0 0 0
k0 0 0 0 0
Filtering with these kernels is much slower than single-row or single-column filtering.
How can I speed up filtering with single-width diagonal kernels?
Any good ideas?
Thanks & regards.
Dongkyu.
Hello Dongkyu,
It is not feasible to provide special optimizations for every possible kernel with some pattern of zeros. For your particular case I see at least three solutions: (1) rotate the image, filter with the row or column filter, then rotate it back (I guess this will be slower than direct filtering with the 2D kernel); (2) try a simple C loop with the Intel compiler, which has a very good vectorizer and can generate very fast code; (3) use a buffer of roi.width elements and several IPP function calls in a loop:
ippsMulC_32f(row0,k0,dst,roi.width);
ippsMulC_32f(row1+1,k1,buffer,roi.width);
ippsAdd_32f(dst,buffer,dst,roi.width);
ippsMulC_32f(row2+2,k2,buffer,roi.width);
ippsAdd_32f(dst,buffer,dst,roi.width);
.................. etc.
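For option (2), a plain C loop might look roughly like the sketch below. It is not taken from IPP: the function name filter_antidiag_32f, the row pitch in elements (not bytes), the correlation (unflipped) tap order and the choice to skip border pixels are all assumptions made for illustration. Tap k[0] is taken as the bottom-left entry of the kernel in the question and k[taps-1] as the top-right one.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: correlate a float image with a single-width anti-diagonal kernel.
   With r = taps / 2, tap k[i] multiplies the pixel (r - i) rows below and
   (i - r) columns to the right of the output pixel.
   Only interior pixels are written; borders are left to the caller.
   step is the row pitch in elements, not bytes (an assumption). */
static void filter_antidiag_32f(const float *src, float *dst,
                                int width, int height, size_t step,
                                const float *k, int taps)
{
    assert(taps % 2 == 1);          /* odd length so the kernel has a centre */
    int r = taps / 2;

    for (int y = r; y < height - r; ++y) {
        float *out = dst + (size_t)y * step;
        for (int x = r; x < width - r; ++x) {
            float acc = 0.0f;
            for (int i = 0; i < taps; ++i) {
                /* one multiply-add per tap: 5 instead of 25 for a 5x5 kernel */
                acc += k[i] * src[(size_t)(y + r - i) * step + (size_t)(x + i - r)];
            }
            out[x] = acc;
        }
    }
}
```

Whether the compiler actually vectorizes this depends on the compiler and flags, so it is worth checking the vectorization report and timing it against the 2D filter.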
regards, Igor
P.S. There is one function that is well suited to this purpose (I mean option 3):
IPPAPI(IppStatus, ippsAddProductC_32f, ( const Ipp32f* pSrc, const Ipp32f val, Ipp32f* pSrcDst, int len ))
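To show how a multiply-accumulate call of this shape removes the temporary buffer in option 3, here is a minimal sketch that builds without IPP. The helper add_product_c_32f is a hypothetical plain-C stand-in that follows the argument order of the prototype above (pSrcDst[i] += pSrc[i] * val), and diag_row_32f and the rows array are illustrative names, not IPP API. With IPP available, the helper call would presumably be replaced by ippsAddProductC_32f.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in mirroring the ippsAddProductC_32f prototype:
   acc[i] += src[i] * val for i in [0, len). */
static void add_product_c_32f(const float *src, float val, float *acc, int len)
{
    for (int i = 0; i < len; ++i)
        acc[i] += src[i] * val;
}

/* Compute one output row of len pixels.
   rows[i] points into source row i; tap i reads it shifted by i columns,
   as in the row0, row1+1, row2+2 pattern of option 3.
   The caller must make sure rows[i] has at least len + i readable elements. */
static void diag_row_32f(const float **rows, const float *k, int taps,
                         float *dst, int len)
{
    assert(taps > 0 && len >= 0);

    for (int x = 0; x < len; ++x)
        dst[x] = 0.0f;                       /* start from a cleared row */

    for (int i = 0; i < taps; ++i)
        add_product_c_32f(rows[i] + i, k[i], dst, len);  /* no temp buffer */
}
```

Compared with the MulC plus Add pair, each tap is a single pass over the row, which halves the number of calls and avoids writing and re-reading the intermediate buffer.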
Hi, Igor
Thanks for the reply.
I'm going to try #2.
Regards, Dongkyu.