Solved: Filtering with a single-width diagonal kernel

Dongkyu · 07-10-2015

Hello,

I've been filtering with eight single-width directional kernels (0, 45, 90, ..., 270 and 315 degrees).

For the horizontal and vertical kernels I use FilterRow and FilterColumn.
But for the diagonal directions there is no single-width filter function like FilterRow and FilterColumn.

So for the diagonal directions I've been using kernels like the one below.

0 0 0 0 k4
0 0 0 k3 0
0 0 k2 0 0
0 k1 0 0 0
k0 0 0 0 0
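To make explicit which pixels such a kernel touches: reading it as a correlation with the anchor at the kernel centre (an assumption; a library that convolves instead flips the kernel, which reverses the tap order), tap k_i sits in kernel row 4 - i, column i, with rows numbered top to bottom. The generic 5x5 sum then collapses to five terms:

```latex
\mathrm{dst}(x, y) = \sum_{r=0}^{4} \sum_{c=0}^{4} K_{r,c}\, \mathrm{src}(x - 2 + c,\; y - 2 + r),
\qquad K_{4-i,\,i} = k_i \text{ (all other } K_{r,c} = 0)

\mathrm{dst}(x, y) = \sum_{i=0}^{4} k_i\, \mathrm{src}(x - 2 + i,\; y + 2 - i)
```

Here x is the column and y the row. The second line follows from the first by substituting r = 4 - i and c = i.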

Filtering with these kernels is much slower than single row or column filtering.

How can I speed up filtering with single-width diagonal kernels?
Any good ideas?

Thanks & regards.

Dongkyu.

Igor_A_Intel · 07-13-2015

Hello Dongkyu,

It is impossible to provide dedicated optimizations for every possible distribution of zeroes in a kernel. For your particular case I see at least three solutions: (1) rotate the image, filter it with a row or column filter, then rotate it back (I guess this will be slower than direct filtering with the 2D kernel); (2) write a simple C loop and build it with the Intel compiler, which has a very good vectorizer and can generate very fast code; (3) use a buffer of roi.width elements and several IPP function calls in a loop:

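// dst: output row, buffer: scratch row, both roi.width elements
// row0, row1, row2, ...: source rows under taps k0, k1, k2, ...
ippsMulC_32f(row0, k0, dst, roi.width);         // dst = k0 * row0
ippsMulC_32f(row1 + 1, k1, buffer, roi.width);  // buffer = k1 * row1, shifted by 1
ippsAdd_32f(dst, buffer, dst, roi.width);       // dst += buffer
ippsMulC_32f(row2 + 2, k2, buffer, roi.width);  // buffer = k2 * row2, shifted by 2
ippsAdd_32f(dst, buffer, dst, roi.width);       // dst += buffer
.................. etc.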

regards, Igor

Igor_A_Intel · 07-13-2015

PS: there is one function that fits this purpose very well (I mean case #3):

IPPAPI(IppStatus, ippsAddProductC_32f, ( const Ipp32f* pSrc, const Ipp32f val, Ipp32f* pSrcDst, int len ))

Dongkyu · 07-13-2015

Hi, Igor

Thanks for the reply.

I'm going to try #2.

Regards, Dongkyu.