Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Filter with single diagonal kernel

Dongkyu
Beginner
380 Views

 Hello,

 I've been filtering with 8 single-width directional kernels (0, 45, 90, ..., 270 and 315 degrees).

 For the horizontal and vertical kernels I use FilterRow and FilterColumn.
 But for the diagonal directions there is no single-width filter comparable to FilterRow and FilterColumn.

 So, for the diagonal directions I've been using 2D kernels like the one below.

 0 0 0 0 k4
 0 0 0 k3 0
 0 0 k2 0 0
 0 k1 0 0 0
 k0 0 0 0 0

 Filtering with these kernels is much slower than single row or column filtering.

 How can I speed up filtering with single-width diagonal kernels?
 Any good ideas?

 Thanks & regards.

 Dongkyu.

1 Solution
Igor_A_Intel
Employee
380 Views

Hello Dongkyu,

IPP cannot provide special optimizations for every possible kernel with an arbitrary distribution of zeros. For your particular case I see at least 3 solutions: (1) rotate the image, filter it with the row or column filter, then rotate it back (I guess this will be slower than direct filtering with the 2D kernel); (2) try a simple C loop with the Intel compiler, which has a very good vectorizer and can generate very fast code for you; (3) use a buffer of roi.width elements and several IPP function calls in a loop:

ippsMulC_32f(row0, k0, dst, roi.width);
ippsMulC_32f(row1 + 1, k1, buffer, roi.width);
ippsAdd_32f(dst, buffer, dst, roi.width);
ippsMulC_32f(row2 + 2, k2, buffer, roi.width);
ippsAdd_32f(dst, buffer, dst, roi.width);
... etc.
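For comparison, option (2) might look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `filter_antidiag_32f`, the element-based steps, the untouched border and the correlation convention (no kernel flip) are all made up here and are not part of IPP. The inner loop runs along a row with unit stride, which is the shape a vectorizing compiler usually handles well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an IPP function.
 * Filters a float image along the anti-diagonal: k[0] weights the
 * bottom-left neighbour and k[taps-1] the top-right one, matching the
 * 5x5 kernel drawn in the question. Steps are in elements, not bytes.
 * Pixels closer than taps/2 to the border are left untouched.
 * src and dst must not overlap. */
void filter_antidiag_32f(const float *restrict src, size_t src_step,
                         float *restrict dst, size_t dst_step,
                         int width, int height,
                         const float *k, int taps)
{
    assert(taps > 0 && taps % 2 == 1);
    const int r = taps / 2;

    for (int y = r; y < height - r; ++y) {
        float *out = dst + (size_t)y * dst_step;

        /* Clear the part of the output row that will be accumulated. */
        for (int x = r; x < width - r; ++x)
            out[x] = 0.0f;

        for (int i = 0; i < taps; ++i) {
            /* Tap i lies (i - r) columns to the right and (i - r) rows up. */
            const float *in = src + (size_t)(y - (i - r)) * src_step + (i - r);
            const float ki = k[i];

            /* Unit-stride loop: one multiply-add per pixel per tap. */
            for (int x = r; x < width - r; ++x)
                out[x] += ki * in[x];
        }
    }
}
```

For a 5x5 image and a 5-tap kernel only the centre pixel is written. Swapping the sign of the row offset gives the other diagonal.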

regards, Igor

3 Replies
Igor_A_Intel
Employee
380 Views

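PS: there is one function that is well suited to this purpose (I mean case #3). It multiplies a vector by a constant and adds the result to the destination, so the intermediate buffer is not needed:

IPPAPI(IppStatus, ippsAddProductC_32f, (const Ipp32f* pSrc, const Ipp32f val, Ipp32f* pSrcDst, int len))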
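A rough sketch of case #3 built around that accumulate step is shown below. To keep it compilable without IPP, `add_product_c_32f` is a portable stand-in written here with the same semantics as ippsAddProductC_32f (pSrcDst[i] += pSrc[i] * val); with IPP linked in, the real function would be called instead. The driver name `filter_diag_by_rows_32f`, the element-based steps and the "valid region only" output size are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-in with the semantics of ippsAddProductC_32f:
 * pSrcDst[i] += pSrc[i] * val. */
static void add_product_c_32f(const float *pSrc, float val,
                              float *pSrcDst, int len)
{
    for (int i = 0; i < len; ++i)
        pSrcDst[i] += pSrc[i] * val;
}

/* Hypothetical driver, not an IPP function.
 * Output pixel (x, y) = sum over i of k[i] * src(x + i, y + i), i.e. each
 * successive source row is shifted one more column, as in row0, row1 + 1,
 * row2 + 2 above. The output covers only the valid region:
 * (width - taps + 1) x (height - taps + 1). Steps are in elements. */
void filter_diag_by_rows_32f(const float *src, size_t src_step,
                             float *dst, size_t dst_step,
                             int width, int height,
                             const float *k, int taps)
{
    assert(taps > 0 && taps <= width && taps <= height);
    const int out_w = width - (taps - 1);
    const int out_h = height - (taps - 1);

    for (int y = 0; y < out_h; ++y) {
        float *out = dst + (size_t)y * dst_step;

        /* Start from zero so every tap can use the accumulate call. */
        for (int x = 0; x < out_w; ++x)
            out[x] = 0.0f;

        for (int i = 0; i < taps; ++i) {
            const float *row = src + (size_t)(y + i) * src_step;
            add_product_c_32f(row + i, k[i], out, out_w);
        }
    }
}
```

Walking the source rows in the opposite order (y + taps - 1 - i) while keeping the column shift gives the other diagonal direction.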
 

Dongkyu
Beginner
380 Views

 Hi, Igor

 Thanks for the reply.

 I'm going to try #2.

 Regards, Dongkyu.
