topic Re: LMS FIR filter in Intel® Integrated Performance Primitives

LMS FIR filter

thales_oliver — Wed, 23 Mar 2005 21:04:14 GMT

Hello,

The IPP documentation says that the function ippsFIRLMSOne_Direct adapts the taps with the classic LMS formula:

w_i(n+1) = w_i(n) + 2 * mu * err * x(n - i)

However, I've noticed that the adaptation process uses another parameter which decreases over time. I've done something like this:

while(there is data to filter) {

tap = pTaps[tapsLen - 1];

ippsFIRLMSOne_DirectQ15_16s(src, refVal, &pDstVal,

pTaps, tapsLen, muQ15, pDlyLine, &pDlyLineIndex);

adapt = (pTaps[tapsLen - 1] - tap) / (src * (refVal - pDstVal));

}

(pTaps[tapsLen - 1] - tap) is 2 * mu * err * x(n)

(refVal - pDstVal) is err

src is x(n)

so adapt should be constant and equal to 2 * mu, right?

Wrong! with mu = 1, adapt starts at about 3.21 and decreases over time until it's close to zero.

That doesn't sound like the "classic" LMS algorithm to me! Can someone explain what's going on? Any help would be greatly appreciated!

Thanks,

Oliver

Re: LMS FIR filter

Ying_S_Intel — Sat, 26 Mar 2005 04:19:43 GMT

Dear Customer,
Could you please submit this issue into Intel Premier Support at https://premier.intel.com where our product support team will provide assistance?

Thanks,
Ying S
Intel Corp.

Re: LMS FIR filter

Vladimir_Dudnik — Fri, 01 Apr 2005 05:07:28 GMT

Hi, there is answer from our expert:

Code:

Reference is below. Probably you forgot about Q15 suffix meaning? All calculations are done in fixed point.

#define FXP (15)
#define HLF (1<<(FXP-1))

IppStatus ippsFIRLMSOne_DirectQ15_16s( Ipp16s src, Ipp16s ref,
                        Ipp16s* pDstVal, Ipp32s* pTaps, int tapsLen, int muQ15,
                                            Ipp16s* pDlyLine, int* pDlyIndex )
{
  {
    int n;
    Ipp64s tmp, nrm;

    pDlyLine[*pDlyIndex] = pDlyLine[*pDlyIndex+tapsLen] = src;

    (*pDlyIndex)++;
    if( *pDlyIndex >= tapsLen ) *pDlyIndex = 0;
    pDlyLine += *pDlyIndex;

    tmp = nrm = 0;
    for( n=0; n * pDlyLine );
        nrm += ( pDlyLine * pDlyLine );
    }

    tmp = (tmp + (HLF-1) + ((tmp>>FXP)&1)) >> FXP;
 
    *pDstVal = (Ipp16s)CLIP( tmp, IPP_MIN_16S, IPP_MAX_16S );

    if( 0 != nrm ) {
        tmp = muQ15 * ( ref - tmp );
        for( n=0; n + tmp * pDlyLine / nrm;
            pTaps = (int)CLIP( tap, IPP_MIN_32S, IPP_MAX_32S );
        }
    }
    return ippStsNoErr;
  }
}

Regards,

Vladimir

Re: LMS FIR filter

thales_oliver — Wed, 20 Apr 2005 17:07:24 GMT

Thank you very much for your reply. I still have a few questions, though. I hope you or someone else can help me some more.

Is this code the actual IPP implementation? Or "just" some code you wrote to emulate the IPP function?

In this code, the taps are not in reversed order, right?

Finally, could you explain the following line of code to me:

tmp = (tmp + (HLF-1) + ((tmp>>FXP)&1)) >> FXP;

Is this a Q15 conversion? Couldn't you do something simpler, like tmp = tmp * 1<<15, followed by clipping?

Thanks again for your help!

Sincerely,

Oliver

Re: LMS FIR filter

Vladimir_Dudnik — Wed, 20 Apr 2005 23:10:14 GMT

It is actual general (PX) code implementation. Of course optimized version is different (with the same result). Taps should be in inverse order this fact is mentioned in the manual and in ipps.h header file:

Code:

/* ///////////////////////////////////////////////////////////////////////////
//   Names:     ippsFIRLMSOne_Direct
//   Purpose:   direct form of a FIR LMS filter. One point operation.
//   Parameters:
//      src          source signal sample
//      refval       desired signal sample
//      pTapsInv     FIR taps coefficient values to be fitted
//      tapsLen      number of the taps
//      pDlyLine     pointer to the delay line values
//      pDlyIndex    pointer to the current index of delay line
//      mu           adaptation step
//      muQ15        adaptation step, integer version
//                   muQ15 = (int)(mu * (1<<15) + 0.5f)
//      pDstVal      where write output sample to
//   Return:
//      ippStsNullPtrErr  pointer the the data is null
//      ippStsSizeErr     the taps length is equal or less zero
//      ippStsNoErr       otherwise
//   Note: adaptation error value has been deleted from the parameter
//         list because it can be computed as (refval - dst).
//         taps array is inverted, delay line is of double size = tapsLen * 2
*/

The line of code youve asked about is rounding to the nearest even with applying Q15 shift. We provide rounding to the nearest even number (as it is realized in hardware for example for cvtps2dq instruction by default) for all related functions.

Regards,
Vladimir

Re: LMS FIR filter

thales_oliver — Tue, 24 May 2005 20:36:48 GMT

Hello,

Thank you for your reply, it has been most helpful. However, I've noticed that in the code you provide, you do the normalization inside the loop:

Code:

        tmp = muQ15 * ( ref - tmp );
        for( n=0; n + tmp * pDlyLine / nrm;
            pTaps = (int)CLIP( tap, IPP_MIN_32S, IPP_MAX_32S );
        }

As I'm sure you're aware, division is the most CPU-expensive operation in this function. It would be more efficient to do something like:

Code:

        tmp = muQ15 * ( ref - tmp ) / nrm;
        for( n=0; n + tmp * pDlyLine;
            pTaps = (int)CLIP( tap, IPP_MIN_32S, IPP_MAX_32S );
        }

So there is only one division instead of tapsLen. But the problem with this method is that tmp gets too small (often 0) and as you can imagine, the performances aren't quite as good...

So my question is: can you (or anyone) think of a way to do the division only once, without too much impact on performances? I know my question isn't directly related to the IPP, but I'm working on both Intel and non-Intel architecture, and I want my code to be as efficient as possible in both cases (specifically, with the normalization inside the loop, the code is too slow for the realtime application I'm working on).

Regards,

Oliver

Message Edited by thales_oliver on 05-24-2005 06:39 AM

Re: LMS FIR filter

Vladimir_Dudnik — Mon, 30 May 2005 21:27:44 GMT

Hi,

On IA architecture division is replaced with multiplication by 1/nrm in floating point format. If you dont have opportunity to use floating point, you should calculate tmp/nrm in Q (Q15 or Q31) format outside the loop and then use shift inside the loop.

Regards,

Vladimir