- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
The IPP documentation says that the function ippsFIRLMSOne_Direct adapts the taps with the classic LMS formula:
w_i(n+1) = w_i(n) + 2 * mu * err * x(n - i)
However, I've noticed that the adaptation process uses another parameter which decreases over time. I've done something like this:
while(there is data to filter) {
tap = pTaps[tapsLen - 1];
ippsFIRLMSOne_DirectQ15_16s(src, refVal, &pDstVal,
pTaps, tapsLen, muQ15, pDlyLine, &pDlyLineIndex);
adapt = (pTaps[tapsLen - 1] - tap) / (src * (refVal - pDstVal));
}
(pTaps[tapsLen - 1] - tap) is 2 * mu * err * x(n)
(refVal - pDstVal) is err
src is x(n)
so adapt should be constant and equal to 2 * mu, right?
Wrong! with mu = 1, adapt starts at about 3.21 and decreases over time until it's close to zero.
That doesn't sound like the "classic" LMS algorithm to me! Can someone explain what's going on? Any help would be greatly appreciated!
Thanks,
Oliver
Link Copied
6 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear Customer,
Could you please submit this issue into Intel Premier Support at https://premier.intel.com where our product support team will provide assistance?
Could you please submit this issue into Intel Premier Support at https://premier.intel.com where our product support team will provide assistance?
Thanks,
Ying S
Intel Corp.
Ying S
Intel Corp.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, there is answer from our expert:
Code:
Reference is below. Probably you forgot about Q15 suffix meaning? All calculations are done in fixed point. #define FXP (15) #define HLF (1<<(FXP-1)) IppStatus ippsFIRLMSOne_DirectQ15_16s( Ipp16s src, Ipp16s ref, Ipp16s* pDstVal, Ipp32s* pTaps, int tapsLen, int muQ15, Ipp16s* pDlyLine, int* pDlyIndex ) { { int n; Ipp64s tmp, nrm; pDlyLine[*pDlyIndex] = pDlyLine[*pDlyIndex+tapsLen] = src; (*pDlyIndex)++; if( *pDlyIndex >= tapsLen ) *pDlyIndex = 0; pDlyLine += *pDlyIndex; tmp = nrm = 0; for( n=0; n* pDlyLine ); nrm += ( pDlyLine * pDlyLine ); } tmp = (tmp + (HLF-1) + ((tmp>>FXP)&1)) >> FXP; *pDstVal = (Ipp16s)CLIP( tmp, IPP_MIN_16S, IPP_MAX_16S ); if( 0 != nrm ) { tmp = muQ15 * ( ref - tmp ); for( n=0; n + tmp * pDlyLine / nrm; pTaps = (int)CLIP( tap, IPP_MIN_32S, IPP_MAX_32S ); } } return ippStsNoErr; } }
Regards,
Vladimir
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you very much for your reply. I still have a few questions, though. I hope you or someone else can help me some more.
Is this code the actual IPP implementation? Or "just" some code you wrote to emulate the IPP function?
In this code, the taps are not in reversed order, right?
Finally, could you explain the following line of code to me:
tmp = (tmp + (HLF-1) + ((tmp>>FXP)&1)) >> FXP;
Is this a Q15 conversion? Couldn't you do something simpler, like tmp = tmp * 1<<15, followed by clipping?
Thanks again for your help!
Sincerely,
Oliver
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It is actual general (PX) code implementation. Of course optimized version is different (with the same result). Taps should be in inverse order this fact is mentioned in the manual and in ipps.h header file:
Code:
/* /////////////////////////////////////////////////////////////////////////// // Names: ippsFIRLMSOne_Direct // Purpose: direct form of a FIR LMS filter. One point operation. // Parameters: // src source signal sample // refval desired signal sample // pTapsInv FIR taps coefficient values to be fitted // tapsLen number of the taps // pDlyLine pointer to the delay line values // pDlyIndex pointer to the current index of delay line // mu adaptation step // muQ15 adaptation step, integer version // muQ15 = (int)(mu * (1<<15) + 0.5f) // pDstVal where write output sample to // Return: // ippStsNullPtrErr pointer the the data is null // ippStsSizeErr the taps length is equal or less zero // ippStsNoErr otherwise // Note: adaptation error value has been deleted from the parameter // list because it can be computed as (refval - dst). // taps array is inverted, delay line is of double size = tapsLen * 2 */
The line of code youve asked about is rounding to the nearest even with applying Q15 shift. We provide rounding to the nearest even number (as it is realized in hardware for example for cvtps2dq instruction by default) for all related functions.
Regards,
Vladimir
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Thank you for your reply, it has been most helpful. However, I've noticed that in the code you provide, you do the normalization inside the loop:
Code:
tmp = muQ15 * ( ref - tmp ); for( n=0; n+ tmp * pDlyLine / nrm; pTaps = (int)CLIP( tap, IPP_MIN_32S, IPP_MAX_32S ); }
As I'm sure you're aware, division is the most CPU-expensive operation in this function. It would be more efficient to do something like:
Code:
tmp = muQ15 * ( ref - tmp ) / nrm; for( n=0; n+ tmp * pDlyLine ; pTaps = (int)CLIP( tap, IPP_MIN_32S, IPP_MAX_32S ); }
So there is only one division instead of tapsLen. But the problem with this method is that tmp gets too small (often 0) and as you can imagine, the performances aren't quite as good...
So my question is: can you (or anyone) think of a way to do the division only once, without too much impact on performances? I know my question isn't directly related to the IPP, but I'm working on both Intel and non-Intel architecture, and I want my code to be as efficient as possible in both cases (specifically, with the normalization inside the loop, the code is too slow for the realtime application I'm working on).
Regards,
Oliver
Message Edited by thales_oliver on 05-24-2005 06:39 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
On IA architecture division is replaced with multiplication by 1/nrm in floating point format. If you dont have opportunity to use floating point, you should calculate tmp/nrm in Q (Q15 or Q31) format outside the loop and then use shift inside the loop.
Regards,
Vladimir
![](/skins/images/45C6C2D737ED71F4C51F0145C8CB1E9C/responsive_peak/images/icon_anonymous_message.png)
Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page