Problem with FIR II core

Altera_Forum · ‎07-09-2018

Hello,

I am trying to implement an 11-tap symmetric FIR filter using the FIR II core from the IP Catalog in Quartus 14.1.

My data are in signed Q.15 format with a sampling rate of 288MHz.

I get the data in packets of 4 samples (64 bits) on each 72MHz tick,

and I want the filter to produce its output the same way, in quads of 4 samples (64 bits).

I have set up the FIR II instance with the following parameters:

-single Rate,

-interpolation and decimation are 1

-one channel

-72MHz clock rate

-288MHz sample rate (so I use TDM and my input and output are 64bits)

-no backpressure

-Symmetrical filter ( 0.005798526799750 -0.000976593987326 0.010162681180614 -0.004699858564008 0.094485468273818 0.731713045004217

0.094485468273818 -0.004699858564008 0.010162681180614 -0.000976593987326 0.005798526799750)

Filter gain is 0.94125

- Coefficients Signed Fractional Binary, 15 fractional bits

- I have imported 11 floating point coefficients. Strangely, the frequency response is not shown:

the plot is scaled so that my curve is out of view, and the impulse response is shown as a flat 0 line. For now I assume it is a bug in the tool.

- Input option is signed fractional binary with 15 fractional bits and 16 bits in total.

- Output option is signed fractional binary. I truncate 15 LSBs and 5 MSBs so I get signed Q.15 as a result.

- The filter is clocked with 72MHz clock (the same clock I get my quads at)

- on the 'valid' input I set constant 1'b1

- On the 'error' signal I set constant 2'b00
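The arithmetic behind the settings listed above can be checked with a short script. This is a minimal Python sketch, not anything generated by the FIR II tool: the `to_q15` helper is hypothetical, and the full-precision width formula (data bits + coefficient bits + ceil(log2(taps))) is an assumed rule of thumb rather than the documented behaviour of the core.

```python
import math

# Coefficients copied from the post (11-tap symmetric FIR).
COEFFS = [
    0.005798526799750, -0.000976593987326, 0.010162681180614,
    -0.004699858564008, 0.094485468273818, 0.731713045004217,
    0.094485468273818, -0.004699858564008, 0.010162681180614,
    -0.000976593987326, 0.005798526799750,
]

FRAC_BITS = 15   # Q.15: 1 sign bit + 15 fractional bits
DATA_BITS = 16
COEFF_BITS = 16


def to_q15(value):
    """Round to the nearest Q.15 integer and saturate to the signed 16-bit range."""
    scaled = int(round(value * (1 << FRAC_BITS)))
    return max(-(1 << FRAC_BITS), min((1 << FRAC_BITS) - 1, scaled))


q_coeffs = [to_q15(c) for c in COEFFS]

dc_gain = sum(COEFFS)                          # gain at 0 Hz
worst_case_gain = sum(abs(c) for c in COEFFS)  # bound on |output| / max|input|

# Assumed rule of thumb for the full-precision accumulator width:
# data bits + coefficient bits + ceil(log2(number of taps)).
full_width = DATA_BITS + COEFF_BITS + math.ceil(math.log2(len(COEFFS)))
out_width = full_width - 15 - 5   # drop 15 LSBs and 5 MSBs, as in the post

print(f"DC gain         : {dc_gain:.5f}")
print(f"worst-case gain : {worst_case_gain:.5f}")
print(f"full width      : {full_width} bits, output width: {out_width} bits")
```

Under these assumptions the DC gain agrees with the 0.94125 quoted above, and the sum of absolute coefficient values stays below 1, which suggests that dropping the 5 MSBs should not by itself cause wrap-around for full-scale Q.15 input.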

I can pass both the input and the output of the filter to the DAC and can monitor the analog signal on the oscilloscope.

Also I can pass some data on the FPGA output pins to look at with the logic analyzer.

What I have checked so far:

1. If I pass constant data to the filter, it responds as I expect over the full Q.15 input range (checked on the logic analyzer).

2. I get 'valid' output as constant 1 and I get output 'error' as constant 2'b00

3. If I apply a clean 10MHz full-scale sine wave (verified on the oscilloscope), I get a distorted signal,

as shown in the screenshot.

I have tried reducing the signal amplitude, with the same distorted result.

I have also reduced the filter coefficient resolution to speed up the filter and check

whether this is a speed issue, but the result was even worse.

Does anyone have an idea how to investigate further?

Thank you.

Dimitar

Altera_Forum · ‎07-10-2018

Your data is 16 bits wide, yet you use a 64-bit single channel. How do you expect the filter to know that you have 4 channels concatenated?

Altera_Forum · ‎07-10-2018

Hi kaz,

In the mega wizard for the FIR I specify:

- Clock Rate 72

- Input Sample rate (MSPS) 288

As per the documentation this enables the FIR to work in TDM mode.

In the generated interface module for the FIR I have checked that inputs and outputs become 64bit.

What is not clear to me is whether this TDM mode treats those 4 samples as one signal or as 4 independent signals.

Obviously I want to implement only one channel with one signal stream (which I receive in 4-sample chunks).

If what I am doing is not OK, what is the proper way to do this?

That is, apart from the obvious approach of splitting the 4 samples, clocking the FIR at 288MHz, and then recombining the output data into 4-sample chunks.

Thank you.

Dimitar

Altera_Forum · ‎07-10-2018

Either use 4 parallel input paths (1 channel per path), or use 4 channels and pass the 16-bit data samples serially (s1 => s2 => s3 => s4).

Altera_Forum · ‎07-10-2018

Can you please elaborate on what this means in terms of the IP Catalog FIR II parameters?

What clock do you suggest I use for the FIR? (I have 4 x 16-bit samples on each 72MHz tick.)

What "Clock Rate" and "Input Sample rate (MSPS)" should I specify in the FIR wizard?

My FIR should be "Single Rate" with interpolation 1 and decimation 1, correct?

Will my input and output Avalon streaming paths be 16 or 64 bits wide?

I am sorry if my questions are pretty basic, the documentation is kind of confusing to me.

Altera_Forum · ‎07-10-2018

option 1:

clk = 72*4

sample rate = 72

You need to serialise your 4 streams from the 72 domain into the 72*4 domain and feed them to the FIR. The FIR should be set to 4 channels and will show the channel number on its output.

option 2 (if applicable): use one channel but 4 parallel inputs, or 4 instances of the FIR (though with no sharing of resources), and pass your data as it is at the 72 rate with the filter clock set to either 72 or 72*4 (to save resources). You should then get an enable of 1:4.

Altera_Forum · ‎07-10-2018

Hi kaz,

It is still not clear to me. Let me first try to understand the structure you propose.

I have a 16-bit data stream at a 288MHz sample rate [... s7 s6 s5 s4 s3 s2 s1 s0 ... ]; let's assume s0 is the oldest sample and s7 the most recent.

I get my stream in 64-bit packets like this:

...

[s3 s2 s1 s0]

[s7 s6 s5 s4]

...

I get one such quad on each 72MHz tick.
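The packing described above can be modelled in a few lines. This is only an illustrative Python sketch: the helper names are hypothetical, and placing s0 in the least-significant 16 bits is an assumption read off the `[s3 s2 s1 s0]` notation, not something confirmed in the thread.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack_quad(samples):
    """Pack [s0, s1, s2, s3] (oldest first) into one 64-bit word.

    Assumption: s0 occupies bits 15..0 and s3 occupies bits 63..48.
    """
    word = 0
    for i, s in enumerate(samples):
        word |= (s & 0xFFFF) << (16 * i)
    return word


def unpack_quad(word):
    """Inverse of pack_quad: return [s0, s1, s2, s3], oldest sample first."""
    return [to_signed16(word >> (16 * i)) for i in range(4)]


def serialise(words):
    """Turn 64-bit words (one per 72MHz tick) into the 288Msps sample stream."""
    stream = []
    for w in words:
        stream.extend(unpack_quad(w))
    return stream


words = [pack_quad([0, 1, -2, 3]), pack_quad([-4, 5, 6, -32768])]
print([hex(w) for w in words])
print(serialise(words))
```

If the real interface places s0 in the most-significant bits instead, only the shift order in `pack_quad` and `unpack_quad` changes.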

1. From a DSP perspective, what I want to achieve is to filter my 288MHz data stream with a simple single-channel FIR filter

https://alteraforum.com/forum/attachment.php?attachmentid=15769&stc=1

(Please see 1.png)

To do this I need to serialize my quads.

In your "option 1" you suggest clk = 4 x 72 = 288MHz, so I think you suggest this structure?

But why do you suggest 4 channels then? I need a single 288MHz filter and not 4 independent filters.

(In your suggestion I don't know what "sample rate = 72" would mean.

I tested that this setup gives a 16-bit Avalon bus width.)

2. Your second suggestion seems to be something like this plus an output multiplexer:

(Please see 2.png)

Have I understood correctly?

Do you suggest that I have to calculate the impulse responses of the above 4 filters such that if I multiplex their outputs at a 288MHz rate I get the result I am after?

(Edit: In fact it should not be a MUX but a summation block, and the FIR clocks should be shifted by 25% ... for now I am just trying to get the global picture.)
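On the question in point 1 of why a 4-channel setup is not the same as one 288MHz filter, a small model may help. This Python sketch assumes that a multi-channel FIR filters each interleaved channel independently with the same taps. The integer taps are hypothetical placeholders, not the coefficients from this thread.

```python
def fir(h, x):
    """Direct-form FIR with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_four_channels(h, x):
    """Treat x as 4 interleaved channels and filter each channel independently."""
    y = [0] * len(x)
    for ch in range(4):
        y[ch::4] = fir(h, x[ch::4])
    return y


# Hypothetical integer taps (11-tap symmetric), used only for illustration.
H = [3, -1, 4, -2, 5, 9, 5, -2, 4, -1, 3]

impulse = [1] + [0] * 47          # single 288Msps stream, unit impulse at n = 0
single = fir(H, impulse)
multi = fir_four_channels(H, impulse)

print(single[:12])   # the 11 taps back to back, followed by zeros
print(multi[:44])    # the same taps, but spaced 4 samples apart
```

In the 4-channel case the effective impulse response seen by the single stream is the original one with three zeros inserted between taps, which is a different filter from the intended one.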

On a DSP controller, block processing (processing the data stream chunk by chunk) is something very standard.

I am surprised that this appears to be non-standard in the FPGA world, or have I misunderstood?

Thank you

Dimitar

Altera_Forum · ‎07-10-2018

I see your new description is different. So you have one data stream (16 bits wide) @ 288Msps... yes?

Then the next line of:

[s3 s2 s1 s0]

[s7 s6 s5 s4]

means your stream is in two parallel paths. If so, you need to run the clock at 288*2.

If that is too fast, then you are aiming at two-path filtering. This is hard and you will need a lot of work to get it right (Google: even/odd filter streaming).

Have I understood you?

Altera_Forum · ‎07-10-2018

Hi Kaz,

I don't understand what you mean by "path". We probably still misunderstand each other.

Let me try again:

1. I have a single 16-bit, 288MHz data stream (you can think of it as one analog signal acquired with a 16-bit ADC clocked at 288MHz)

2. I want to filter this signal (in the digital domain) using a symmetric FIR with 11 coefficients

3. My current system is implemented in such a way that I get four 16-bit samples from the stream at once. (This means that I get 4 samples on each 72MHz tick)

How would you approach this problem in Quartus?

In my current system I have access to a 72MHz clock. I don't have 288MHz, but I can probably use a PLL to make it.

I would prefer to solve this using the FIR II core clocked at 72MHz, if possible (even at the cost of 4 times more hardware).

But if 288MHz is the only option, I will try to go with that clock.

Is my question clearer now?

Regards

Dimitar

Altera_Forum · ‎07-10-2018

OK, your stream is 16 bits @ 288Msps but implemented as 4 parallel sections @ 72MHz.

All you have to do is:

convert the 4 sections to the 288 rate: serialise the 4 sections and pass the result to the filter (one channel @ 288 clock rate and 288 sample rate).
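The data flow suggested here (serialise, filter as one channel, regroup into quads) can be modelled in software to produce reference vectors. This is a behavioural Python sketch only: the function names are hypothetical, the output is truncated by a plain 15-bit right shift, and the single tap of 0.5 is a placeholder chosen so the result is easy to check by hand.

```python
def fir_q15(h_q15, x):
    """Single-channel FIR on Q.15 integers with zero initial state.

    The sum of products is shifted right by 15 bits (truncation) to get
    back to Q.15.
    """
    y = []
    for n in range(len(x)):
        acc = sum(h_q15[k] * x[n - k] for k in range(len(h_q15)) if n - k >= 0)
        y.append(acc >> 15)
    return y


def filter_quads(h_q15, quads):
    """quads: list of [s0, s1, s2, s3] (oldest first), one per 72MHz tick."""
    # Serialise: four samples per 72MHz tick become one sample per 288MHz tick.
    stream = [s for quad in quads for s in quad]
    # One channel, clock rate equal to sample rate.
    filtered = fir_q15(h_q15, stream)
    # Regroup the filtered stream into quads for the 72MHz domain.
    return [filtered[i:i + 4] for i in range(0, len(filtered), 4)]


# Hypothetical placeholder filter: a single tap of 0.5 in Q.15.
H_HALF = [1 << 14]
quads_in = [[1000, -1000, 2000, -2000], [3000, -3000, 4, -4]]
print(filter_quads(H_HALF, quads_in))
```

Feeding the same input quads to the hardware and comparing against this model is one way to separate ordering or packing mistakes from genuine timing problems.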

Altera_Forum · ‎07-10-2018

Thank you kaz!

I'll try to derive 288MHz

As a separate question:

Do you think there is a simple solution for what I want with the IP Catalog FIR II core in the 72MHz domain?

Thanks

Dimitar

Altera_Forum · ‎07-10-2018

The short answer is no.

You can use two parallel sections as even/odd samples @ 144. You will need four sub-filters; it is not worth it and has to be done by hand, not with the FIR compiler. After all, I assume your output may have to be one stream @ 288.
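The even/odd structure with four sub-filters mentioned here can be written out and checked against a plain FIR. This is a Python sketch of the general two-phase decomposition, not code produced by any Altera tool. The taps and test signal are hypothetical integers so that the comparison is exact.

```python
def fir(h, x):
    """Direct-form FIR with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_even_odd(h, x):
    """Two-phase (even/odd) FIR built from four sub-filters; len(x) must be even."""
    h_e, h_o = h[0::2], h[1::2]      # even and odd taps
    x_e, x_o = x[0::2], x[1::2]      # even and odd input samples
    ee, oo = fir(h_e, x_e), fir(h_o, x_o)
    eo, oe = fir(h_e, x_o), fir(h_o, x_e)
    y = []
    for m in range(len(x_e)):
        oo_prev = oo[m - 1] if m > 0 else 0   # odd/odd branch is delayed by one
        y.append(ee[m] + oo_prev)             # even output y[2m]
        y.append(eo[m] + oe[m])               # odd output y[2m + 1]
    return y


# Hypothetical integer taps and test signal, used only for illustration.
H = [3, -1, 4, -2, 5, 9, 5, -2, 4, -1, 3]
X = [((7 * n * n + 3 * n) % 41) - 20 for n in range(32)]

print(fir_even_odd(H, X) == fir(H, X))
```

Each sub-filter runs at half the sample rate, which is where the "@ 144" figure comes from. The price is four sub-filters plus the extra delay and adders.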

Altera_Forum · ‎07-10-2018

I see.

Actually my output should be passed the same way I get it (4 consecutive samples at once on each 72Mhz tick).

My current system is using 72MHz clock.

I guess the odd/even structure you have in mind is similar to the polyphase filter decomposition from multirate theory?
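For the quad-in, quad-out case at 72MHz, the same idea extends to four outputs per tick. The Python sketch below is a behavioural model under the assumption that each tick sees the current quad plus the previous 10 samples. It is not a FIR II feature, and the names and integer taps are hypothetical.

```python
def fir(h, x):
    """Direct-form FIR with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_block4(h, quads):
    """Produce one output quad per input quad (oldest sample first in each quad)."""
    depth = len(h) - 1
    history = [0] * depth                 # samples preceding the current quad
    out = []
    for quad in quads:
        window = history + list(quad)     # old samples followed by 4 new ones
        # Four dot products per tick, one for each output position in the quad.
        out.append([
            sum(h[k] * window[depth + p - k] for k in range(len(h)))
            for p in range(4)
        ])
        history = window[len(window) - depth:]
    return out


# Hypothetical integer taps and test signal, used only for illustration.
H = [3, -1, 4, -2, 5, 9, 5, -2, 4, -1, 3]
X = [((5 * n * n + n) % 37) - 18 for n in range(40)]
QUADS = [X[i:i + 4] for i in range(0, len(X), 4)]

flat = [v for quad in fir_block4(H, QUADS) for v in quad]
print(flat == fir(H, X))
```

In hardware this corresponds to four 11-tap dot products computed in parallel on every 72MHz tick, so roughly four times the multipliers of one serial filter.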

Altera_Forum · ‎07-12-2018

OK, now I have a 288MHz clock to my FIR filter, but I seem to be having issues with it.

I have verified that filter 'valid' signal is constant 1 and filter output error[1:0] is 2'b00

Is this a confirmation that the filter manages to do its calculations on time?

What should I expect if the clock reaches the filter's maximum speed?

Thanks

Dimitar

Altera_Forum · ‎07-12-2018

vin should be continuously high (driven by your input mux). vout will then be continuously high (driven by the filter). It is single rate, so both input and output run at 288MHz, which is your data rate.

Your question about the filter's maximum speed suggests some uncertainty in your thinking here. Can you explain it further?

Altera_Forum · ‎07-12-2018

OK sorry I was not clear.

My filter contains 11 coefficients. It is symmetrical FIR.

I am not sure how this is implemented internally, but I guess I have 6 multipliers (because of the symmetry) and some adders.

I guess the multipliers are implemented in dedicated blocks and probably all work in parallel?

Anyway, I was wondering what happens if my clock period is shorter than the time the internal FIR logic needs to complete the calculation of the current output sample. Will the filter clear the output 'valid' flag in this case?

I have no idea whether I am close to this point with my 288MHz and 11 taps. I am using Stratix IV.

Clarification is very welcome.

Altera_Forum · ‎07-13-2018

An FPGA, unlike software, can work anywhere from fully parallel to fully serial processing, as chosen by the designer.

It may require pipeline delay for timing closure, i.e. the output could be late, but that is a fixed initial delay and the stream comes out continuously thereafter.
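The fixed initial delay described here can be pictured with a toy model. This is a Python sketch only: `pipelined` and the latency of 3 stages are hypothetical and say nothing about the actual latency of the FIR II core.

```python
from collections import deque


def pipelined(samples, latency):
    """Model `latency` register stages: one sample in and one out per clock."""
    regs = deque([None] * latency)   # None marks an output that is not valid yet
    out = []
    for s in samples:
        regs.append(s)               # new sample enters the pipeline
        out.append(regs.popleft())   # oldest register content leaves it
    return out


LATENCY = 3   # hypothetical number of pipeline stages
result = pipelined(list(range(8)), LATENCY)
print(result)
```

After the first `LATENCY` clocks every clock produces one valid output, so throughput is unchanged and only the start of the stream is delayed.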

Altera_Forum · ‎07-20-2018

Let me ask in a different way.

What do you think is the maximum clock rate at which I can run my 11-tap symmetric FIR from the IP core,

assuming there are enough resources on the Stratix IV silicon?

Altera_Forum · ‎07-21-2018

In FPGA jargon you are asking for "fmax". That is an implementation issue. With a good design you can well reach 300MHz in your device.