ALTSHIFT_TAPS and timing issue

Altera_Forum

I am having some problems going from simulation to synthesis in my design. The simulation works fine, but the synthesized and fitted design does not.

I have written a test bench that synthesizes a triangle wave on an audio converter. This works fine.

I have written a decimating filter in Verilog that lowers the sample rate by a factor of 8. The filter has 96 taps, with 18-bit coefficients and 24-bit data feeding the multipliers.

The coefficients are constant, and the 24-bit data feeds in from an ALTSHIFT_TAPS IP block.

As 96 taps would use up a lot of multipliers, I have a bank of 32 multipliers that I reuse.

When the data loads, I calculate the first 32 taps; on the next clock I calculate taps 33 to 64, and on the clock after that I calculate taps 65 to 96. I then shift ALTSHIFT_TAPS three times to set up the data for the next load, while shifting in the current data.

I have created a monitor that detects the rising and falling edges of the ramp at the output of the decimator. It works well in simulation, but the synthesized and fitted design shows a lot of glitches in SignalTap.

The timing analysis shows some horrendous timing issues between the 98.304 MHz clock, which is generated by a PLL, and some of the decimator registers.

Is there any way of improving this?

I am using a Cyclone IV GX and Quartus 14.0.

Are there any design examples for DSP and ALTSHIFT_TAPS?

Altera_Forum

If you have 96 taps for a decimate-by-8 filter, then you only need 96/8 = 12 multipliers.

If your system clock is faster than the data input rate by a factor of, say, 2, then you can get by with only 6 multipliers.

Altera_Forum

I am decimating the sampling rate by 8. Sorry.

So I need to perform 96 calculations per sample, but the output of the decimating filter is read on only one sample in eight.

Altera_Forum

You only need 12 multipliers.

Decimation by 8 means you only have to produce an output once every 8 input samples, so you have 7 more time slots (clock ticks) in which to reuse the same 12 multipliers, then add up the results (accumulate), ready for the next output.
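
To make the time-slot idea concrete, here is a minimal behavioural sketch in Python (not HDL). It rests on assumptions the thread does not confirm: one clock tick per input sample, and a delay line tapped every 8 samples, which is roughly what ALTSHIFT_TAPS with a tap distance of 8 would provide. The function names and the random test data are made up for illustration. The sketch checks that 12 multiply-accumulates per tick, summed over 8 ticks, match the full 96-tap filter evaluated at every 8th sample.

```python
import random

TAPS = 96
M = 8                      # decimation factor
MULTS = TAPS // M          # 12 multipliers, reused on every input tick


def decimate_direct(x, h):
    """Reference: full 96-tap FIR, evaluated only at every 8th sample."""
    def sample(n):
        return x[n] if 0 <= n < len(x) else 0
    return [sum(h[k] * sample(n - k) for k in range(TAPS))
            for n in range(0, len(x), M)]


def decimate_time_multiplexed(x, h):
    """12 multiply-accumulates per input tick, one output every 8 ticks."""
    delay = [0] * TAPS              # delay[i] holds x[n - i]
    acc = 0
    out = []
    for n, xn in enumerate(x):
        delay = [xn] + delay[:-1]   # shift in the new sample
        phase = (-n) % M            # selects this tick's coefficient subset
        for j in range(MULTS):      # the 12 physical multipliers
            acc += h[M * j + phase] * delay[M * j]
        if phase == 0:              # 8 ticks accumulated: output is ready
            out.append(acc)
            acc = 0
    return out


# Hypothetical test data: 18-bit coefficients, 24-bit samples.
random.seed(1)
h = [random.randint(-2**17, 2**17 - 1) for _ in range(TAPS)]
x = [random.randint(-2**23, 2**23 - 1) for _ in range(256)]
assert decimate_time_multiplexed(x, h) == decimate_direct(x, h)
```

Only the coefficient index `M * j + phase` changes from tick to tick, so in hardware this would plausibly map onto 12 small coefficient ROMs addressed by a 3-bit phase counter, plus one accumulator that is cleared after each output.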

Altera_Forum

Ah yes. I see. As long as I keep track of the values at the 8 time slots, I can accumulate and subtract the relevant value from the previous set.

That makes life easier. Thank you!

Altera_Forum

If you want an even easier life, you can use only 6 multipliers: pre-add the two samples that share a symmetric coefficient, then multiply.
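
As a rough illustration of the pre-add trick, the Python sketch below assumes a symmetric 96-tap filter (h[k] equal to h[95 - k]), so only 48 distinct coefficients exist and each output needs 48 products, or 6 per tick over 8 ticks. The schedule (which pair goes to which multiplier on which tick) and all names are hypothetical; a real design would also need the sample window held in memory while the 8 ticks run.

```python
import random

TAPS = 96
M = 8                      # decimation factor
HALF = TAPS // 2           # 48 distinct coefficients in a symmetric filter
MULTS = HALF // M          # 6 multipliers, each used once per tick


def sample(x, n):
    """x[n], with zeros assumed outside the recorded range."""
    return x[n] if 0 <= n < len(x) else 0


def decimate_direct(x, h):
    """Reference: full 96-tap FIR, evaluated only at every 8th sample."""
    return [sum(h[k] * sample(x, n - k) for k in range(TAPS))
            for n in range(0, len(x), M)]


def decimate_preadd(x, h_half):
    """h_half holds h[0..47]; h[95 - k] == h[k] is assumed."""
    out = []
    for n in range(0, len(x), M):
        acc = 0
        for tick in range(M):                 # 8 clock ticks per output
            for m in range(MULTS):            # 6 physical multipliers
                k = tick * MULTS + m
                # Pre-adder: both samples share coefficient h[k].
                pre = sample(x, n - k) + sample(x, n - (TAPS - 1 - k))
                acc += h_half[k] * pre
        out.append(acc)
    return out


# Hypothetical test data: 18-bit coefficients, 24-bit samples.
random.seed(2)
h_half = [random.randint(-2**17, 2**17 - 1) for _ in range(HALF)]
h = h_half + h_half[::-1]                     # build the symmetric filter
x = [random.randint(-2**23, 2**23 - 1) for _ in range(256)]
assert decimate_preadd(x, h_half) == decimate_direct(x, h)
```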

Altera_Forum

Yes. The (software) DSP guy has just explained that to me. The 12-multiplier decimator is working great, thank you!

I see I can apply a similar but easier method to interpolation just by ordering the coefficients correctly.

Altera_Forum

Pre-add increases the multiplier input width by one bit and may cost extra resources if the multipliers don't support the wider input.

With interpolation there are no spare clock ticks, but there are zeros between the input samples (physically inserted or just assumed), so you arrange the coefficients as polyphases by stepping through the prototype filter at regular intervals. Pre-add may fail here because the individual phases lose their symmetry, except for some values of the interpolation factor.
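
The polyphase arrangement and the loss of symmetry can both be checked with a small Python model. The interpolation factor of 8 and the 96-tap symmetric prototype are assumptions chosen to mirror the decimator in this thread; the function names and the simple ramp coefficients are illustrative only.

```python
L = 8                      # interpolation factor (assumed, not from the thread)
TAPS = 96


def polyphase_split(h):
    """Phase p takes every L-th prototype coefficient, starting at h[p]."""
    return [h[p::L] for p in range(L)]


def interpolate_zero_stuff(x, h):
    """Reference: insert L-1 zeros after each sample, then run the full FIR."""
    u = []
    for v in x:
        u.append(v)
        u.extend([0] * (L - 1))
    return [sum(h[k] * u[n - k] for k in range(TAPS) if n - k >= 0)
            for n in range(len(u))]


def interpolate_polyphase(x, h):
    """Each input sample yields L outputs, one per 12-tap phase filter."""
    phases = polyphase_split(h)
    out = []
    for i in range(len(x)):
        for p in range(L):
            out.append(sum(c * x[i - j]
                           for j, c in enumerate(phases[p]) if i - j >= 0))
    return out


# Symmetric prototype with distinct values in its first half (illustrative).
h = list(range(1, 49))
h = h + h[::-1]
x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, 5, -8, 9, 7, -9, 3]
assert interpolate_polyphase(x, h) == interpolate_zero_stuff(x, h)

phases = polyphase_split(h)
# No single phase is symmetric, so a pre-adder inside one phase does not work.
assert all(ph != ph[::-1] for ph in phases)
# Phase p is the mirror image of phase L-1-p instead.
assert all(phases[p] == phases[L - 1 - p][::-1] for p in range(L))
```

With this prototype the symmetry survives only across pairs of phases, not within a phase, which matches the caveat about pre-add above.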

Altera_Forum

The same problem reared its head in the 12-multiplier version. A silly mistake on my part.

I took the 12 multiplier outputs and wrote

assign res = prod1 + prod2 + prod3 + prod4 + prod5 + prod6 + prod7 + prod8 + prod9 + prod10 + prod11 + prod12;

This creates a silly, long combinational path.

I used PARALLEL_ADD instead and voilà!
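
The thread does not say how PARALLEL_ADD was configured. Assuming it was used with pipelining enabled, the effect is roughly that of a registered adder tree: each clock period only has to cover one level of addition, at the cost of a few cycles of latency. The Python model below is a cycle-by-cycle sketch of that idea, not of the IP itself; the names are hypothetical and bit growth is ignored.

```python
def make_adder_tree(width):
    """Return (clock, depth) for a registered pairwise adder tree model."""
    sizes = []
    n = width
    while n > 1:
        n = (n + 1) // 2            # each level halves the operand count
        sizes.append(n)
    regs = [[0] * s for s in sizes] # one register bank per tree level

    def clock(inputs):
        """Apply one clock edge and return the tree output after it."""
        nonlocal regs
        new = []
        src = list(inputs)
        for bank in regs:
            # Pairwise sums; an odd leftover operand passes straight through.
            new.append([sum(src[i:i + 2]) for i in range(0, len(src), 2)])
            src = bank              # next level reads the old register values
        regs = new
        return regs[-1][0]

    return clock, len(sizes)


clock, depth = make_adder_tree(12)  # 12 -> 6 -> 3 -> 2 -> 1
products = [[i * 12 + j for j in range(12)] for i in range(8)]
outputs = [clock(p) for p in products]
# Each sum appears depth - 1 calls after its operands were applied.
for i in range(len(products) - depth + 1):
    assert outputs[i + depth - 1] == sum(products[i])
```

For 12 operands the model needs 4 register levels, so the result lags the products by 4 clock edges; whatever consumes `res` has to be aligned to that latency.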

Altera_Forum

I think the answer is 12. If this is wrong, please tell me the correct answer. :D
