Intel® Quartus® Prime Software

Pipeline parameter issues for LPM_Divide

FPGA_Newbie
Novice
2,424 Views

Hey everyone, 

I'm new to VHDL and Quartus Prime. I'm working with ADC data and need to calculate its RMS using LPM_MULT, LPM_DIVIDE and ALTSQRT, but I'm having problems with the pipeline value, and only for the divider. The LPM_DIVIDE instance has the following parameters:

lpm_drepresentation => "UNSIGNED",
lpm_hint => "LPM_REMAINDERPOSITIVE=TRUE",
lpm_nrepresentation => "UNSIGNED",
lpm_pipeline => 24,
lpm_type => "LPM_DIVIDE",
lpm_widthd => 22,
lpm_widthn => 24

I have read in the user guide that we cannot specify a value for the LPM_PIPELINE parameter that is higher than LPM_WIDTHN (24 here), and I'm still getting a lot of negative slack (please refer to the photo below for details). Any ideas and help will be appreciated.

I'm using Quartus Prime 18.1.
FPGA: Cyclone IV.


16 Replies
sstrell
Honored Contributor III
2,385 Views

It's not clear that pipelining has anything to do with it.  You have over 4 ns of data delay on this failing path.  What would be really useful to see is the data arrival path section of that report to see where that extensive data delay is coming from.  Also, make sure that your design is fully constrained for timing.  Seeing your .sdc file would be useful as well.

FPGA_Newbie
Novice
2,358 Views

Hey sstrell,

Thank you for your reply.

I took screenshots of my Timing Analyzer reports:
- First, the statistics showing the delay coming from the cells.
- The Data Arrival Path, showing my clock delay of 0.619 ns, then the data path delay of 4.328 ns.
- The .sdc file, showing the constrained inputs and outputs, plus the generated 250 MHz PLL clock and the set_multicycle_path exceptions that control setup and hold for my outputs.

Unfortunately I can't really understand the cause of this timing error. I know the delay comes from a lot of cells and ICs, but all of them are inside LPM_DIVIDE_component.

I also tried adding a component between the adder and the divider to delay the incoming data by one clock cycle, but it didn't work.

If you have any thoughts on how we can manage to control the timing error, I'd appreciate it.

thanks in advance

 

sstrell
Honored Contributor III
2,323 Views

There's still more of the data arrival path that you have not shown.  What you have shown does not show the significant delay that may be causing the issue.  Scroll down to show the rest.

And it would help to show your .sdc.  Multicycle can easily screw things up if you don't use it properly.

FPGA_Newbie
Novice
2,271 Views

Hey sstrell,

In the zip file you'll find the .sdc file, the data path and the setup report. Hopefully this will help us find the cause of the timing error.

FvM
Honored Contributor I
2,258 Views

Hi,
I performed a test compilation of lpm_divide with the given parameters and found that it can't achieve a clock speed higher than about 190 MHz on Cyclone IV, speed grade 6, with maximum lpm_pipeline, despite huge resource usage. For comparison, a sequential divider, e.g. divider.v in stx_cookbook11/arithmetic (see GitHub - thomasrussellmurphy/stx_cookbook: Altera Advanced Synthesis Cookbook 11.0), achieves 170 MHz with less than 1/5 of the lpm_divide logic resources.

This means you are expecting too much from Cyclone IV. You need to run the respective logic at a lower clock speed. Recent low-cost FPGAs, e.g. Cyclone 10 LP, achieve similar performance.
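
As a rough sanity check on those numbers, the gap can be expressed as clock periods. This is only arithmetic on the figures quoted in this thread (a 250 MHz target clock and roughly 190 MHz achievable); the helper name period_ns is made up for illustration.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


required = period_ns(250.0)   # period the 250 MHz PLL clock demands
achieved = period_ns(190.0)   # period at the fmax reported for lpm_divide
shortfall = achieved - required

print(f"required period  : {required:.3f} ns")
print(f"achievable period: {achieved:.3f} ns")
print(f"shortfall        : {shortfall:.3f} ns")
```

In other words, each register-to-register stage inside the divider would have to be well over a nanosecond faster than what the test compilation reached.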

 

What's your ADC sampling rate? If it's really that high (250 MHz), consider an alternative RMS topology that doesn't need continuous full speed division.

FPGA_Newbie
Novice
2,166 Views

Hi FvM,

 

Thanks for your reply and the info. I'm testing my design on the Cyclone IV first, before moving to another FPGA, so I think you're right about expecting too much from it.

I have to use set_multicycle_path for the outputs, otherwise I get setup and hold violations. Is there something else I can do there?
- set_multicycle_path -setup -end -from [get_registers {wr_fifo:DUT10|q_wurzel_s*}] -to [get_ports {q*}] 2
- set_multicycle_path -hold -end -from [get_registers {wr_fifo:DUT10|q_wurzel_s*}] -to [get_ports {q*}] 1

Yesterday I tried the Report Timing Closure Recommendations in the Timing Analyzer, and it suggested this:
- Turn off Auto Shift Register Replacement in the Analysis & Synthesis settings.
- Turn on physical synthesis for combinational logic in the Fitter settings.

Surprisingly it worked, and now my worst-case f_max is up to 257.4 MHz.
Is it OK to change the settings like that?

thanks in advance

FvM
Honored Contributor I
2,110 Views
Hi,
it's not obvious to me that multicycle applies to the divider. It presumes that input data isn't updated every clock cycle. If so, why aren't you running the divider at a lower clock rate? To answer the question we would need to know design details.
FPGA_Newbie
Novice
2,009 Views

Hi FvM,

 

Thanks for your reply.

You were right about the lpm_divide speed. I tried the PLL with 4 different output frequencies (100 MHz, 150 MHz, 200 MHz and 300 MHz) and I get these timing errors for lpm_divide only with the 300 MHz clock. The maximum frequency it can reach with these same parameters is 220 MHz.

But now I'm getting removal timing errors. I didn't get these before and don't know why I'm getting them now.

Please refer to the .zip file, which contains:
- RTL Viewer
- Timing summary
- Removal 'reset' report
- .sdc file

PS: I've got it now. The 'reset' signal has to be connected (pin assignment) to the hardware reset pin.

Please take a look at the zip file.

 

thanks in advance

RichardTanSY_Intel
1,532 Views

I can see that there is significant clock skew and possibly a large data delay, which might be causing the issue.

Please check the data path to see if there is a high fanout or a point where there is a substantial delay.


Could you share your design by archiving the project (Project > Archive Project) so that I can investigate it further?


Regards,

Richard Tan


FvM
Honored Contributor I
1,508 Views

Hi,
I have a comment beyond the LPM_DIVIDE maximum speed question. I frequently implement RMS measuring circuits, and I wonder why you don't use a topology without a divider. However, I have difficulty relating your RTL schematic to a usual RMS scheme, which is either a fixed integration window over n periods of the signal fundamental or continuous recursive processing with a low-pass filter (the equivalent of an analog RMS detector circuit). In the case of discontinuous processing, the division is performed only once per window period or is replaced by a multiply.
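
The two schemes mentioned above can be sketched as a behavioral model. This is a minimal Python sketch, not the poster's RTL: it assumes a window length that is a power of two, so the once-per-window division collapses to a right shift (for other window lengths a multiply by a precomputed 1/N would take its place). The function names and the log2 parameters are invented for illustration, and math.isqrt stands in for the square-root block.

```python
from math import isqrt


def rms_fixed_window(samples, log2_n):
    """RMS over consecutive windows of N = 2**log2_n samples.

    Squaring and accumulating run on every sample; the division by N is
    a right shift performed only once per window.
    """
    n = 1 << log2_n
    results = []
    acc = 0
    for i, x in enumerate(samples, start=1):
        acc += x * x                      # multiply-accumulate per sample
        if i % n == 0:                    # end of the integration window
            mean_square = acc >> log2_n   # divide by N as a shift
            results.append(isqrt(mean_square))
            acc = 0
    return results


def rms_recursive(samples, log2_k):
    """Continuous RMS estimate using a first-order low-pass on x*x.

    Implements y += (x*x - y) / 2**log2_k. The state holds y scaled by
    2**log2_k so that the fractional bits are not lost.
    """
    state = 0
    out = []
    for x in samples:
        state += x * x - (state >> log2_k)
        out.append(isqrt(state >> log2_k))
    return out


print(rms_fixed_window([3, -3, 3, -3, 4, 4, 4, 4], log2_n=2))
print(rms_recursive([100] * 500, log2_k=4)[-1])
```

Neither variant needs a divider running at the sample rate: the windowed form divides once per window, and the recursive form only shifts and subtracts.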

FPGA_Newbie
Novice
1,207 Views

Hi, 

Thank you, Richard and FvM, for your replies.

I've written a testbench for my design and the results are great. Thanks to FvM, I know for sure that lpm_divide with these parameters can't run at 250 MHz, so I used another PLL output at 125 MHz and connected it to the divider IP (with a component in front of it to go from 250 MHz to 125 MHz, of course).
Now I'm working on the timing analysis and trying to understand I/O constraining and where to find these timing values in my FPGA's datasheet, so I may need help with that too.

Thanks a lot again.

Kind regards
 

RichardTanSY_Intel
1,149 Views

I'm pleased to know that your current question has been addressed.


If you have any unique or follow-up questions, we recommend filing a new case for tracking purposes.

We prefer a new case for each unique technical problem, as it aids our case analysis and helps us assess our customer support requirements.


I will now be transitioning this thread to community support. If you have any further questions or concerns, please don't hesitate to reach out. Please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a feed/response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support.

The community users will be able to help you on your follow-up questions.


Thank you and have a great day!


Best Regards,

Richard Tan



FPGA_Newbie
Novice
857 Views

Hi,

 

I don't know if it makes sense to open a new case for a similar problem, so I'm going to write it here.

I'm facing timing issues with LPM_DIVIDE like before, but now I'm calculating the average (similar to RMS). I have a sample design that adds up the data using a 250 MHz PLL output, and a register called "clk_2" that divides that PLL output down to 125 MHz.

"clk_2" clocks a "valid_register" (for crossing from the faster clock domain (PLL output) to the slower clock domain (clk_2)) and the LPM_DIVIDE, where I'm getting the timing errors.

Here is some information about my design:

The RTL Viewer schematic of this design and the timing report are attached.


LPM_Divide Parameters:
lpm_drepresentation => "UNSIGNED",
lpm_hint => "MAXIMIZE_SPEED=6,LPM_REMAINDERPOSITIVE=TRUE",
lpm_nrepresentation => "SIGNED",
lpm_pipeline => 12,
lpm_type => "LPM_DIVIDE",
lpm_widthd => 22,
lpm_widthn => 12
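
For checking testbench results against these parameters, a behavioral reference can help. The sketch below is a Python model under stated assumptions, not the IP's implementation: it assumes LPM_REMAINDERPOSITIVE=TRUE means the remainder is never negative, so that numer = quotient * denom + remain with 0 <= remain < denom, which is what Python's divmod returns for a positive divisor. The function name lpm_divide_model is hypothetical, division by zero is simply rejected, and the pipeline latency (the result appearing lpm_pipeline clock cycles later) is not modelled.

```python
def lpm_divide_model(numer, denom, width_n=12, width_d=22):
    """Reference quotient/remainder for a signed numerator and an
    unsigned denominator, with a remainder that is never negative.

    Assumption: mirrors the parameter set above (SIGNED numerator,
    UNSIGNED denominator, LPM_REMAINDERPOSITIVE=TRUE).
    """
    n_min = -(1 << (width_n - 1))
    n_max = (1 << (width_n - 1)) - 1
    if not n_min <= numer <= n_max:
        raise ValueError("numerator does not fit in width_n signed bits")
    if not 0 < denom < (1 << width_d):
        raise ValueError("denominator must be non-zero and fit in width_d bits")
    # For denom > 0, divmod floors the quotient and keeps 0 <= remain < denom.
    quotient, remain = divmod(numer, denom)
    return quotient, remain


print(lpm_divide_model(7, 2))
print(lpm_divide_model(-7, 2))
```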

SDC file:

set_time_format -unit ns -decimal_places 3

 

create_clock -name {clk} -period 20.000 -waveform { 0.000 10.000 } [get_ports {clk}]


create_generated_clock -name {DUT0|altpll_component|auto_generated|pll1|clk[0]} -source [get_pins {DUT0|altpll_component|auto_generated|pll1|inclk[0]}] -duty_cycle 50/1 -multiply_by 5 -master_clock {clk} [get_pins {DUT0|altpll_component|auto_generated|pll1|clk[0]}]
create_generated_clock -name clk_2 -source [get_pins {DUT0|altpll_component|auto_generated|pll1|clk[0]}] -divide_by 2 [get_pins {clk_2|q}]

 

set_clock_uncertainty -rise_from [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] -rise_to [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020
set_clock_uncertainty -rise_from [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] -fall_to [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020
set_clock_uncertainty -fall_from [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] -rise_to [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020
set_clock_uncertainty -fall_from [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] -fall_to [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020

set_clock_uncertainty -rise_from [get_clocks {clk}] -rise_to [get_clocks {clk}] 0.020
set_clock_uncertainty -rise_from [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] -rise_to [get_clocks {clk}] 0.020
set_clock_uncertainty -rise_from [get_clocks {clk}] -rise_to [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020

set_clock_uncertainty -rise_from clk_2 -rise_to DUT0|altpll_component|auto_generated|pll1|clk[0] 0.020
set_clock_uncertainty -fall_from clk_2 -rise_to DUT0|altpll_component|auto_generated|pll1|clk[0] 0.020
set_clock_uncertainty -rise_from DUT0|altpll_component|auto_generated|pll1|clk[0] -fall_to clk_2 0.020
set_clock_uncertainty -fall_from clk_2 -rise_to clk_2 0.020
set_clock_uncertainty -fall_from clk_2 -fall_to clk_2 0.020

 

set_input_delay -add_delay -clock [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020 [get_ports {ADC_data*}]
set_input_delay -add_delay -clock [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020 [get_ports {num_adc_data*}]

 

set_output_delay -add_delay -clock [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020 [get_ports {fifo_wrreq}]
set_output_delay -add_delay -clock [get_clocks {clk_2}] 0.020 [get_ports {mean_result*}]


set_false_path -from [get_ports {reset}]

 

set_multicycle_path -setup -end -from [get_registers {ADD_res:DUT6|fifo_valid_s}] -to [get_ports {fifo_wrreq}] 2
set_multicycle_path -hold -end -from [get_registers {ADD_res:DUT6|fifo_valid_s}] -to [get_ports {fifo_wrreq}] 1


If you have any thoughts on how to fix the timing errors, I'd appreciate it.
Thanks in advance.

sstrell
Honored Contributor III
824 Views

It's really not clear what you are trying to do here.  You're launching on a falling edge and latching an output (to a downstream device I presume) on the following launch edge, so you only have 4 ns between launch and latch.  Is that intended?
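
To make the 4 ns figure concrete, here is a small Python sketch of how the launch-to-latch time depends on which edges are used. It assumes the path is clocked by the 125 MHz "clk_2" (8 ns period, 50% duty cycle) with the launch on a falling edge and the latch on the next rising edge; that reading of the report is an assumption, and the function name is made up for illustration.

```python
def setup_relationship_ns(period_ns, launch_edge="rise", latch_edge="rise"):
    """Time from a launch edge to the next latch edge of the same
    50% duty-cycle clock (rising edges at 0, T, 2T, ...)."""
    half = period_ns / 2.0
    launch_t = 0.0 if launch_edge == "rise" else half
    latch_t = 0.0 if latch_edge == "rise" else half
    # Step to the first latch edge strictly after the launch edge.
    while latch_t <= launch_t:
        latch_t += period_ns
    return latch_t - launch_t


CLK_2_PERIOD_NS = 8.0  # 125 MHz

print(setup_relationship_ns(CLK_2_PERIOD_NS, "rise", "rise"))
print(setup_relationship_ns(CLK_2_PERIOD_NS, "fall", "rise"))
```

With rising-to-rising edges the path gets the full 8 ns; launching on the falling edge halves that to 4 ns, before clock skew is taken into account.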

You also have a huge clock skew.  I'm not really sure what is going on here.

FPGA_Newbie
Novice
692 Views

Hey sstrell,

At first I didn't understand what you meant by the launch and latch edges, but I've got it now: I should initialise "clk_2" with '0', not '1'.
Where can I see that I have clock skew?

sstrell
Honored Contributor III
661 Views

This info is all in the timing reports you posted.  The waveform view clearly shows that you are analyzing timing with the launch edge as the falling edge.  The setup slack report shows a clock skew of over 2.2 ns for all the failing paths listed in red. I'm also not sure what you are trying to accomplish with the multicycle timing exceptions in your SDC.
