Hey everyone,
I'm new to VHDL and Quartus Prime. I'm working with ADC data and I need to calculate its RMS using LPM_Mult, LPM_Divide and ALTSQRT, but I'm having problems with the pipeline value, only for the divide. The LPM_Divide has the following parameters:
lpm_drepresentation => "UNSIGNED",
lpm_hint => "LPM_REMAINDERPOSITIVE=TRUE",
lpm_nrepresentation => "UNSIGNED",
lpm_pipeline => 24,
lpm_type => "LPM_DIVIDE",
lpm_widthd => 22,
lpm_widthn => 24
I have read in the user guide that we cannot specify a value for the LPM_PIPELINE parameter that is higher than LPM_WIDTHN (24 here), and I'm still getting a lot of negative slack (please refer to the photo below for details). Any ideas and help would be appreciated.
I'm using Quartus Prime 18.1.
FPGA: Cyclone IV.
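For reference, here is a minimal sketch of how lpm_divide is typically instantiated with these generics. All entity, port and signal names below are placeholders chosen for illustration, not names from the actual project:

library ieee;
use ieee.std_logic_1164.all;

library lpm;
use lpm.lpm_components.all;

entity rms_div_example is
  port (
    clk       : in  std_logic;
    sum_sq    : in  std_logic_vector(23 downto 0);  -- numerator: accumulated squares (placeholder)
    n_samples : in  std_logic_vector(21 downto 0);  -- denominator: sample count (placeholder)
    mean_sq   : out std_logic_vector(23 downto 0)   -- quotient, e.g. fed on to ALTSQRT
  );
end entity;

architecture rtl of rms_div_example is
  signal remain_s : std_logic_vector(21 downto 0);
begin
  div_inst : lpm_divide
    generic map (
      lpm_drepresentation => "UNSIGNED",
      lpm_hint            => "LPM_REMAINDERPOSITIVE=TRUE",
      lpm_nrepresentation => "UNSIGNED",
      lpm_pipeline        => 24,   -- equal to lpm_widthn, i.e. the maximum allowed
      lpm_type            => "LPM_DIVIDE",
      lpm_widthd          => 22,
      lpm_widthn          => 24
    )
    port map (
      clock    => clk,
      numer    => sum_sq,
      denom    => n_samples,
      quotient => mean_sq,
      remain   => remain_s
    );
end architecture;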
It's not clear that pipelining has anything to do with it. You have over 4 ns of data delay on this failing path. What would be really useful to see is the data arrival path section of that report to see where that extensive data delay is coming from. Also, make sure that your design is fully constrained for timing. Seeing your .sdc file would be useful as well.
Hey sstrell,
Thank you for your reply.
I took screenshots of my Timing Analyzer:
- First, the statistics showing the delays coming from the cells.
- The Data Arrival Path showing my clock delay of 0.619 ns, then the data path delay of 4.328 ns.
- The .sdc file showing the constrained inputs and outputs, plus the generated 250 MHz PLL clock and the set_multicycle_path constraints controlling setup and hold for my outputs.
Unfortunately I can't really understand the cause of this timing error. I know it's coming from a lot of cell and interconnect (IC) delays, but only from the LPM_DIVIDE component.
I also tried to add a component between the adder and the divider to delay the incoming data by one clock cycle, but it didn't help.
If you have any thoughts on how to get this timing error under control, I'd appreciate it.
thanks in advance
There's still more of the data arrival path that you have not shown. What you have shown does not show the significant delay that may be causing the issue. Scroll down to show the rest.
And it would help to show your .sdc. Multicycle can easily screw things up if you don't use it properly.
Hi,
I performed a test compilation of lpm_divide with the given parameters and found that it can't achieve a higher clock speed than about 190 MHz on Cyclone IV, speed grade 6, with the maximal lpm_pipeline, despite huge resource usage. For comparison, a sequential divider, e.g. divider.v contained in stx_cookbook11/arithmetic (see GitHub - thomasrussellmurphy/stx_cookbook: Altera Advanced Synthesis Cookbook 11.0), achieves 170 MHz with less than 1/5 of the lpm_divide logic resources.
That means you are expecting too much from the Cyclone IV; you need to run this logic at a lower clock speed. Recent low-cost FPGAs, e.g. the Cyclone 10 LP, achieve similar performance.
What's your ADC sampling rate? If it's really that high (250 MHz), consider an alternative RMS topology that doesn't need continuous full speed division.
Hi FvM,
thanks for your reply and the info. I'm testing my design on the Cyclone IV FPGA first before moving to another FPGA, so I think you're right about expecting too much from it.
I have to use set_multicycle_path for the outputs, otherwise I get setup and hold violations. Is there something else I can do there?
- set_multicycle_path -setup -end -from [get_registers {wr_fifo:DUT10|q_wurzel_s*}] -to [get_ports {q*}] 2
- set_multicycle_path -hold -end -from [get_registers {wr_fifo:DUT10|q_wurzel_s*}] -to [get_ports {q*}] 1
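One alternative worth considering, purely as a sketch and assuming the downstream device can tolerate one extra clock cycle of latency, is to drive the port from a dedicated register placed right in front of it (optionally packed into the I/O element with the Fast Output Register assignment), so the output path contains no long combinational logic:

-- Hypothetical output register stage; q_reg_s is a placeholder name.
-- q_wurzel_s is the internal result register named in the constraints above.
process (clk)
begin
  if rising_edge(clk) then
    q_reg_s <= q_wurzel_s;
  end if;
end process;

q <= q_reg_s;

Whether this removes the need for the multicycle exception depends on what the receiving device actually requires at its inputs.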
Yesterday I tried the Report Timing Closure Recommendations in the Timing Analyzer, and it showed me this:
- Turn off Auto Shift Register Replacement in the Analysis & Synthesis settings.
- Turn on Physical Synthesis for Combinational Logic in the Fitter settings.
Surprisingly it worked, and now I'm getting a worst-case fmax of up to 257.4 MHz.
Is it OK to change the settings like that?
thanks in advance
It's not obvious to me that multicycle applies to the divider. It presumes that the input data isn't updated every clock cycle. If so, why aren't you running the divider at a lower clock rate? To answer the question we would need to know the design details.
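To illustrate that point with a sketch (all names below are placeholders, not from the actual design): "input data isn't updated every clock cycle" means the divider operands are reloaded only every second cycle of the fast clock, for example via a toggling enable. Only with such a guarantee in the RTL would a multicycle exception through the divider be legitimate.

-- Sketch only: signal declarations go in the architecture declarative part.
signal en_div_s    : std_logic := '0';
signal numer_reg_s : std_logic_vector(23 downto 0);
signal denom_reg_s : std_logic_vector(21 downto 0);

-- In the architecture body:
process (clk_250)
begin
  if rising_edge(clk_250) then
    en_div_s <= not en_div_s;        -- toggles every fast-clock cycle
    if en_div_s = '1' then
      numer_reg_s <= sum_sq_s;       -- operands held stable for two fast-clock cycles
      denom_reg_s <= n_samples_s;
    end if;
  end if;
end process;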
Hi FvM,
thanks for your reply.
You were right about the lpm_divide speed. I tried the PLL with four different output frequencies (100 MHz, 150 MHz, 200 MHz and 300 MHz), and I only get these timing errors for lpm_divide when using the 300 MHz clock; the maximum frequency it can do with these same parameters is 220 MHz.
But now I'm getting removal timing errors; I didn't get these before and I don't know why I'm getting them now.
Please refer to the attached .zip file, which contains:
- RTL Viewer
- Timing summary
- Removal 'reset' report
- .sdc file
PS: I only got this now. It should be connected (via an assignment) to the hardware 'reset' pin.
Please take a look in the zip file.
thanks in advance
I can see that there is significant clock skew and possibly a large data delay, which might be causing the issue.
Please check the data path to see if there is a high fanout or a point where there is a substantial delay.
Could you share your design by archiving the project (Project > Archive Project) so that I can investigate it further?
Regards,
Richard Tan
Hi,
I have a comment beyond the LPM_divide maximum speed question. I'm frequently implementing RMS measuring circuits, and I wonder why you don't use a topology without a divider. However, I have difficulty relating your RTL schematic to a usual RMS scheme, which is either a fixed integration window over n periods of the signal fundamental or continuous recursive processing with a low pass (the equivalent of an analog RMS detector circuit). In the case of discontinuous processing, the division is only performed once per window period, or substituted by a multiply.
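To make the fixed-window variant concrete: if the window length can be chosen as a power of two, the division by the sample count collapses into a shift and LPM_DIVIDE disappears entirely; ALTSQRT then only has to produce one RMS value per window. A rough sketch, where all names, widths and the window length are my own assumptions rather than values from the posted design:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mean_square_window is
  generic (
    WIN_LOG2 : natural := 10                       -- 2**10 = 1024 samples per window (assumption)
  );
  port (
    clk          : in  std_logic;
    sample_valid : in  std_logic;
    square_in    : in  unsigned(23 downto 0);      -- x*x from the multiplier (width is an assumption)
    mean_sq_out  : out unsigned(23 downto 0);      -- feed this to ALTSQRT once per window
    mean_sq_vld  : out std_logic
  );
end entity;

architecture rtl of mean_square_window is
  signal acc_s   : unsigned(23 + WIN_LOG2 downto 0) := (others => '0');
  signal count_s : unsigned(WIN_LOG2 - 1 downto 0)  := (others => '0');
begin
  process (clk)
    variable sum_v : unsigned(acc_s'range);
  begin
    if rising_edge(clk) then
      mean_sq_vld <= '0';
      if sample_valid = '1' then
        sum_v   := acc_s + resize(square_in, acc_s'length);
        count_s <= count_s + 1;                    -- wraps to 0 at the end of the window
        if count_s = 2**WIN_LOG2 - 1 then
          -- dividing by the window length is just a shift, no LPM_DIVIDE needed
          mean_sq_out <= resize(shift_right(sum_v, WIN_LOG2), mean_sq_out'length);
          mean_sq_vld <= '1';
          acc_s       <= (others => '0');
        else
          acc_s <= sum_v;
        end if;
      end if;
    end if;
  end process;
end architecture;

With a window tied to n periods of the fundamental instead of a power of two, the same structure works, but the final scaling becomes a multiply by a precomputed reciprocal rather than a shift.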
Hi,
thank you Richard and FvM for your replies.
I've written a testbench for my design and the results are great. Thanks to FvM, I now know for sure that lpm_divide with these parameters can't work at 250 MHz, so I used another PLL output at 125 MHz and connected it to the divide IP (of course with a component in between to cross from the 250 MHz domain to the 125 MHz domain).
Now I'm working on the timing analysis and trying to understand I/O constraining and where I can find the relevant timing numbers in my FPGA's datasheet; I may need help with that too.
Thanks a lot again.
Kind regards
I'm pleased to know that your current question has been addressed.
If you have any unique or follow-up questions, we recommend filing a new case for tracking purposes.
We prefer a new case for each unique technical problem, as it aids our case analysis and helps us assess our customer support requirements.
Now I will be transitioning this thread to community support. If you have any further questions or concerns, please don't hesitate to reach out. Please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support.
The community users will be able to help you on your follow-up questions.
Thank you and have a great day!
Best Regards,
Richard Tan
Hi,
I don't know if it makes sense to open a new case for a similar problem, so I'm going to write it here.
I'm facing timing issues like before with "LPM_Divide", but now I'm computing the average (similar to RMS). I have a sample design that sums the data with a 250 MHz PLL output and a register that divides that PLL output down to 125 MHz, called "clk_2".
"clk_2" is connected to a "valid_register" (for crossing from the faster clock domain (PLL output) to the slower clock domain (clk_2)) and to the LPM_Divide, where I'm facing the timing errors.
Here is some info about my design:
The RTL Viewer screenshot of this design and the timing report are attached.
LPM_Divide Parameters:
lpm_drepresentation => "UNSIGNED",
lpm_hint => "MAXIMIZE_SPEED=6,LPM_REMAINDERPOSITIVE=TRUE",
lpm_nrepresentation => "SIGNED",
lpm_pipeline => 12,
lpm_type => "LPM_DIVIDE",
lpm_widthd => 22,
lpm_widthn => 12
SDC file:
set_time_format -unit ns -decimal_places 3
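# base input clock on the clk pin: 20 ns period = 50 MHz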
create_clock -name {clk} -period 20.000 -waveform { 0.000 10.000 } [get_ports {clk}]
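# PLL output clk[0]: 50 MHz x 5 = 250 MHz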
create_generated_clock -name {DUT0|altpll_component|auto_generated|pll1|clk[0]} -source [get_pins {DUT0|altpll_component|auto_generated|pll1|inclk[0]}] -duty_cycle 50/1 -multiply_by 5 -master_clock {clk} [get_pins {DUT0|altpll_component|auto_generated|pll1|clk[0]}]
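# clk_2: generated at the divide-by-2 register output, 250 MHz / 2 = 125 MHz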
create_generated_clock -name clk_2 -source [get_pins {DUT0|altpll_component|auto_generated|pll1|clk[0]}] -divide_by 2 [get_pins {clk_2|q}]
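# manually entered 0.020 ns clock uncertainties within and between the clock domains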
set_clock_uncertainty -rise_from [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] -rise_to [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020
set_clock_uncertainty -rise_from [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] -fall_to [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020
set_clock_uncertainty -fall_from [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] -rise_to [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020
set_clock_uncertainty -fall_from [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] -fall_to [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020
set_clock_uncertainty -rise_from [get_clocks {clk}] -rise_to [get_clocks {clk}] 0.020
set_clock_uncertainty -rise_from [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] -rise_to [get_clocks {clk}] 0.020
set_clock_uncertainty -rise_from [get_clocks {clk}] -rise_to [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020
set_clock_uncertainty -rise_from clk_2 -rise_to DUT0|altpll_component|auto_generated|pll1|clk[0] 0.020
set_clock_uncertainty -fall_from clk_2 -rise_to DUT0|altpll_component|auto_generated|pll1|clk[0] 0.020
set_clock_uncertainty -rise_from DUT0|altpll_component|auto_generated|pll1|clk[0] -fall_to clk_2 0.020
set_clock_uncertainty -fall_from clk_2 -rise_to clk_2 0.020
set_clock_uncertainty -fall_from clk_2 -fall_to clk_2 0.020
set_input_delay -add_delay -clock [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020 [get_ports {ADC_data*}]
set_input_delay -add_delay -clock [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020 [get_ports {num_adc_data*}]
set_output_delay -add_delay -clock [get_clocks {DUT0|altpll_component|auto_generated|pll1|clk[0]}] 0.020 [get_ports {fifo_wrreq}]
set_output_delay -add_delay -clock [get_clocks {clk_2}] 0.020 [get_ports {mean_result*}]
set_false_path -from [get_ports {reset}]
set_multicycle_path -setup -end -from [get_registers {ADD_res:DUT6|fifo_valid_s}] -to [get_ports {fifo_wrreq}] 2
set_multicycle_path -hold -end -from [get_registers {ADD_res:DUT6|fifo_valid_s}] -to [get_ports {fifo_wrreq}] 1
If you have any thoughts on how to get this timing error under control, I'd appreciate it.
thanks in advance
It's really not clear what you are trying to do here. You're launching on a falling edge and latching an output (to a downstream device I presume) on the following launch edge, so you only have 4 ns between launch and latch. Is that intended?
You also have a huge clock skew. I'm not really sure what is going on here.
Hey sstrell,
I didn't understand at first what you meant by the launch and latch edges, but I get it now: I should initialise "clk_2" to '0', not '1'.
Where can I see that I have clock skew?
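For what it's worth, a minimal sketch of that divide-by-2 register with the power-up value set to '0' (clk_250 is a placeholder name for the 250 MHz PLL output), so the first rising edge of clk_2 lines up with a rising edge of the fast clock:

-- Sketch only: the initial value '0' replaces the previous '1'.
signal clk_2 : std_logic := '0';

process (clk_250)
begin
  if rising_edge(clk_250) then
    clk_2 <= not clk_2;              -- divide the 250 MHz PLL clock by 2
  end if;
end process;

As for the skew: in a Report Timing path panel, the launch and latch clock path delays are listed at the top of the Data Arrival Path and Data Required Path sections, and their difference is the skew on that path; the Timing Analyzer also has a Report Skew report. A register-divided clock like clk_2 typically reaches its destination registers over normal routing unless it is promoted to a global clock network, which is a common cause of large skew; a dedicated PLL output or a clock enable in the 250 MHz domain avoids that.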