Intel® Quartus® Prime Software

How to constrain a parallel bus from a PLL clock?

Altera_Forum

Hi. I have a couple of questions about my design. I have an FPGA holding a large data table whose values are read and sent to an external DAC. Both the table read and the DAC update run at 100 MHz. The 100 MHz clock is generated internally by the ALTPLL.

 

1) What is the correct way to constrain this design? I guess that a set_multicycle_path constraint is required, but I am not clear on the right way to apply it. Some manuals advise that it is better to create a virtual clock in order to constrain the design.

 

2) Also, if the DAC data bus lines have different delays on the board, how can I add internal delay to compensate for the difference?

http://www.alteraforum.com/forum/attachment.php?attachmentid=12645&stc=1
Altera_Forum

Based on your diagram, you don't need any multicycle path constraints, as everything appears to be operating from the 'same' 100MHz clock. Sure, if the values coming from your data table remain static for multiple clock cycles, every time, then you could use one. However, if your data changes every cycle, you can't.

 

Whilst the solution you've depicted will benefit from some constraints, your clocking solution is not one that can be constrained in a manner that will ensure it works every time. You need a solution that works by design. More on that shortly. 

 

Refer to Altera's very helpful "timing constraints (http://www.alterawiki.com/wiki/timing_constraints)" Wiki page. Whilst the explanations often require reading a couple of times, this is a good starting point. 

 

Look at the diagrams under 'Output constraints', particularly the clocking topology. This is a typical system for which timing constraints hold good. Both FPGA and external device clocked by the same clock. 

 

However, you're proposing to clock the DAC from a PLL output coming from the FPGA. Very sensible, in my view, and consistent with an approach Altera/Terasic use on some of their reference designs for clocking external SDRAM devices operating @ 100MHz, like your design. They use a different output from the PLL that is deliberately skewed with respect to the clock (of the same frequency) that is used to clock the internal FPGA logic. This is what I mean by a solution that works by design. You will generate a skewed clock for your DAC.
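As a rough sketch of how such a skewed DAC clock might be described to TimeQuest: the port names CLOCK_50 and DAC_CLK and the PLL pin path below are placeholders assumed for illustration, not names from this design. The real PLL output pin name can be read from the TimeQuest Clocks report.

```tcl
# Board oscillator feeding the ALTPLL. CLOCK_50 is an assumed port name.
create_clock -name CLOCK_50M -period 20.000 [get_ports {CLOCK_50}]

# Create generated clocks on every PLL output, including the phase
# shift configured in the ALTPLL parameters.
derive_pll_clocks

# Clock forwarded to the DAC through an output pin. The -source pin is a
# placeholder for the phase-shifted PLL output; DAC_CLK is an assumed port.
create_generated_clock -name DAC_CLK_EXT \
    -source [get_pins {pll_inst|altpll_component|auto_generated|pll1|clk[1]}] \
    [get_ports {DAC_CLK}]
```

The phase shift itself is set in the ALTPLL parameters, not in the SDC file. The constraints only tell the analyzer which clock launches the data and which clock the DAC sees.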

 

Back to constraints - do constrain the parallel data bus out to the DAC. This constraint can mop up any differences in the PCB's trace lengths - you'll need to consider the longest and shortest of those. Again, refer to the 'Output constraints' section of the Wiki.
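A minimal sketch of such a bus constraint, assuming the DAC clock is forwarded from the FPGA and is already defined as a generated clock named DAC_CLK_EXT on the clock output port. All names and numbers here are placeholders; real values come from the DAC datasheet and the PCB layout.

```tcl
# Assumed values in ns; replace with datasheet and layout figures.
set tSU      2.0    ;# DAC setup time
set tH       1.5    ;# DAC hold time
set data_max 0.82   ;# longest data trace delay
set data_min 0.657  ;# shortest data trace delay
set clk_max  0.5    ;# slowest DAC clock trace delay
set clk_min  0.5    ;# fastest DAC clock trace delay

# Setup check: slowest data trace against the earliest clock arrival.
set_output_delay -clock DAC_CLK_EXT \
    -max [expr {$data_max + $tSU - $clk_min}] [get_ports {DAC[*]}]

# Hold check: fastest data trace against the latest clock arrival.
set_output_delay -clock DAC_CLK_EXT \
    -min [expr {$data_min - $tH - $clk_max}] [get_ports {DAC[*]}]
```

Using the longest trace in the -max value and the shortest in the -min value is what lets one pair of constraints cover the whole bus.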

 

I hope that gets you started. 

 

Cheers, 

Alex
Altera_Forum

Dear Alex,  

 

Thank you for the guidance in understanding this issue. I followed the Altera wiki and some guides. I constrained the output as advised, but I have setup violations.

I inserted the TimeQuest SDC cookbook lines:

 

 

 

create_clock -name CLOCK_50M -period 20.000
create_generated_clock -name CLOCK_DDS_100M -multiply_by 2 -source }] }]
# Specify the maximum external clock delay to the FPGA
set CLKs_max 0.5
# Specify the minimum external clock delay to the FPGA
set CLKs_min 0.5
# Specify the maximum external clock delay to the external device
set CLKd_max 0.5
# Specify the minimum external clock delay to the external device
set CLKd_min 0.5
# Specify the maximum setup time of the external device
set tSU 2
# Specify the minimum hold time of the external device
set tH 1.5
# Specify the maximum board delay
set BD_max 0.82
# Specify the minimum board delay
set BD_min 0.657
# Create the output maximum delay for the data output from the FPGA that accounts for all delays specified
set_output_delay -clock CLOCK_DDS_100M -max }]
# Create the output minimum delay for the data output from the FPGA that accounts for all delays specified
set_output_delay -clock CLOCK_DDS_100M -min }]

 

Regarding the data sent to the DAC: it is fixed. I instantiated a ROM initialized from a .mif file. The problem is a setup violation on every DAC line (each bit). I attached the report file for the DAC[7] signal.

 

I guess that the data cannot travel from the ROM output register to the output pin in less than the 10 ns period at 100 MHz. For this reason I thought of using a multicycle constraint to allow 2 cycles to reach the output pin.
Altera_Forum

Hello Frank, 

 

based on your reported path, the M9K is connected directly to the FPGA ports. You should check all corner cases to confirm that you meet timing. The core-to-pin tco (in TimeQuest: port) can, in my experience, vary by more than 5 ns, maybe 10 to 12 ns, over all corners.

The DAC also has setup and hold times to meet. The DAC900 from TI has ts = 1.5 ns and th = 1 ns. These setup and hold times are relative to the PLL output clock, which comes out of the FPGA and also has a delay range.

Maybe you should consider using fast output registers, which are placed in the I/O buffer (Verilog attribute: (* useioff = 1 *)), and reclock the data with them. I am not sure, but maybe the double data rate I/O function (DDIO) could also be used for your application. If all 14 bits and the clock are generated the same way, close to the I/O buffer, you should get a good timing match.
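For reference, one way the fast output register could be requested from a Tcl script or the project settings file, as an alternative to the HDL attribute. The pin name pattern DAC[*] is an assumption; if the wildcard does not match in your Quartus version, use one assignment per bit.

```tcl
# Ask the Fitter to pack the final data register into the I/O cell of
# each DAC data pin. DAC[*] is an assumed pin name pattern.
set_instance_assignment -name FAST_OUTPUT_REGISTER ON -to {DAC[*]}
```

The Fitter can only honor this if a register drives the pin directly, which is why the data has to be reclocked after the M9K.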

Don't forget the set_max_skew command and report_max_skew. If I remember right, TimeQuest doesn't report skew violations by default.
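A sketch of what that could look like, assuming the data pins match DAC[*] and that 0.5 ns of bit-to-bit skew is acceptable (both are assumptions, not values from this thread):

```tcl
# In the .sdc file: bound the skew between the DAC data outputs.
set_max_skew -to [get_ports {DAC[*]}] 0.500

# In the TimeQuest Tcl console, after updating the timing netlist:
# skew results only appear when requested explicitly.
report_max_skew -panel_name {DAC bus skew} -npaths 10
```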

Dirk
Altera_Forum

Hi Dirk,  

 

I have some news. The SDC constraints now seem to work; I had an error in the logic that outputs the data. But now I have an additional general question. As in the previous case, I have the ROM which outputs the data @ 100 MHz, but now I want to perform some calculations on the data before sending it to the DAC. I am using the ALT_MULT and ALT_DIVIDE cores. It seems possible to use an option with zero latency (combinational logic only). After synthesis I get setup violations. I guess a 14-bit by 14-bit multiplication adds a lot of combinational logic and therefore long delays. My question is whether I can add a "set_multicycle_path" between the registers in order to fix this long delay. I tried this, but in SignalTap II the signals are not as expected.
Altera_Forum

Hi Frank, 

division in particular takes a long time in HDL. You have to choose whether you want to pay in clock cycles or in gate delay. The Altera cookbook has a fast radix-4 divider that calculates 2 bits within one clock cycle. That results in at least 16 clocks for a 64 / 32 division.

If you use the "zero latency" version, it will use combinational logic only. In my experience you end up with a relatively low core frequency, in the 10MHz range.

For a division it is much faster to multiply by the inverse of the divisor. OK, this works only if the divisor is fixed or changes slowly.
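As a worked illustration of the idea, with symbols and numbers chosen here for illustration and not taken from the thread: for a fixed divisor d, a scaled reciprocal R is computed once, and each division then becomes one multiply and a right shift by n bits.

```latex
R = \operatorname{round}\!\left(\frac{2^{n}}{d}\right), \qquad
\frac{x}{d} \approx \left\lfloor \frac{x \cdot R}{2^{n}} \right\rfloor

% Example: d = 10, n = 16 gives R = 6554.
% For x = 1000: 1000 \cdot 6554 = 6\,554\,000, and
% \lfloor 6\,554\,000 / 65\,536 \rfloor = 100.
```

Because R is rounded, the result may be off by one for some inputs, so n has to be chosen for the accuracy needed.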

You may add "set_multicycle_path" to your design, which stops TimeQuest complaining. But the design won't work if you are still feeding the divider with data @ 100MHz. That would be no solution.
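For completeness, this is roughly what such a constraint looks like. The register names are hypothetical. It is only valid when the logic really holds the source data stable for two clocks (for example with a clock enable on every second cycle), which is not the case for a stream at the full 100MHz rate.

```tcl
# Allow two clock periods for setup on the slow combinational path.
# Register names are hypothetical placeholders.
set_multicycle_path -setup -end \
    -from [get_registers {rom_data_reg[*]}] \
    -to   [get_registers {result_reg[*]}] 2

# Relax the hold check by one cycle so it is still taken at the launch edge.
set_multicycle_path -hold -end \
    -from [get_registers {rom_data_reg[*]}] \
    -to   [get_registers {result_reg[*]}] 1
```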

 

To get a divider running on streamed data @ 100MHz I would suggest creating something like a pipelined divider, or multiplying by the inverse of the divisor.

The dsp blocks will easily handle 16 by 16 bits @ 100MHz.
Altera_Forum

Hi Duesterberg,  

 

Thank you for the information. Yes, as you said, I know the hardware effort these types of operations take. My question is how long the delay is to perform a multiplication, for instance. I instantiated the ALT_MULT core with the zero latency option, and it seems to take more than 11 ns to get the result. Does this mean I have to add an offset of 12 ns to the PLL output that clocks the DAC? Or, if I use the latency options, I can't figure out how to wait until the result is ready.
Altera_Forum

Hi Frank, 

 

if the M9K + multiplier path is too long for your 100MHz period, you have to add a pipeline stage in the middle. That could be an FPGA FF, or the FFs built into the M9K or the DSP block.

 

Dirk
Altera_Forum

Hi Dirk,  

 

OK. It means calculating the multiplication in parallel to speed up the procedure. But isn't the ALT_MULT instantiation with zero latency a good implementation of a product operation? I mean, if ALT_MULT uses a DSP multiplier, isn't it the best hardware description in terms of performance?
Altera_Forum

Hi Frank, 

 

the DSP multiplier with zero latency is the fastest you can implement in terms of propagation delay (zero latency refers to clock cycles, not path delay). But your goal is to process data @ 100MHz. The clock period is 10ns. The DSP block needs about 5ns. The M9K also needs about 5ns. Then you need margin for interconnect, tsu and tco.

This means you can't use the M9K and DSP as a single combinational path @ 100MHz.

You should do: 

address generator -> FF -> ROM -> FF -> Multiplier -> FF -> Outputs 

 

This setup can process data at a 100MHz data rate, with a latency of 3 clock cycles.

 

The M9K and the DSP blocks have built-in FFs (hard macros) that can be used for this. To get good I/O timing, the last FFs should sit in the I/O buffer.

 

Dirk
Altera_Forum

Hi Dirk,  

 

Thank you for the reply. I understand now. Regarding the skewed version of the DAC clock, what skew is required? I guess an offset equal to all the internal FPGA delay plus the board delays is required? I guess that 3 cycles of latency is not a problem, but I need to get the relationship between the DAC clock and the data outputs from the FPGA.
Altera_Forum

Hi,  

 

My main question is how to calculate the right phase needed in the attached diagram. I guess the required phase is equal to the path delay through the ROM logic. I am also wondering whether the source-synchronous compensation mode in the ALTPLL can help to avoid this offset.