AEsqu
Novice
466 Views

Huge delay inserted by Quartus Pro 19.3 on the Arria 10 FPGA

Hello,

I see a huge delay inserted by Quartus Pro 19.3 on the Arria 10 FPGA.

This was not seen on the Stratix III FPGA using Quartus 13.1.

I will attach a picture showing this.

This leads to huge hold time violations.

The delay appears on something that Quartus names ~la_lab/laboutb.

I saw another topic where another person had a similar issue with the Arria 10.

How can this be solved?

28 Replies
152 Views

Hi,

Can you provide the design.qar for investigation?

Thanks.

 

AEsqu
Novice
152 Views

For example, the following path (gated clock block output to FF):

rfd_ic_i|u_top|u_core|u_rfd_clockshop|i_mcu_flexcomm1_clockgate|i01_cnhlspd|Q -> rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array_1__A_flexcomm|genblk1_A_flexcomm|A_flexcomm_ctrl_gen_A_flexcomm_ctrl|A_flexcomm_fifo|A_flexcomm_fifo_ptrs_rx|rptr_gray_0_

takes 0.68 ns in Quartus 13.1 on the Stratix III,

but takes 9.1 ns (!) in Quartus 19.3 on the Arria 10.

Why is Quartus adding so much delay to that path on the Arria 10?

I am attaching 4 pictures showing this.

The report command was:

report_timing -from_clock { flexcomm1_hclk } -to_clock { flexcomm1_hclk } -from {rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array_1__A_flexcomm|genblk1_A_flexcomm|A_flexcomm_ctrl_gen_A_flexcomm_ctrl|A_flexcomm_fifo|A_flexcomm_fifo_ptrs_rx|rptr_0_} -to {rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array_1__A_flexcomm|genblk1_A_flexcomm|A_flexcomm_ctrl_gen_A_flexcomm_ctrl|A_flexcomm_fifo|A_flexcomm_fifo_ptrs_rx|rptr_gray_0_} -hold -npaths 100 -detail full_path -panel_name {Report Timing}

 

 

AEsqu
Novice
152 Views

posted a file.
sstrell
Honored Contributor II
152 Views

There are a few things going on here. Extra delay is good for hold (in this case, removal) analysis. Remember that for hold/removal analysis, you want the signal to remain active longer to meet the timing requirement after the latch edge. So the issue here is the delay of the clock to the destination register (the data required path), not the control signal itself (data arrival path). The clock skew of 11 ns shown at the top of the screenshot is a quick giveaway to the problem.
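
For readers who want the relationship behind this written out, the usual hold check can be sketched as below. This is a generic textbook form, not taken from the attached report: the symbols are introduced here for illustration, clock uncertainty is ignored, and for a removal check the hold time is replaced by the removal time.

```latex
\begin{align*}
t_{\text{arrival}}  &= t_{\text{launch}} + t_{\text{clk,src}} + t_{\text{co}} + t_{\text{data}} \\
t_{\text{required}} &= t_{\text{latch}} + t_{\text{clk,dst}} + t_{\text{h}} \\
\text{slack}_{\text{hold}} &= t_{\text{arrival}} - t_{\text{required}} \\
&= t_{\text{co}} + t_{\text{data}} - t_{\text{h}} - \left(t_{\text{clk,dst}} - t_{\text{clk,src}}\right)
\quad \text{when } t_{\text{launch}} = t_{\text{latch}}
\end{align*}
```

Under these assumptions, a large clock skew toward the destination register drives the hold slack negative unless the data delay grows by a similar amount, which is consistent with the fitter adding delay on the data path.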

 

It looks like the clock is being routed through device logic instead of a global clock routing channel because you have a gated clock. If you must gate the clock, it's usually best to move the gating condition onto the clock enable input of the destination register instead of putting logic in the clock path. That would probably fix this issue. You could also try forcing the clock onto a global routing channel using the Global Signal assignment in the Assignment Editor, but the gating logic would still require the clock to come off the global routing channel, potentially adding further delay.
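
As a rough illustration of the clock-enable suggestion, the register can keep a free-running clock and take the former gating condition on its enable input. This is only a minimal sketch: the module and signal names (ce_reg, clk, rst_n, en) are hypothetical and do not come from the design discussed in this thread.

```verilog
// Sketch only: hypothetical names, not the original design.
// The clock stays ungated so it can remain on a global clock network;
// the former clock-gate condition drives the register's clock enable.
module ce_reg (
    input  wire clk,    // free-running clock
    input  wire rst_n,  // asynchronous clear, active low
    input  wire en,     // former clock-gate enable, now a clock enable
    input  wire d,
    output reg  q
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= 1'b0;  // asynchronous clear
        else if (en)
            q <= d;     // load only while enabled, hold otherwise
    end
endmodule
```

Functionally the register still updates only while the enable is active, but the clock path no longer passes through general-purpose logic.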

 

There's no way of knowing why this routed OK on the older device vs. the Arria 10. Did the design change at all? Were there other assignments involved?

 

#iwork4intel

AEsqu
Novice
152 Views

Hi sstrell,

 

I tested with global clock usage and that solves the mess for that clock.

But then on the next clock gate that follows that clock, there is again an extra 3 ns of delay.

For some reason, Quartus 13.1 and/or the Stratix III handled clock gating much better than Quartus 19.3 and/or the Arria 10.

Our design defines about 200 clocks and has thousands of clock gates (for low power).

I am attaching a picture of the long routing for the next clock gate after the global clock point.

 

AEsqu
Novice
152 Views

#idonotwork4intel

AEsqu
Novice
152 Views

I have been looking further into this.

Apparently Quartus 19.3, for the Arria 10 FPGA, has an issue with clear/preset/clk constructions, which result in a combinational loop (this is not the case with Quartus 13.1 and the Stratix III).

Example below:

 

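  // Sensitivity list reconstructed; the clock name ck is assumed, as the post omits this line.
  always @(posedge ck or negedge cd or negedge sd) begin
  if (!cd) q <= `unitdelay 1'b0;       // asynchronous clear (highest priority)
  else if (!sd) q <= `unitdelay 1'b1;  // asynchronous preset
  else q <= `unitdelay d;              // clocked load
  end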

Combinational loop reported by the TimeQuest analyzer:

Found combinational loop of 3 nodes
  Node "rfd_ic_i|u_top|u_core|u_flash_subsys|A_ip_pflash640k_atfc|u_controller|u_fmc_if|read_fail_sync_reg|q~1~la_mlab/laboutt[6]"
  Node "rfd_ic_i|u_top|u_core|u_flash_subsys|A_ip_pflash640k_atfc|u_controller|u_fmc_if|read_fail_sync_reg|q~1|dataf"
  Node "rfd_ic_i|u_top|u_core|u_flash_subsys|A_ip_pflash640k_atfc|u_controller|u_fmc_if|read_fail_sync_reg|q~1|combout"

Note the presence of la_mlab/laboutt[6] again.

How can this be solved while keeping the same RTL code?

Second (VHDL) example:

 

   process(scl_clk_n, rstn, start_stage1, scantestmode)
   begin
       if(rstn = '0') then
           -- asynchronous clear
           start_stage2 <= '0' after delay_f;
       elsif(start_stage1 = '1' and scantestmode = '0') then
           -- asynchronous preset, disabled in scan test mode
           start_stage2 <= '1' after delay_f;
       elsif(scl_clk_n'event and scl_clk_n = '1') then
           -- clocked path
           start_stage2 <= '0' after delay_f;
       end if;
   end process;

 

 

Found combinational loop of 3 nodes
  Node "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|slave_detect_inst|start_stage2~1~la_mlab/laboutt[0]"
  Node "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|slave_detect_inst|start_stage2~1|dataf"
  Node "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|slave_detect_inst|start_stage2~1|combout"

 

 

 

AEsqu
Novice
152 Views

I have seen in the documentation that the Stratix III does not directly support a clear/preset implementation:

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/qts/qts_qii51006.pdf

 "

Register Control Signals

Avoid using an asynchronous load signal if the design target device architecture does not include registers with dedicated circuitry for asynchronous loads. Also, avoid using both asynchronous clear and preset if the architecture provides only one of these control signals. Stratix III devices, for example, directly support an asynchronous clear function, but not a preset or load function. When the target device does not directly support the signals, the synthesis or placement and routing software must use combinational logic to implement the same functionality. In addition, if you use signals in a priority other than the inherent priority in the device architecture, combinational logic may be required to implement the necessary control signals. Combinational logic is less efficient and can cause glitches and other problems; it is best to avoid these implementations.

 "

 

So I have been looking further into it:

Synplify implements the clear/preset flip-flop as a latch plus a FF, which prevents the timing analysis from being done and prevents a combinational loop from showing up in the Quartus-level timing check.

As a result, those huge nonsensical delays are absent.

Quartus synthesis implements it as a normal FF with combinational logic, which leads to nonsensical routing and timing analysis.

Would it be possible to tell Quartus to implement a latch to solve this issue?

We won't change the RTL code: we use the same code for the chip and never write FPGA-specific code.

See the attachment with pictures showing this.

 

 

AEsqu
Novice
152 Views

Neither the Stratix III nor the Arria 10 handbook shows an aset input in the ALM:

 

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/stx3/stratix3_handbook.p...

 

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/arria-10/a10_handbook.pd...

 

but the Quartus RTL viewer shows it for the Stratix III (so it must be a combination of the FF and surrounding logic in the ALM).

This is not the case for the Arria 10 (simple FF).

I'm attaching an RTL view from Quartus 13.1 with the Stratix III VQM from Synplify Pro P-2019.09-SP1 (async inputs are indicated).

With the VQM from Synplify Pro, the flops are present in the RTL viewer of Quartus 19.3 for the Arria 10, but without async inputs.

 

 

AEsqu
Novice
152 Views

And the view for the Arria 10, with a VQM from Synplify Pro.

 

AEsqu
Novice
152 Views

Could I use MAX_SCC_SIZE to prevent Quartus from doing timing analysis on the combinational loop?

AEsqu
Novice
152 Views

I've been looking in the synthesis log,

and for both the Arria 10 and the Stratix 10, Quartus synthesis intends to turn the preset/clear register into a latch:

 

Warning(13310): Register "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|timeout_detect_inst|stop_stage2" is converted into an equivalent circuit using register "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|timeout_detect_inst|stop_stage2~synth" and latch "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|timeout_detect_inst|stop_stage2~synth" 

Warning(13310): Register "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|slave_inst|pending_i_capture" is converted into an equivalent circuit using register "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|slave_inst|pending_i_capture~synth" and latch "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|slave_inst|pending_i_capture~synth" 

Warning(13004): Presettable and clearable registers converted to equivalent circuits with latches. Registers power-up to an undefined state, and DEVCLRn places the registers in an undefined state. 

Warning(13310): Register "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|timeout_detect_inst|stop_stage2" is converted into an equivalent circuit using register "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|timeout_detect_inst|stop_stage2~synth" and latch "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|timeout_detect_inst|stop_stage2~synth" 

Warning(13310): Register "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|slave_inst|pending_i_capture" is converted into an equivalent circuit using register "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|slave_inst|pending_i_capture~synth" and latch "rfd_ic_i|u_top|u_core|u_atlas|A_flexcomm_array[1].A_flexcomm|A_flexcomm|A_bi2c_gen.A_bi2c_core|slave_inst|pending_i_capture~synth" 

 

 

So I guess the difference between the Stratix III and the Arria 10 is that the timing analyzer's handling of latches changed between Quartus 13.1 and Quartus 19.3. Is that right?

Quartus 13.1 reports no combinational loop when Synplify Pro inserts latches to implement the clear/preset register (and of course Quartus does not see the preset/clear, as it has already been converted by Synplify Pro).

AEsqu
Novice
152 Views

I can see this in the Timing Analyzer under Check Timing / Latches:

 

Analyzed unsupported latch type as a combinational loop

 

 

AEsqu
Novice
152 Views

I have added the synthesis attribute syn_keep = 1 in the RTL code; the flops are now preserved and there is no longer a combinational loop in Quartus 19.3 / Arria 10 ;-)
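
For illustration, one way such an attribute might be attached to a clear/preset flop is sketched below. The thread does not show the actual change, so the module name, port names, and the placement of the attribute on the register are all assumptions here.

```verilog
// Sketch only: hypothetical module and port names; the attribute placement
// is an assumption about how syn_keep might have been attached.
module sr_ff_keep (
    input  wire ck,  // clock
    input  wire cd,  // asynchronous clear, active low
    input  wire sd,  // asynchronous preset, active low
    input  wire d,
    output wire q
);
    // Verilog-2001 attribute syntax; the comment form
    //   reg q_r /* synthesis syn_keep = 1 */;
    // is the other common way to write the same attribute.
    (* syn_keep = 1 *) reg q_r;

    always @(posedge ck or negedge cd or negedge sd) begin
        if (!cd)      q_r <= 1'b0;  // clear has priority
        else if (!sd) q_r <= 1'b1;  // then preset
        else          q_r <= d;     // otherwise clocked load
    end

    assign q = q_r;
endmodule
```

Whether a given tool honors the attribute on a particular node depends on the tool and version, so the synthesis log is the place to confirm it took effect.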

 

AEsqu
Novice
152 Views

This synthesis keep works for a few nodes, but not all.

I just saw that the timing analyzer of Quartus 13.1 apparently does see those combinational loops, but it does not detail them, and it says this:

 

Low junction temperature is 0 degrees C

High junction temperature is 85 degrees C

TimeQuest Timing Analyzer is analyzing 110 combinational loops as latches.

 

So in Quartus 13.1 for the Stratix III, Quartus cuts the paths,

while Quartus 19.3 for the Arria 10 does not cut the paths and tries to honor the nonsensical timing,

which leads to very long routing and -20 ns hold time violations.

Would it be possible to tell Quartus 19.3 to analyze combinational loops as latches, as Quartus 13.1 does?

Maybe with an option in the QSF or so?

BTW, is there a ticketing system at Intel, as there was in the Altera days?

The forum is not giving proper support. We pay for the Quartus license,

and we expect better support than this!

 

152 Views

Hi,

 

Upon checking, this is a newly added feature. The Intel Quartus Prime Pro Edition software v19.3 automatically analyzes the correct amount of time borrowing based on arrival time for latches. This is why you see timing analysis on the latches.

 

Thanks.

AEsqu
Novice
152 Views

Hi,

Is it possible to disable that?

This makes routing incorrect.

Thanks.

152 Views

Hi,

 

Please allow me some time to consult the engineering team.

 

Thanks.

31 Views

Hi,

 

Kindly provide the information below requested by the engineering team:

 

There are various INIs that disable various aspects of latch support / borrowing in 19.3. Before recommending anything specific, could you explain exactly why the customer wants to disable latch analysis? What's the design, what is the customer trying to achieve, etc? I want to make sure that our latch support will be in a state that will work for all customers.

 

Thanks
