Clock synthesis and de-skewing using an IOPLL in Arria 10

roeekalinsky
New Contributor II

I'm trying unsuccessfully to use an IOPLL to synthesize a clock and have it be in-phase with the reference clock, where both the reference clock and the synthesized clock are routed on GCLKs. By "in-phase" I mean ideally zero or near-zero skew, i.e. where the rising edges of the synthesized clock and the reference clock line up with each other. And to clarify/simplify, both the reference clock and the synthesized clock are just used to clock internal fabric resources, there is no external I/O involved.

With all attempts thus far, I'm seeing very large skew between the synthesized clock and the reference clock, as much as ~5 ns. So I can only assume that something is fundamentally wrong with how I'm configuring the IOPLL.

If necessary I can provide a simple design example, timing reports, etc. But before diving into that, possibly unnecessarily, let's start with some basic questions. And just for background reference, I'm well familiar with PLLs and de-skewing techniques in general, and have done this routinely in Xilinx devices. But I'm not as familiar yet with the Arria 10 PLL resources, and am finding it somewhat difficult to find good information. The Intel/Altera documentation I've found describing the IOPLL has been fairly scant, and the IP generator and simulation library models obscure critical details on low-level configuration options, internal functionality, feedback paths, etc... so at this point I must humbly request some guidance from knowledgeable Intel/Altera insiders, please.

So, here we go:

I'm using the IP generator wizard to configure the IOPLL in "normal" compensation mode (with all other options at their defaults). "Normal" mode, as tersely described in Altera documentation, "compensates for the delay of the internal clock network used by the clock output". Of the listed compensation modes, that description, while not entirely clear, sounded like the appropriate choice for what I'm trying to accomplish. Is it? It claims to compensate, and yet it doesn't expose the lock feedback path to the user, so any means by which it is trying to compensate is hidden from me. So firstly, was that even a correct interpretation of the description of "normal" mode? Is "normal" mode meant to produce clocks that on GCLKs will be in phase with the reference clock that is also on a GCLK? And if not, please steer me in the right direction, and we'll go from there.

Thanks,
-Roee

17 Replies
sstrell
Honored Contributor III

I think you may be using the wrong mode.  See the user guide:

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/archives/ug_altera_iopll...

Maybe you want to be using zero-delay buffer mode?  It's not entirely clear what your goal is so I'm kind of guessing.

roeekalinsky
New Contributor II

Thanks, @sstrell, but zero delay buffer mode doesn't seem to be what I need.  That's for putting out a clock to the board via a chip level I/O pin, and de-skews the clock for that external output, not for an internal GCLK.

From that doc (UG-01155), which I have been scouring:
"If you select the zero delay buffer mode, the PLL must feed an external clock output pin and compensate for the delay introduced by that pin. The signal observed on the pin is synchronized to the input clock. The PLL clock output connects to the altbidir port and drives zdbfbclk as an output port. If the PLL also drives the internal clock network, a corresponding phase shift of that network occurs."

My situation is purely on-chip, no external I/O involved.  Assume I have a given clock signal "clk1" that's already on a GCLK, and I need to produce another clock signal "clk2" that is also on a GCLK, is at an integer multiple of the frequency of "clk1", and is phase-aligned with "clk1".  That's what I'm trying to accomplish.
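As a minimal illustration of that requirement, the Python sketch below checks whether every rising edge of clk1 coincides with a rising edge of clk2 when clk2 runs at an integer multiple of clk1. The 10 ns period, the 3x multiple, and the function names are hypothetical and chosen only for the example; they are not taken from the design discussed here.

```python
from fractions import Fraction


def rising_edges(period_ns, skew_ns, count):
    """Rising-edge times (ns) of an ideal clock with the given period and skew."""
    return [skew_ns + k * period_ns for k in range(count)]


def is_phase_aligned(ref_period_ns, multiple, skew_ns, ref_cycles=8):
    """True if every clk1 rising edge coincides with a clk2 rising edge."""
    clk2_period = ref_period_ns / multiple
    clk1 = rising_edges(ref_period_ns, Fraction(0), ref_cycles)
    clk2 = set(rising_edges(clk2_period, skew_ns, ref_cycles * multiple + 1))
    return all(edge in clk2 for edge in clk1)


# Hypothetical example: clk1 with a 10 ns period, clk2 at 3x that frequency.
REF_PERIOD = Fraction(10)
print(is_phase_aligned(REF_PERIOD, 3, Fraction(0)))     # True: zero skew
print(is_phase_aligned(REF_PERIOD, 3, Fraction(1, 2)))  # False: 0.5 ns skew
```

Exact fractions are used so that a period such as 10/3 ns does not introduce rounding error into the edge comparison.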

It seems like in principle what I need is more analogous to the IOPLL's "external mode", which exposes the feedback path to the user.  Except that instead of running the feedback path off-chip through I/O pins, I need to run the feedback path on-chip through just a clock control block and GCLK.  But "external mode" doesn't allow that either, it already has the input pin and output pin I/O buffers built in and has to go off-chip.  So...?

Ash_R_Intel
Employee

Hi,

In Normal mode, the FBCLK_IN pin of the IOPLL is fed by a CLKCTRL block whose input is driven by the FBOUT output pin of the IOPLL. The CLKCTRL block is added automatically by the tool, so FBCLK_IN is not exposed to the user by the IP. You can check this in the Technology Map Viewer after running the Fitter. The IOPLL User Guide states the following, and the tool seems to implement the same.

  • If you select the normal mode, the PLL compensates for the delay of the internal clock network used by the clock output. If the PLL is also used to drive an external clock output pin, a corresponding phase shift of the signal on the output pin occurs.
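The idea behind this kind of feedback compensation can be sketched with a very simplified model. It assumes an ideal loop that locks when the fed-back edge reaches the phase detector at the same instant as the reference edge. The function names and all delay values are hypothetical, and the model is not a description of the actual IOPLL internals.

```python
def vco_launch_time_ps(ref_edge_at_pfd_ps, feedback_delay_ps):
    """Output edge time of an ideal locked PLL.

    Lock means the fed-back edge reaches the phase detector together with
    the reference edge, so the output edge must leave earlier by the
    feedback-path delay.
    """
    return ref_edge_at_pfd_ps - feedback_delay_ps


def endpoint_arrival_ps(ref_edge_at_pfd_ps, feedback_delay_ps, network_delay_ps):
    """Edge arrival time at a register clocked through the output network."""
    launch = vco_launch_time_ps(ref_edge_at_pfd_ps, feedback_delay_ps)
    return launch + network_delay_ps


# Hypothetical delays in picoseconds (not taken from any report).
REF_AT_PFD = 4500
FEEDBACK_DELAY = 3600

# Feedback path matches the output network: the endpoint sees the edge at
# the same time the reference edge reaches the phase detector.
print(endpoint_arrival_ps(REF_AT_PFD, FEEDBACK_DELAY, 3600))  # 4500

# A 100 ps mismatch between the two networks shows up directly as skew.
print(endpoint_arrival_ps(REF_AT_PFD, FEEDBACK_DELAY, 3700))  # 4600
```

In this model the compensation is only as good as the match between the feedback path and the clock network that actually reaches the registers.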


Now I have a question about the measurement technique: how are you measuring the delay between the input clock and the generated output clock?


Regards


roeekalinsky
New Contributor II

Hi @Ash_R_Intel, thank you for your response.

Your description of normal mode matches what I thought it should do, and yes, I can confirm via the technology viewer that it is in fact implementing the feedback path exactly as you described.  So it should be able to de-skew as intended, but it doesn't seem to.  I say this based on the clock skew shown in the static timing analysis report.

Pasted below is a snippet from the .sta.rpt from a trivial design example showing a reg-to-reg timing path going from the "clk2" domain (the output clock from the IOPLL) to the "clk1" domain (the input clock to the IOPLL).  As you can see in the report, the clock path is mapped as expected, with "clk1" on CLKCTRL_2I_G_I7, which then goes to the IOPLL, then to "clk2" on CLKCTRL_3C_G_I21 (and the IOPLL's feedback path is not explicitly shown in this report but is confirmed via technology view to be exactly as you described).

Now, as you can see in the report, there is a massive hold violation (-4.661ns) resulting from a massive skew between these two clocks (5.084ns).  And we can see in the report there is a compensation delay being applied in the IOPLL (-9.485ns, shown as type "COMP"), but it isn't obvious to me how it's coming up with that compensation amount, as this is not having a de-skewing effect.  Rather, it actually seems to be far too large of an "anti-delay".

Path #1: Hold slack is -4.661 (VIOLATED)
===============================================================================
+---------------------------------------------------------+
; Path Summary                                            ;
+---------------------------------+-----------------------+
; Property                        ; Value                 ;
+---------------------------------+-----------------------+
; From Node                       ; ff3                   ;
; To Node                         ; ff4                   ;
; Launch Clock                    ; clk1                  ;
; Latch Clock                     ; clk1                  ;
; Data Arrival Time               ; -0.543                ;
; Data Required Time              ; 4.118                 ;
; Slack                           ; -4.661 (VIOLATED)     ;
; Worst-Case Operating Conditions ; Slow 900mV -40C Model ;
+---------------------------------+-----------------------+

+-------------------------------------------------------------------------------------+
; Statistics                                                                          ;
+------------------------+-------+-------+-------------+------------+--------+--------+
; Property               ; Value ; Count ; Total Delay ; % of Total ; Min    ; Max    ;
+------------------------+-------+-------+-------------+------------+--------+--------+
; Hold Relationship      ; 0.000 ;       ;             ;            ;        ;        ;
; Clock Skew             ; 5.084 ;       ;             ;            ;        ;        ;
; Data Delay             ; 0.787 ;       ;             ;            ;        ;        ;
; Number of Logic Levels ;       ; 0     ;             ;            ;        ;        ;
; Physical Delays        ;       ;       ;             ;            ;        ;        ;
;  Arrival Path          ;       ;       ;             ;            ;        ;        ;
;   Clock                ;       ;       ;             ;            ;        ;        ;
;    IC                  ;       ; 5     ; 4.989       ; 61         ; 0.000  ; 2.573  ;
;    Cell                ;       ; 9     ; 3.166       ; 39         ; 0.000  ; 0.804  ;
;    PLL Compensation    ;       ; 1     ; -9.485      ; 0          ; -9.485 ; -9.485 ;
;   Data                 ;       ;       ;             ;            ;        ;        ;
;    IC                  ;       ; 1     ; 0.529       ; 67         ; 0.529  ; 0.529  ;
;    Cell                ;       ; 2     ; 0.086       ; 11         ; 0.000  ; 0.086  ;
;    uTco                ;       ; 1     ; 0.172       ; 22         ; 0.172  ; 0.172  ;
;  Required Path         ;       ;       ;             ;            ;        ;        ;
;   Clock                ;       ;       ;             ;            ;        ;        ;
;    IC                  ;       ; 3     ; 2.587       ; 66         ; 0.000  ; 2.587  ;
;    Cell                ;       ; 4     ; 1.321       ; 34         ; 0.000  ; 0.632  ;
+------------------------+-------+-------+-------------+------------+--------+--------+
Note: Negative delays are omitted from totals when calculating percentages

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
; Data Arrival Path                                                                                                                                                                 ;
+----------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+
; Total    ; Incr     ; RF ; Type   ; Fanout ; Location            ; HS/LP      ; Element                                                                                           ;
+----------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+
; 0.000    ; 0.000    ;    ;        ;        ;                     ;            ; launch edge time                                                                                  ;
; 0.000    ; 0.000    ;    ; borrow ;        ;                     ;            ; time borrowed                                                                                     ;
; -1.330   ; -1.330   ;    ;        ;        ;                     ;            ; clock path                                                                                        ;
;   0.000  ;   0.000  ;    ;        ;        ;                     ;            ; source latency                                                                                    ;
;   0.000  ;   0.000  ;    ;        ; 1      ; PIN_AR36            ;            ; clk1_p                                                                                            ;
;   0.000  ;   0.000  ; RR ; IC     ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input|i                                                                                    ;
;   0.632  ;   0.632  ; RR ; CELL   ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input|o                                                                                    ;
;   0.762  ;   0.130  ; RR ; CELL   ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input~io_48_lvds_tile/ioclkin[2]                                                           ;
;   0.762  ;   0.000  ; RR ; IC     ; 2      ; CLKCTRL_2I_G_I7     ;            ; altclkctrl_ip_01_i|altclkctrl_0|altclkctrl_ip_01_altclkctrl_2000_dpnsueq_sub_component|sd1|inclk  ;
;   1.211  ;   0.449  ; RR ; CELL   ; 5      ; CLKCTRL_2I_G_I7     ;            ; altclkctrl_ip_01_i|altclkctrl_0|altclkctrl_ip_01_altclkctrl_2000_dpnsueq_sub_component|sd1|outclk ;
;   3.784  ;   2.573  ; RR ; IC     ; 1      ; IOPLL_3C            ; High Speed ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst|refclk[0]                            ;
;   4.526  ;   0.742  ; RR ; CELL   ; 1      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst~vco_refclk                           ;
;   4.526  ;   0.000  ; RR ; CELL   ; 1      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst~vctrl                                ;
;   -4.959 ;   -9.485 ; RR ; COMP   ; 2      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst~vcoph[0]                             ;
;   -4.155 ;   0.804  ; RR ; CELL   ; 1      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst|outclk[0]                            ;
;   -4.155 ;   0.000  ; RR ; CELL   ; 1      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst~io_48_lvds_tile/pllcout[4]           ;
;   -4.155 ;   0.000  ; RR ; IC     ; 2      ; CLKCTRL_3C_G_I21    ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|outclk[0]~CLKENA0|inclk                         ;
;   -3.746 ;   0.409  ; RR ; CELL   ; 1      ; CLKCTRL_3C_G_I21    ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|outclk[0]~CLKENA0|outclk                        ;
;   -1.330 ;   2.416  ; RR ; IC     ; 1      ; FF_X77_Y121_N55     ; High Speed ; ff3|clk                                                                                           ;
;   -1.330 ;   0.000  ; RR ; CELL   ; 1      ; FF_X77_Y121_N55     ; High Speed ; ff3                                                                                               ;
; -0.543   ; 0.787    ;    ;        ;        ;                     ;            ; data path                                                                                         ;
;   -1.158 ;   0.172  ; FF ; uTco   ; 1      ; FF_X77_Y121_N55     ;            ; ff3|q                                                                                             ;
;   -1.072 ;   0.086  ; FF ; CELL   ; 1      ; FF_X77_Y121_N55     ; High Speed ; ff3~la_lab/laboutb[16]                                                                            ;
;   -0.543 ;   0.529  ; FF ; IC     ; 1      ; FF_X77_Y121_N53     ; High Speed ; ff4|asdata                                                                                        ;
;   -0.543 ;   0.000  ; FF ; CELL   ; 1      ; FF_X77_Y121_N53     ; High Speed ; ff4                                                                                               ;
+----------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
; Data Required Path                                                                                                                                                               ;
+---------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+
; Total   ; Incr     ; RF ; Type   ; Fanout ; Location            ; HS/LP      ; Element                                                                                           ;
+---------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+
; 0.000   ; 0.000    ;    ;        ;        ;                     ;            ; latch edge time                                                                                   ;
; 0.000   ; 0.000    ;    ; borrow ;        ;                     ;            ; time borrowed                                                                                     ;
; 3.754   ; 3.754    ;    ;        ;        ;                     ;            ; clock path                                                                                        ;
;   0.000 ;   0.000  ;    ;        ;        ;                     ;            ; source latency                                                                                    ;
;   0.000 ;   0.000  ;    ;        ; 1      ; PIN_AR36            ;            ; clk1_p                                                                                            ;
;   0.000 ;   0.000  ; RR ; IC     ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input|i                                                                                    ;
;   0.632 ;   0.632  ; RR ; CELL   ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input|o                                                                                    ;
;   0.791 ;   0.159  ; RR ; CELL   ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input~io_48_lvds_tile/ioclkin[2]                                                           ;
;   0.791 ;   0.000  ; RR ; IC     ; 2      ; CLKCTRL_2I_G_I7     ;            ; altclkctrl_ip_01_i|altclkctrl_0|altclkctrl_ip_01_altclkctrl_2000_dpnsueq_sub_component|sd1|inclk  ;
;   1.321 ;   0.530  ; RR ; CELL   ; 5      ; CLKCTRL_2I_G_I7     ;            ; altclkctrl_ip_01_i|altclkctrl_0|altclkctrl_ip_01_altclkctrl_2000_dpnsueq_sub_component|sd1|outclk ;
;   3.908 ;   2.587  ; RR ; IC     ; 1      ; FF_X77_Y121_N53     ; High Speed ; ff4|clk                                                                                           ;
;   3.908 ;   0.000  ; RR ; CELL   ; 1      ; FF_X77_Y121_N53     ; High Speed ; ff4                                                                                               ;
;   3.754 ;   -0.154 ;    ;        ;        ;                     ;            ; clock pessimism removed                                                                           ;
; 3.754   ; 0.000    ;    ;        ;        ;                     ;            ; clock uncertainty                                                                                 ;
; 4.118   ; 0.364    ;    ; uTh    ; 1      ; FF_X77_Y121_N53     ;            ; ff4                                                                                               ;
+---------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+

----------------------------
; Extra Fitter Information ;
----------------------------
HTML report is unavailable in plain text report export.
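The figures in the report above can be cross-checked with a few lines of arithmetic. The Python sketch below copies the values from the report (converted to picoseconds) and recomputes the clock skew and hold slack; the variable names are chosen only for this example.

```python
# All values in picoseconds, copied from the report above.

# Non-zero increments along the launch clock path (pin -> IOPLL -> ff3|clk),
# including the -9485 ps "COMP" entry.
ARRIVAL_CLOCK_INCR = [632, 130, 449, 2573, 742, -9485, 804, 409, 2416]

LATCH_CLOCK_AT_FF4 = 3908    # ff4|clk before pessimism removal
PESSIMISM_REMOVED = -154     # "clock pessimism removed" row
DATA_DELAY = 787             # uTco + cell + IC on the data path
CLOCK_UNCERTAINTY = 0
HOLD_TIME = 364              # uTh of ff4

launch_clock = sum(ARRIVAL_CLOCK_INCR)
latch_clock = LATCH_CLOCK_AT_FF4 + PESSIMISM_REMOVED

clock_skew = latch_clock - launch_clock
data_arrival = launch_clock + DATA_DELAY
data_required = latch_clock + CLOCK_UNCERTAINTY + HOLD_TIME
hold_slack = data_arrival - data_required

print(launch_clock)   # -1330
print(clock_skew)     # 5084
print(data_arrival)   # -543
print(data_required)  # 4118
print(hold_slack)     # -4661
```

The recomputed values agree with the report: the hold violation is almost entirely the 5.084 ns clock skew, less the 0.787 ns data delay, plus the 0.364 ns hold time.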

roeekalinsky
New Contributor II

Minor correction to my previous post.  I just discovered that in the SDC I need to explicitly specify "derive_pll_clocks".  With that in place, the same skew problem is still present, but the .sta.rpt now more clearly refers to the clock from the IOPLL output as a separate clock.

Path #1: Hold slack is -4.670 (VIOLATED)
===============================================================================
+-----------------------------------------------------------------+
; Path Summary                                                    ;
+---------------------------------+-------------------------------+
; Property                        ; Value                         ;
+---------------------------------+-------------------------------+
; From Node                       ; ff3                           ;
; To Node                         ; ff4                           ;
; Launch Clock                    ; iopll_ip_01_i|iopll_0|outclk0 ;
; Latch Clock                     ; clk1                          ;
; Data Arrival Time               ; -0.242                        ;
; Data Required Time              ; 4.428                         ;
; Slack                           ; -4.670 (VIOLATED)             ;
; Worst-Case Operating Conditions ; Slow 900mV -40C Model         ;
+---------------------------------+-------------------------------+

+-------------------------------------------------------------------------------------+
; Statistics                                                                          ;
+------------------------+-------+-------+-------------+------------+--------+--------+
; Property               ; Value ; Count ; Total Delay ; % of Total ; Min    ; Max    ;
+------------------------+-------+-------+-------------+------------+--------+--------+
; Hold Relationship      ; 0.000 ;       ;             ;            ;        ;        ;
; Clock Skew             ; 5.084 ;       ;             ;            ;        ;        ;
; Data Delay             ; 1.088 ;       ;             ;            ;        ;        ;
; Number of Logic Levels ;       ; 0     ;             ;            ;        ;        ;
; Physical Delays        ;       ;       ;             ;            ;        ;        ;
;  Arrival Path          ;       ;       ;             ;            ;        ;        ;
;   Clock                ;       ;       ;             ;            ;        ;        ;
;    IC                  ;       ; 5     ; 4.989       ; 61         ; 0.000  ; 2.573  ;
;    Cell                ;       ; 9     ; 3.166       ; 39         ; 0.000  ; 0.804  ;
;    PLL Compensation    ;       ; 1     ; -9.485      ; 0          ; -9.485 ; -9.485 ;
;   Data                 ;       ;       ;             ;            ;        ;        ;
;    IC                  ;       ; 1     ; 0.830       ; 76         ; 0.830  ; 0.830  ;
;    Cell                ;       ; 2     ; 0.086       ; 8          ; 0.000  ; 0.086  ;
;    uTco                ;       ; 1     ; 0.172       ; 16         ; 0.172  ; 0.172  ;
;  Required Path         ;       ;       ;             ;            ;        ;        ;
;   Clock                ;       ;       ;             ;            ;        ;        ;
;    IC                  ;       ; 3     ; 2.587       ; 66         ; 0.000  ; 2.587  ;
;    Cell                ;       ; 4     ; 1.321       ; 34         ; 0.000  ; 0.632  ;
+------------------------+-------+-------+-------------+------------+--------+--------+
Note: Negative delays are omitted from totals when calculating percentages

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
; Data Arrival Path                                                                                                                                                                 ;
+----------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+
; Total    ; Incr     ; RF ; Type   ; Fanout ; Location            ; HS/LP      ; Element                                                                                           ;
+----------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+
; 0.000    ; 0.000    ;    ;        ;        ;                     ;            ; launch edge time                                                                                  ;
; 0.000    ; 0.000    ;    ; borrow ;        ;                     ;            ; time borrowed                                                                                     ;
; -1.330   ; -1.330   ;    ;        ;        ;                     ;            ; clock path                                                                                        ;
;   0.000  ;   0.000  ;    ;        ;        ;                     ;            ; source latency                                                                                    ;
;   0.000  ;   0.000  ;    ;        ; 1      ; PIN_AR36            ;            ; clk1_p                                                                                            ;
;   0.000  ;   0.000  ; RR ; IC     ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input|i                                                                                    ;
;   0.632  ;   0.632  ; RR ; CELL   ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input|o                                                                                    ;
;   0.762  ;   0.130  ; RR ; CELL   ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input~io_48_lvds_tile/ioclkin[2]                                                           ;
;   0.762  ;   0.000  ; RR ; IC     ; 2      ; CLKCTRL_2I_G_I7     ;            ; altclkctrl_ip_01_i|altclkctrl_0|altclkctrl_ip_01_altclkctrl_2000_dpnsueq_sub_component|sd1|inclk  ;
;   1.211  ;   0.449  ; RR ; CELL   ; 5      ; CLKCTRL_2I_G_I7     ;            ; altclkctrl_ip_01_i|altclkctrl_0|altclkctrl_ip_01_altclkctrl_2000_dpnsueq_sub_component|sd1|outclk ;
;   3.784  ;   2.573  ; RR ; IC     ; 1      ; IOPLL_3C            ; High Speed ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst|refclk[0]                            ;
;   4.526  ;   0.742  ; RR ; CELL   ; 1      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst~vco_refclk                           ;
;   4.526  ;   0.000  ; RR ; CELL   ; 1      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst~vctrl                                ;
;   -4.959 ;   -9.485 ; RR ; COMP   ; 2      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst~vcoph[0]                             ;
;   -4.155 ;   0.804  ; RR ; CELL   ; 1      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst|outclk[0]                            ;
;   -4.155 ;   0.000  ; RR ; CELL   ; 1      ; IOPLL_3C            ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|iopll_inst~io_48_lvds_tile/pllcout[4]           ;
;   -4.155 ;   0.000  ; RR ; IC     ; 2      ; CLKCTRL_3C_G_I21    ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|outclk[0]~CLKENA0|inclk                         ;
;   -3.746 ;   0.409  ; RR ; CELL   ; 1      ; CLKCTRL_3C_G_I21    ;            ; iopll_ip_01_i|iopll_0|altera_iopll_i|twentynm_pll|outclk[0]~CLKENA0|outclk                        ;
;   -1.330 ;   2.416  ; RR ; IC     ; 1      ; FF_X77_Y121_N55     ; High Speed ; ff3|clk                                                                                           ;
;   -1.330 ;   0.000  ; RR ; CELL   ; 1      ; FF_X77_Y121_N55     ; High Speed ; ff3                                                                                               ;
; -0.242   ; 1.088    ;    ;        ;        ;                     ;            ; data path                                                                                         ;
;   -1.158 ;   0.172  ; FF ; uTco   ; 1      ; FF_X77_Y121_N55     ;            ; ff3|q                                                                                             ;
;   -1.072 ;   0.086  ; FF ; CELL   ; 1      ; FF_X77_Y121_N55     ; High Speed ; ff3~la_lab/laboutb[16]                                                                            ;
;   -0.242 ;   0.830  ; FF ; IC     ; 1      ; FF_X77_Y121_N53     ; Mixed      ; ff4|asdata                                                                                        ;
;   -0.242 ;   0.000  ; FF ; CELL   ; 1      ; FF_X77_Y121_N53     ; High Speed ; ff4                                                                                               ;
+----------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
; Data Required Path                                                                                                                                                               ;
+---------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+
; Total   ; Incr     ; RF ; Type   ; Fanout ; Location            ; HS/LP      ; Element                                                                                           ;
+---------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+
; 0.000   ; 0.000    ;    ;        ;        ;                     ;            ; latch edge time                                                                                   ;
; 0.000   ; 0.000    ;    ; borrow ;        ;                     ;            ; time borrowed                                                                                     ;
; 3.754   ; 3.754    ;    ;        ;        ;                     ;            ; clock path                                                                                        ;
;   0.000 ;   0.000  ;    ;        ;        ;                     ;            ; source latency                                                                                    ;
;   0.000 ;   0.000  ;    ;        ; 1      ; PIN_AR36            ;            ; clk1_p                                                                                            ;
;   0.000 ;   0.000  ; RR ; IC     ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input|i                                                                                    ;
;   0.632 ;   0.632  ; RR ; CELL   ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input|o                                                                                    ;
;   0.791 ;   0.159  ; RR ; CELL   ; 1      ; IOIBUF_X78_Y115_N47 ;            ; clk1_p~input~io_48_lvds_tile/ioclkin[2]                                                           ;
;   0.791 ;   0.000  ; RR ; IC     ; 2      ; CLKCTRL_2I_G_I7     ;            ; altclkctrl_ip_01_i|altclkctrl_0|altclkctrl_ip_01_altclkctrl_2000_dpnsueq_sub_component|sd1|inclk  ;
;   1.321 ;   0.530  ; RR ; CELL   ; 5      ; CLKCTRL_2I_G_I7     ;            ; altclkctrl_ip_01_i|altclkctrl_0|altclkctrl_ip_01_altclkctrl_2000_dpnsueq_sub_component|sd1|outclk ;
;   3.908 ;   2.587  ; RR ; IC     ; 1      ; FF_X77_Y121_N53     ; High Speed ; ff4|clk                                                                                           ;
;   3.908 ;   0.000  ; RR ; CELL   ; 1      ; FF_X77_Y121_N53     ; High Speed ; ff4                                                                                               ;
;   3.754 ;   -0.154 ;    ;        ;        ;                     ;            ; clock pessimism removed                                                                           ;
; 4.064   ; 0.310    ;    ;        ;        ;                     ;            ; clock uncertainty                                                                                 ;
; 4.428   ; 0.364    ;    ; uTh    ; 1      ; FF_X77_Y121_N53     ;            ; ff4                                                                                               ;
+---------+----------+----+--------+--------+---------------------+------------+---------------------------------------------------------------------------------------------------+

----------------------------
; Extra Fitter Information ;
----------------------------
HTML report is unavailable in plain text report export.

roeekalinsky
New Contributor II
786 Views
Ash_R_Intel
Employee
776 Views

Hi,

Sorry for the late response. One thing I want to point out from your report is that the incoming clock is fed to the CLKCTRL block for some reason. Does the CLKCTRL block have two fanouts, the PLL and core logic?

If yes, then that might be the issue. The PLL will not be able to compensate for the network that is fed by the incoming clock. You may want to generate a same-frequency clock from the PLL and use that instead. I think the set_max_skew constraint should also be set between the two clock domains.


If the CLKCTRL block is driving only the PLL, then it can be avoided. Use a dedicated clock pin to feed the PLL directly.


Regards


roeekalinsky
New Contributor II
759 Views

Hi @Ash_R_Intel, thanks for the response.

 

Yes, clk1 is on a GCLK via a CLKCTRL block, and it fans out to core logic as well as to the reference clock input of the PLL. That is as intended, and shouldn't be an issue, unless I'm missing something here (?).

 

I'm not expecting the PLL to compensate for the delay of the clk1 distribution network. We're agreeing on that. The clk1 distribution network has an uncompensated insertion delay of about 3.8 to 3.9 ns, and that's fine.

 

What I am expecting is that the PLL will compensate for the delay of the clk2 distribution network only. And as such, clk2 should end up in phase with clk1, since clk1 is the reference clock input of the PLL. In other words, with the PLL compensating properly, there shouldn't be any significant skew between the endpoints of the clk2 network and the endpoints of the clk1 network.

 

Now, in the timing report snippet I provided, we can see that the total insertion delay of clk1 is 3.908 ns to the clock input of ff4, and 3.784 ns to the reference clock input of the PLL. That's plenty close enough, practically zero clock skew (just 0.124 ns), which is as expected. We'd expect to see very little clock skew between different endpoints of a given clock network, clk1 in this case, and that's what we're seeing. So far so good.

 

Now what doesn't look right is downstream of that, the compensation loop of the PLL. The reference clock input of the PLL, again, is an endpoint of the clk1 network, which arrives at 3.784 ns. So far so good. What I would then expect to see downstream is that the PLL's compensation loop in this topology should make it such that the endpoints of the clk2 network (such as the clock input of ff3) should also arrive at around 3.784 ns, plus/minus very little skew. And that's not at all what we're seeing. What we are seeing in the report is that clk2 arrives at the clock input of ff3 at -1.330 ns. That's about 5 ns earlier than it should be. Not good, and that's what remains unexplained.
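To make the arithmetic in the analysis above explicit, here is a minimal sketch in Python. The three arrival times are the figures quoted from the timing report; the variable names are made up for illustration, and the "expected" value simply encodes the assumption argued above, namely that the PLL compensates only for the clk2 network.

```python
# Arrival times in ns, taken from the timing report figures quoted above.
clk1_at_ff4 = 3.908      # clk1 insertion delay to the clock input of ff4
clk1_at_pll_ref = 3.784  # clk1 insertion delay to the PLL reference clock input
clk2_at_ff3 = -1.330     # reported clk2 arrival at the clock input of ff3

# Skew between two endpoints of the same clk1 network.
intra_network_skew = clk1_at_ff4 - clk1_at_pll_ref

# Assumption: if the PLL compensated only for the clk2 network, clk2
# endpoints would arrive at about the same time as the PLL refclk input.
expected_clk2_at_ff3 = clk1_at_pll_ref
compensation_error = clk2_at_ff3 - expected_clk2_at_ff3

print(f"clk1 endpoint-to-endpoint skew: {intra_network_skew:.3f} ns")
print(f"clk2 vs. expected arrival:      {compensation_error:.3f} ns")
```

A negative result on the second line means clk2 arrives earlier than the assumed compensation target.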

 

Does my analysis make sense? Am I missing something?

 

BTW, about set_max_skew, I don't think that applies here (although I did try it anyway, and it didn't work). As I understand it, set_max_skew pertains to registered paths or ports, not to clocks. Although if I'm wrong or if I missed your point, please correct me and show an example of what you mean.

 

About generating both clocks as outputs of the same PLL, I understand how that could be a possible work-around. But there are practical reasons why that's not an option in my application, and I don't know of a reason why what I'm doing shouldn't work. I've used this clock topology in Xilinx devices before with great success, and Altera Arria 10 documentation also seems to suggest that it should work. And Quartus isn't complaining about it, it just produces timing reports that don't seem like it's working right. So I'd very much like to get to the bottom of it, figure out what's wrong, and make it right. Any more thoughts?

 

Thanks,
-Roee

 

Ash_R_Intel
Employee
693 Views

Hi,

The PLL can compensate for the clocks that are generated by it, not for other network paths. Though both clk1 and clk2 are driven by the GCLK network, they may be placed far apart (you can check this in the Chip Planner).

You may experiment with the scenario that I explained in my previous answer: generate both clk1 and clk2 from the same PLL. That will definitely give you a better result.

I ran these experiments on a simple design and am suggesting them based on the results.


Regards.


roeekalinsky
New Contributor II
677 Views

Thanks, @Ash_R_Intel.

 

For experiment's sake, modifying my trivial design example to feed the PLL directly from the FPGA input pin, the PLL's refclk arrives at 0.704ns, and clk2 arrives at ff3 at -0.687ns. And to recap the original scenario, having a CLKCTRL upstream of the PLL, the PLL's refclk arrives at 3.784ns, and clk2 arrives at ff3 at -1.334ns.

 

With the addition of an upstream CLKCTRL / GCLK in the latter case, one would expect a later clk2 arrival time, not earlier as observed. So one must ask, does this observed result even make sense on the face of it?

 

It's the compensation figure in the PLL that is coming up vastly different between the two scenarios, -5.601ns vs. -9.485ns, respectively, and that's what's responsible for clk2 arriving even earlier with the upstream CLKCTRL rather than much later as one would expect. It is unclear where that difference in compensation arises, as the clock distribution downstream of the PLL is identical between the two scenarios, and I think we're both agreeing that the PLL should not in any way be compensating for the added delay of a CLKCTRL / GCLK upstream of it. Right? The observed difference in compensation does not correspond to a difference in the clock network delays. So from where does this difference in compensation arise?

 

A purely speculative interpretation of the observations above: it almost looks as though Quartus is attempting to adjust the delay of the compensation loop to phase-align clk2 to the FPGA clock input pin in both cases, as though it IS trying to compensate for the added delay of the upstream CLKCTRL / GCLK if present (which is NOT what we expect or want). Could that be the case? Is that what it's trying to do? And if so, is there a way via constraints or otherwise to prevent Quartus from doing that?
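The comparison between the two scenarios can be laid out as a small Python sketch. The numbers are the ones quoted above; the dictionary keys and variable names are invented for illustration. It only computes differences and does not model how Quartus derives the compensation figure.

```python
# Figures in ns as quoted above for the two experiments.
direct = {"refclk": 0.704, "clk2_at_ff3": -0.687, "compensation": -5.601}
via_gclk = {"refclk": 3.784, "clk2_at_ff3": -1.334, "compensation": -9.485}

# Extra upstream delay introduced by the CLKCTRL / GCLK ahead of the PLL.
refclk_delta = via_gclk["refclk"] - direct["refclk"]

# Change in the reported compensation figure between the two builds.
compensation_delta = via_gclk["compensation"] - direct["compensation"]

# Observed change in clk2 arrival at ff3.
clk2_delta = via_gclk["clk2_at_ff3"] - direct["clk2_at_ff3"]

print(f"added upstream delay:   {refclk_delta:+.3f} ns")
print(f"change in compensation: {compensation_delta:+.3f} ns")
print(f"change in clk2 arrival: {clk2_delta:+.3f} ns")
```

If the PLL ignored everything upstream of its refclk input, one would expect the compensation change to be near zero and clk2 to move later by roughly the added upstream delay. Instead the compensation grows more negative by an amount of the same order as the added delay, which is consistent with the speculation above, though it does not prove it.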

 

Taking a step back:

 

The available documentation (UG-01155 and A10-HANDBOOK) does seem to indicate, in the text and in block diagrams, that the IOPLL can optionally receive its refclk input from a GCLK or RCLK network. So, is that fully supported, or not? And if so, what are the expected compensation characteristics with refclk fed from a GCLK or RCLK network?

 

As to the suggestion of feeding the PLL directly from the FPGA input pin instead:

 

The trivial design example I presented for discussion is just that. In reality, I don't have the luxury of feeding the PLL directly from the FPGA's input pin. What I'm developing is a reusable IP block that will be integrated into numerous different FPGA designs, where other parties may own the top level and other IP blocks residing in it. There will generally not be much visibility between parties and their respective IP, nor the opportunity to collaborate on the top level clocking scheme. The top level FPGA designs into which this IP block will be integrated may have different and unknown top level clocking schemes, and I can't really make any assumptions about the ultimate origin of the system clock provided to my IP block other than it will already be on a GCLK when I receive it. Then, internally in my IP block, I have a need to generate one or more derived clocks at integer multiples of the incoming system clock frequency and in-phase with it (and possibly also with dynamic gating).

 

I hope that gives you some context and a better idea of what I'm ultimately trying to accomplish. And if you have other suggestions that can accomplish these requirements within these limitations, I'm all ears.

 

I should mention too, just for background, that I've been doing this routinely in Xilinx devices, with which admittedly I am far more familiar. I naturally assumed that a similar capability exists in Altera devices, and the documentation seemed to suggest as much, though not clearly... I hope that was not an incorrect assumption/interpretation on my part.

 

Fundamentally, the capability I'm seeking is this: To be able to take into my block what is already a global clock, and from it to generate new global clocks that are in-phase with it. Is that possible in the Arria 10's PLL / clocking architecture, or not? And if so, how?

 

I look forward to your input.

 

Thanks,
-Roee

Ash_R_Intel
Employee
660 Views

Hi,

The compensation factor depends on the routing done by the Fitter. When the design changes, the placement of the CLKCTRL blocks and the registers also changes. So it is quite possible to see that variation in the compensation factor.


I agree, PLL does not compensate for the upstream CLKCTRL. It just takes care of the clocks that are generated from it.


Coming back to the original query, I want to mention a couple of points here.

1) A GCLK network provides low skew for a clock that passes through it.

2) Two different clocks on two different CLKCTRL blocks cannot have identical delays, simply because they are independent and have to reach different flops at different locations in the chip.

3) The same logic applies to the PLL-generated clocks as well. They cannot have zero skew between themselves because they drive different paths.

4) The PLL definitely maintains the phase relationship between its input and output clocks.

5) As long as the tool reports no timing failures in the design, skew between the clocks should not be a matter of concern.

6) When a data path crosses from one clock to the other, it is better to either provide a set_max_skew constraint or declare that path as a false path.


If you look at a path in the tool driven by the same clock going through a CLKCTRL, you will find near-zero skew, but the same cannot be expected between different clock paths even though they have a fixed relationship. The skew on all the reported paths between the two clocks, however, should remain the same. If the tool reports the same clock skew number for these paths, then we are good.


Hope this helps.


Regards.



roeekalinsky
New Contributor II
638 Views

Hi @Ash_R_Intel,


Thanks for the feedback. I've gathered additional information on this issue from other sources as well, and the bottom line is that the Arria 10 IOPLL can't properly support the approach I was trying to take. It can't phase-align a GCLK output to a GCLK reference input. So I will use a different approach to accomplish my design goals.


Note however that there is an incorrect piece of information here, an incorrect understanding/assumption that we both made, as I've now learned.  And this is key to the whole thing.

 

@Ash_R_Intel wrote:
>> I agree, PLL does not compensate for the upstream CLKCTRL. It just takes care of the clocks that are generated from it.


Turns out that's not entirely true, and that's the primary cause for the skew I'm seeing. I've received confirmation that, as I suspected, Quartus is actually trying its best to compensate for all of the delay upstream of the IOPLL, including for the upstream CLKCTRL / GCLK if present (presumably coarsely matching those delays using static delays alone). Even if there's an upstream CLKCTRL / GCLK, it will try to match the output phase of the IOPLL to that of the FPGA input pin upstream of it all, not to the phase of the GCLK at the IOPLL's refclk input. Though this is never made clear in the documentation, this is the defined behavior for the IOPLL's normal mode when downstream of a GCLK. And there is no means by which to disable this behavior.


I wanted to clarify that here for anyone else who may be affected by this.


Thanks,

-Roee

 

dlevit
Beginner
537 Views

Hi Roee,

 

I'm in the same situation as you, moved recently from Xilinx to Intel, and puzzling about the same problem. Could you please share the approach which you found in regard to deskewing the clocks?

 

Thanks,

Dima

Ash_R_Intel
Employee
615 Views

Apologies for that statement. You are right, the IOPLL does try to compensate for the pin-to-PLL path. Please refer to the link below:

Intel® Quartus® Prime Pro Edition Help version 21.1 - PLL Compensation Mode logic option


Regards


roeekalinsky
New Contributor II
521 Views

Hi Dima,

 

The approach I ended up taking was to use an IOPLL in its "direct" compensation mode, and then deal with the non-zero skew at the clock domain crossings.

 

With the IOPLL in direct compensation mode, you do end up with some positive skew from the upstream clock to the downstream clock. But this skew is predictable/repeatable, doesn't vary significantly from build to build, and is on the order of a couple of nanoseconds. So, knowing the approximate skew relationship between the clocks, you can still deal with it statically using fully synchronous design techniques.

 

As to handling the domain crossings, the following is a relatively simple approach that can meet timing up to moderately high clock frequencies (for an Arria 10 in -1 speed grade, say ~350 MHz):

 

Crossing from the downstream clock domain to the upstream clock domain is the simpler of the two crossings. You can simply go direct reg-to-reg. Setup is the limiting factor here, and the time available for the reg-to-reg path is essentially the entire clock period minus the clock skew (and of course minus the clock-to-out of the source reg and the setup time requirement of the destination reg). Even after losing those couple of nanoseconds to clock skew, at the clock frequencies we're talking about you should still have ample setup slack on a direct reg-to-reg path.
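As a rough illustration of the setup budget described above, here is a minimal Python sketch. The function name and the example numbers (1.5 ns skew, 0.2 ns clock-to-out, 0.1 ns setup) are placeholders chosen for illustration, not measured Arria 10 values; only the ~350 MHz figure comes from the text.

```python
def setup_budget_down_to_up(period_ns, skew_ns, tco_ns=0.0, tsu_ns=0.0):
    """Time left for logic and routing on a direct reg-to-reg path from the
    downstream (later) clock domain to the upstream (earlier) clock domain.

    The launch edge is late by skew_ns relative to the capture clock, so the
    skew is subtracted from the full clock period, along with the source
    register's clock-to-out and the destination register's setup time.
    """
    return period_ns - skew_ns - tco_ns - tsu_ns


period_ns = 1000.0 / 350.0  # ~350 MHz, the frequency mentioned above
budget_ns = setup_budget_down_to_up(period_ns, skew_ns=1.5, tco_ns=0.2, tsu_ns=0.1)
print(f"available for logic and routing: {budget_ns:.3f} ns")
```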

 

Crossing from the upstream clock domain to the downstream clock domain is the trickier crossing, with hold being the limiting factor. If you simply go direct reg-to-reg, you will probably end up with unfixable hold violations due to the clock skew. My solution to this was to go reg-to-reg-to-reg, where the first register stage is on the rising edge of the upstream clock, the second is on the falling edge of the downstream clock, and the third is on the rising edge of the downstream clock. This ensures that you have no less than half a clock period of time available for each reg-to-reg path, assuming a 50% duty cycle on the downstream clock. Or more precisely, you have half a clock period plus the clock skew for the first reg-to-reg path, and exactly half a clock period for the second reg-to-reg path. This should ensure ample hold slack, though we are now setup limited again.
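The time available on each hop of the reg-to-reg-to-reg crossing can be sketched as follows in Python. This assumes a 50% duty cycle, pairs edges the way static timing analysis does on the ideal waveforms, and treats the skew as extra latency on the downstream clock. Register clock-to-out and setup times are left out, and the function name and example numbers are illustrative only.

```python
def crossing_budgets_up_to_down(period_ns, skew_ns):
    """Return (first_hop_ns, second_hop_ns) for the three-register crossing.

    first hop:  upstream rising edge   -> downstream falling edge
    second hop: downstream falling edge -> downstream rising edge
    Assumes a 50% duty cycle on the downstream clock, which is later than
    the upstream clock by skew_ns.
    """
    first_hop = period_ns / 2.0 + skew_ns  # half a period plus the skew
    second_hop = period_ns / 2.0           # exactly half a period
    return first_hop, second_hop


first, second = crossing_budgets_up_to_down(period_ns=4.0, skew_ns=1.0)
print(f"first hop: {first:.3f} ns, second hop: {second:.3f} ns")
```

The two budgets always add up to one clock period plus the skew, which is the total time available to get from the first register to the third.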

 

If you're pushing for a high clock frequency where the half-period reg-to-reg path becomes the critical path for setup, you can further balance the time available between the register stages by modifying the duty cycle of the downstream clock, which you can naturally do using the configuration of the IOPLL. You have a total of one clock period plus the skew to go reg-to-reg-to-reg, so nominally you'd want the falling edge used for the middle register stage right in the middle of that, which you can get closer to by adjusting the duty cycle of the downstream clock. But again, I'd only bother with this optimization if this crossing becomes your critical path. Otherwise just keep it simple with a 50% duty cycle.
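The balancing idea above can be written down as a small Python calculation. It is a sketch only: duty cycle here means the high-time fraction of the downstream clock, the function name is made up, and it ignores whatever duty-cycle resolution or limits the IOPLL actually has. Hold on the first hop still needs to be checked in the timing report after any such change.

```python
def balanced_duty_cycle(period_ns, skew_ns):
    """Duty cycle (high-time fraction) of the downstream clock that places
    its falling edge midway through the total crossing time.

    With duty cycle d, the first hop has d*T + skew and the second hop has
    (1 - d)*T. Setting them equal gives d = 0.5 - skew / (2*T).
    """
    duty = 0.5 - skew_ns / (2.0 * period_ns)
    if not 0.0 < duty < 1.0:
        raise ValueError("skew too large to balance with duty cycle alone")
    return duty


period_ns, skew_ns = 4.0, 1.0  # illustrative numbers only
duty = balanced_duty_cycle(period_ns, skew_ns)
first_hop = duty * period_ns + skew_ns
second_hop = (1.0 - duty) * period_ns
print(f"duty cycle: {duty:.3f}, hops: {first_hop:.3f} ns / {second_hop:.3f} ns")
```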

 

For applications where the attainable timing performance is adequate, the above approach has the advantages of being relatively simple, fully synchronous, having no timing exceptions, requiring no additional clock phases, and requiring no special timing constraints.

 

And if you need to push the performance even higher, you'll have to get even more creative, and I have... but I won't go into that here.

 

Hope this helps, and let me know if you have any questions.

 

-Roee
roee@porcupinetech.com

dlevit
Beginner
516 Views

Thank you for the detailed answer! 

 

I appreciate your insight. I need to cross signals from the upstream to the downstream clock domain, but the frequency is rather low, and your method looks very promising.

 

Thanks,

Dima

roeekalinsky
New Contributor II
514 Views

You're welcome.  Good luck!

-Roee
