Solved: Re: How to migrate source-synchronous GPIO to Intel PHY Lite IP?

hcom · 03-21-2022

I first asked this question about 3 months ago (link), but I haven't received any useful replies, so I will try to simplify my question.

I need to use the PHY Lite IP to implement a source-synchronous input interface in a Cyclone 10 GX (or Arria 10). I cannot use the normal source-synchronous techniques (see AN433) because they fail timing. According to the Cyclone 10 / Arria 10 documentation, I should therefore use the PHY Lite IP for optimal timing performance.
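For context, the ordinary GPIO approach described in AN433 typically constrains a source-synchronous input in SDC with a clock on the incoming clock pin, a virtual clock standing in for the external launch clock, and input delays on the data pins. A minimal sketch of that shape follows. The port name InClk, the 100 MHz period, and the delay values are hypothetical placeholders (only InData appears later in the thread); the real -max/-min values have to be derived from the external device's clock-to-data skew as AN433 explains.

```tcl
# Hypothetical SDR source-synchronous input at 100 MHz (10 ns period).
# Clock forwarded by the external device, arriving on the (hypothetical) pin InClk.
create_clock -name src_clk -period 10.000 [get_ports InClk]

# Virtual clock modelling the launch clock inside the external device.
create_clock -name virt_src_clk -period 10.000

# Input delays on the data pins relative to the virtual launch clock.
# Placeholder values in ns: derive them from the external device's skew specs.
set_input_delay -clock virt_src_clk -max  0.500 [get_ports {InData[*]}]
set_input_delay -clock virt_src_clk -min -0.500 [get_ports {InData[*]}]
```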

I have read the PHY Lite user guide repeatedly, but I find it very unclear. I have also read AN756 repeatedly, but it is incredibly vague.

AN756 says this is how to implement a source-synchronous input interface using PHY Lite:

There is no description, just this one vague figure.

The obvious problem is that the top diagram (using ordinary GPIO) has 2 inputs: "Data" and "Clock". The bottom diagram (using PHY Lite) has 3 inputs: "Data", "Strobe In" and "Ref Clock".

So my question is simply: What should "Strobe In" and "Ref Clock" be connected to?

My best guess is that they should both be connected to "Clock" (this guess is based on the fact that there is only one blue waveform, but two blue arrows). I tried this idea in simulation and it seems to work correctly:

However, this fails to build in Quartus. The error message implies that the "Ref Clock" pin must not be connected to anything else.

Please help! After several months, this is extremely frustrating.

hcom · 04-06-2022

My company was somehow able to contact an engineer at Intel, who solved the problem immediately.

In summary:

The refclk and strb pins (two FPGA pins) must both be driven by the same external clock.
There is a bug in Quartus (confirmed by Intel in versions 21.3 and 21.4) that prevents the IO delay chains from being configured correctly. As a result, the design will typically fail timing (hold), even if everything else has been configured correctly. As a workaround (until the bug is fixed in Quartus 22.1), the IO delay chains can be configured manually (see details below).

Here is the content of the (excellent) E-mail:

Quartus should automatically change the IO delay chain settings so that each IO is optimized for both setup and hold. However, there appears to be a problem with the automatic delay chain calculation algorithm in 21.3, which is why you are seeing lots of hold violations while your setup timing looks good.

I have checked in 21.4 and can confirm that the same issue exists in that version too.
I can, however, confirm that this issue has been resolved in the latest internal release of 22.1, which is due for release very soon.

As a temporary solution (prior to the release of 22.1) you can manually set the IO delay chain values using the assignment below.

set_instance_assignment -name IO_12_LANE_INPUT_DATA_DELAY_CHAIN 60 -to InData

You can apply this to all InData pins (as in the assignment above); however, to get the optimum solution you will need to apply different values on a per-pin basis, which is also supported.
I am looking at what specific settings are required to close timing and will update you in due course.
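The per-pin variant mentioned above could look like the following assignments. The bit indices and delay values are hypothetical placeholders, not values from the email; in practice each value would be tuned from the "Delay Chain Summary" and the setup/hold slack reported for that pin. Braces around the bus bits keep Tcl from treating [n] as command substitution if the lines are sourced from a script.

```tcl
# Per-pin IO delay chain workaround for the 21.3/21.4 issue.
# Values are hypothetical starting points around the common value (60);
# adjust each one using the per-pin slack in the timing report.
set_instance_assignment -name IO_12_LANE_INPUT_DATA_DELAY_CHAIN 60 -to {InData[0]}
set_instance_assignment -name IO_12_LANE_INPUT_DATA_DELAY_CHAIN 58 -to {InData[1]}
set_instance_assignment -name IO_12_LANE_INPUT_DATA_DELAY_CHAIN 62 -to {InData[2]}
set_instance_assignment -name IO_12_LANE_INPUT_DATA_DELAY_CHAIN 60 -to {InData[3]}
```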

You can see the delay chain values used in the "Delay Chain Summary" section of the Route Stage report.

I tested the assignment above in 21.3 and the interface closed timing.

With regard to the refclk versus the strobe, ideally these should both originate from the same clock source so that they are ppm-aligned (no frequency offset between them). This prevents the internal FIFO within the PHY Lite IP from overflowing or underflowing.
The simplest solution is to connect the same clock on your board to both the strobe and refclk pins of the device.
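To see why a shared source matters, here is a rough worked estimate (an illustration, not a figure from the email or the user guide). It assumes one FIFO entry is written per strobe cycle and one is read per reference-clock cycle, with an assumed 100 MHz nominal frequency and an assumed combined offset of 100 ppm between two independent oscillators:

```latex
\Delta f = \lvert f_{\text{strobe}} - f_{\text{refclk}} \rvert
         \approx f_0 \cdot \frac{\Delta_{\text{ppm}}}{10^{6}}
         = 100\,\text{MHz} \times \frac{100}{10^{6}} = 10\,\text{kHz}

t_{\text{slip}} = \frac{1}{\Delta f} = \frac{1}{10^{4}\,\text{s}^{-1}}
                = 100\,\mu\text{s per FIFO entry}
```

So a FIFO with a margin of D entries would overflow or underflow after roughly D × 100 µs. When both pins are driven by the same board clock, the frequency difference is zero and the FIFO occupancy stays constant.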

We applied these changes in our project and it met timing. Correct behavior has been confirmed in simulation.


Ash_R_Intel · 03-23-2022

Hi,

The reference clock should be connected to a dedicated clock input pin (CLK_xx), and strobe_in should be connected to DQS pins.

For reference, you can generate the IP, run the Fitter, and let the tool assign the pins itself. You can then choose similar pins for your final design.

Regards

hcom · 03-23-2022

@Ash_R_Intel What is the difference between the reference clock and strobe_in in this case? For a source-synchronous input, aren't they exactly the same thing? (That's why the old GPIO method needs only one clock input.)

Also, the reference clock does not have to be connected to a dedicated clock pin. I know this is Intel's recommendation for best timing performance, but it can also be connected to an internally generated (PLL) clock. Therefore, would it not be possible, for example, to run the ref_clk at a higher frequency than strobe_in, then use rdata_valid to identify the valid data?

It would be a huge expense if I need to redesign my PCB just to route the external clock to two pins instead of one. Is there definitely no way to make this work without changing the PCB schematic?

Ash_R_Intel · 03-24-2022

Hi Harry,

As the PHY Lite user guide mentions, this IP can be used for DDR kind of interfaces, where you get a strobe (i.e. a DQS signal) along with the data. DQS is not a free-running clock: it is present only when data is present on the bus, and it is tightly aligned with the data. So DQS should be used to capture the data at the first stage. See Figure 64 in the PHY Lite user guide, and compare the data_in and strobe_in signals in the waveform.

The reference clock in the IP is used for the following functions:

The PHY Lite for Parallel Interfaces IP uses a reference clock that is sourced from a dedicated clock pin to the PLL inside the IP. This PLL provides four clock domains for the output and input paths.

Core clock - This clock is generated internally by the IP and it is used for all transfers between the FPGA core fabric and I/O banks. The clock phase alignment circuitry ensures that this clock is kept in phase with the PHY clock for core-to-periphery and periphery-to-core transfers.

PHY clock - This clock is used internally by the IP for PHY circuitry running at the same frequency as the core clock.

VCO clock - This clock is generated internally by the PLL. It is used by both the input and output paths to generate PVT compensated delays in the interpolator.

Interface clock - This is the clock frequency of the external device connected to the FPGA I/Os.

In short, the reference clock is used for the rest of the logic that handles the data.

Now, in your case, if you do not have either the reference clock or strobe_in coming separately, then stick to GPIO IP only. Is there any reason why you want to switch the IP?

Regards

hcom · 03-24-2022

@Ash_R_Intel

this IP can be used for DDR kind of interfaces

It can also be used for SDR, which is what I need.

The PHY Lite for Parallel Interfaces IP uses a reference clock that is sourced from a dedicated clock pin

As I wrote in my previous message, the reference clock does not have to be connected to a dedicated clock pin. You can read further details here:
https://www.intel.com/content/www/us/en/support/programmable/articles/000080391.html

Now, in your case, if you do not have either the reference clock or strobe_in coming separately, then stick to GPIO IP only. Is there any reason why you want to switch the IP?

As I wrote in my original question ~3 months ago, ordinary GPIO fails timing. Intel's recommended solution to improve timing performance is to migrate to PHY Lite (see AN756). I just want to know how to do that.

Ash_R_Intel · 04-01-2022

Hi,

If you want to switch to the PHY Lite IP, you will need to provide an extra clock signal to the IP. It uses the PLL internally to derive additional clocks, as mentioned in my previous reply. You may try connecting strobe_in to the ref_clk input, but only if it is a free-running clock.

Regards

hcom · 04-05-2022

@Ash_R_Intel That is more or less what I wrote in my original question. We are no closer to a solution.

Ash_R_Intel · 04-13-2022

Glad to know that your issue is resolved. Closing the case.

Regards