Honored Contributor I
1,424 Views

carry chain delay difference between real test and Timequest analysis result

Hi all, 

I am using a 5CGXFC7D6F31C7N (Cyclone V) to implement a TDC in my design, built with QII 14.0.

According to the TimeQuest analysis results, the average carry delay of an adder in an ALM is 52 ps, 46 ps, 27 ps and 26 ps for the timing corners slow_1100mV_0c, slow_1100mV_85C, fast_1100mV_0c and fast_1100mV_85C, respectively.

However, a real test shows that the average carry delay is only 11 ps at room temperature (the core voltage is 1118 mV under test), and 11 ps does not fall within the range spanned by the four timing corners.
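
As a quick sanity check on the numbers above, here is a minimal Python sketch that compares the measured value with the range spanned by the four corners. The variable names are illustrative only and do not come from any Quartus or TimeQuest tool.

```python
# Average carry delay per ALM reported by TimeQuest, in ps, per timing corner
# (values as quoted in the post above).
corner_delay_ps = {
    "slow_1100mV_0c": 52,
    "slow_1100mV_85C": 46,
    "fast_1100mV_0c": 27,
    "fast_1100mV_85C": 26,
}
measured_ps = 11  # average from the hardware test at room temperature

fastest = min(corner_delay_ps.values())
slowest = max(corner_delay_ps.values())
in_range = fastest <= measured_ps <= slowest
ratio = fastest / measured_ps  # how much faster the silicon is than the fastest corner

print(f"corner range: {fastest}..{slowest} ps, measured: {measured_ps} ps")
print(f"within range: {in_range}, fastest corner / measured = {ratio:.2f}")
```

So the measured delay is not just slightly outside the corners; it is less than half of the fastest corner value.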

 

It seems TimeQuest gives incorrect results.

Could anyone explain why the delay differs so much between the real test and the TimeQuest result?

 

Regards, 

ingdxdy
0 Kudos
13 Replies
Highlighted
Honored Contributor I
128 Views

How are you measuring this delay on an internal FPGA path?

0 Kudos
Highlighted
Honored Contributor I
128 Views

 

--- Quote Start ---  

How are you measuring this delay on an internal FPGA path? 

--- Quote End ---  

 

 

Well, I chained 420 ALMs vertically. A carry signal is generated by an input (active high) and then propagates along the adder chain.

A 250 MHz system clock samples the position of the carry signal, so I can tell how many ALMs the carry signal passes within one 250 MHz period (the positions at two adjacent clock edges are captured and then subtracted).

According to the test result, the carry signal propagates through about 360 ALMs in one 250 MHz period, so the average carry delay per ALM is 4000 ps / 360 ≈ 11 ps.
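
The arithmetic can be written out as a small Python sketch. The function name is hypothetical; the inputs are the 250 MHz clock and the roughly 360 ALMs reported above.

```python
def avg_delay_per_alm_ps(clock_hz: float, alms_per_period: float) -> float:
    """Average carry delay per ALM, given how many ALMs the carry front
    advances during one period of the sampling clock."""
    period_ps = 1e12 / clock_hz  # 250 MHz -> 4000 ps
    return period_ps / alms_per_period


delay = avg_delay_per_alm_ps(250e6, 360)
print(f"{delay:.1f} ps per ALM")
```

Note that this yields an average over the chain segment covered in one period; it says nothing about how evenly the delay is distributed among individual ALMs.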

 

You could look at the basic idea behind FPGA-based TDC implementations to get a deeper understanding.

 

Best Regards, 

ingdxdy
0 Kudos
Highlighted
Honored Contributor I
128 Views

 

--- Quote Start ---  

Well, i chained 420 ALMs vertically, a carry signal is generated by an input (active high). then the carry signal is propagated along the (adder) chain.  

A 250MHz system clock is used to sample the carry signal position, then i could get to know how many ALMs the carry signal is passed within a 250MHz period(positions at two adjacent clock edges can be obtained and then subtracted). 

according to test result, the carry signal propagates about 360 ALMs in a 250MHz period, so i get average carry delay per ALM is 4000ps/360=11ps. 

 

you could look at the basic idea of FPGA based TDC implementation to get a deep understanding. 

 

Best Regards, 

ingdxdy 

--- Quote End ---  

 

 

I have heard about TDCs on FPGAs, but I have doubts about their precision: they would have to take into account clock tree delays, timing violations, fitting variation and optimisation, and they need some sort of averaging.
0 Kudos
Highlighted
Honored Contributor I
128 Views

 

--- Quote Start ---  

I have heard about TDC on FPGAs but I have doubts about its precision as it should take into account clock tree delays, timing violations, fitting variation, optimisation and needs some sort of averaging 

--- Quote End ---  

 

 

Thanks for your reply. 

Well, if you look into FPGA-based TDC implementations, you will see that the problems you mention are all handled well.

 

My problem here is not how to implement a TDC in an FPGA; it is why the TimeQuest analysis result differs from the real test.

Since TimeQuest plays a critical role in regular FPGA designs, I need to figure out what is wrong. Is my test result wrong, or is it something else?

I have used the same method to measure the carry delay of Cyclone-IV, and there the real test result fits well within the TimeQuest analysis result. For Cyclone-V this is not the case.

I think the problem can be replicated easily, and I would like confirmation from someone else.

 

Best Regards, 

ingdxdy
0 Kudos
Highlighted
Honored Contributor I
128 Views

I strongly suspect your measurement is wrong. 

 

The main concern in the design of a TDC on an FPGA is that your input signal will be asynchronous and can't be sampled by registers without timing violations. It also can't be synchronised through a two-stage register chain, as this would defeat the purpose.

So I don't see how a TDC can sort out this issue.
0 Kudos
Highlighted
Honored Contributor I
128 Views

 

--- Quote Start ---  

I strongly suspect your measurement is wrong. 

 

The main concern in the design of TDC on fpga is that your signal input will be asynchronous and can't be sampled on registers without timing violation. It also can't be synchronised through two stage registers as this will defeat the purpose.  

So I don't see how TDC can sort out this issue. 

--- Quote End ---  

 

 

 

Although TDC implementation in FPGAs is not the purpose of this thread, I will add a few words to address your concern.

Yes, as you said, the external input signal is totally asynchronous. When a hit signal arrives, a carry signal is generated and propagates along the tapped delay line, which is built from the carry chain (cascaded carry LEs in Cyclone-IV). For example, in Cyclone-IV the average carry delay is 45 ps at room temperature. So in the worst case, only the register right at the front of the carry signal may go metastable (this may need some more thought). Since a TDC implemented in an FPGA is statistical in nature, and online calibration is also used to correct the result, instability in the last bit is acceptable.
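
To make the tapped-delay-line idea concrete, here is a minimal Python sketch of turning one sampled snapshot of the chain into a time estimate. The popcount decoding, the function name and the uniform 45 ps per tap are assumptions for illustration, not a description of the actual design discussed in this thread.

```python
def decode_snapshot(taps: list[int], tap_delay_ps: float = 45.0) -> float:
    """Estimate how long the carry had been propagating when the clock
    sampled the chain. taps[i] is the register output at tap i
    (1 = the carry has already passed this tap).

    Counting ones, rather than locating the first 0, tolerates a 'bubble'
    near the carry front, where a metastable or slightly misordered tap
    reads the wrong value.
    """
    return sum(taps) * tap_delay_ps


clean = [1, 1, 1, 1, 0, 0, 0, 0]   # ideal thermometer code
bubble = [1, 1, 1, 0, 1, 0, 0, 0]  # one misordered tap near the front
print(decode_snapshot(clean), decode_snapshot(bubble))
```

Both snapshots decode to the same value, which is the sense in which an unstable tap at the carry front costs at most about one bin rather than corrupting the whole reading.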

 

As for my test result, the same method was used to measure the carry delay in Cyclone-IV, and the result was 45 ps at room temperature, which fits well within the TimeQuest result. I rechecked my method on Cyclone-V and did not find any problem with it.

 

I think I need a definitive answer. Or maybe I need to send a service request to Altera?

 

Best Regards, 

ingdxdy
0 Kudos
Highlighted
Honored Contributor I
128 Views

The timing model needs to account for worst-case variation from the fab in each corner, plus design margin. Have you tried sweeping the entire FPGA to see how much variation you get? It's possible you have a faster part or used a faster region of the FPGA.

 

Also, did you take into account clock skew in your measurement?
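
One way to put the clock-skew question in numbers: the Python sketch below asks how much extra time, beyond the nominal 4000 ps window, the carry would have needed in order to cover 360 ALMs at each corner's TimeQuest delay. It uses only the figures quoted earlier in the thread. Treating the whole discrepancy as a single timing offset between the two samples is a simplifying assumption, and the function name is hypothetical.

```python
PERIOD_PS = 4000    # one 250 MHz clock period
ALMS_COVERED = 360  # ALMs the carry front advanced between two samples

# TimeQuest average carry delay per ALM, in ps, as quoted in the first post.
corner_delay_ps = {
    "slow_1100mV_0c": 52,
    "slow_1100mV_85C": 46,
    "fast_1100mV_0c": 27,
    "fast_1100mV_85C": 26,
}


def extra_window_ps(delay_per_alm_ps: float) -> float:
    """Additional time beyond one clock period needed to traverse
    ALMS_COVERED ALMs at the given per-ALM delay."""
    return ALMS_COVERED * delay_per_alm_ps - PERIOD_PS


for corner, d in corner_delay_ps.items():
    print(f"{corner}: would need {extra_window_ps(d):.0f} ps of extra time")
```

Even at the fastest corner the required offset is 5360 ps, which is more than one full clock period.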
0 Kudos
Highlighted
Honored Contributor I
128 Views

 

--- Quote Start ---  

Hello ingdxdy, 

 

The carry chain of the Cyclone V series devices should look like the attached picture. One ALM includes two LEs; however, TimeQuest shows that one of the LEs does not contribute any delay.

--- Quote End ---  

 

 

Your test result seems to fit Arria II devices. I ran the same test on a Stratix IV device, and the delay was about 14 ps.

 

I doubt Cyclone V has the same die as Arria II. For I know that Arria GX has the same die as Stratix II GX.

 

BTW, I have a question about your TDC implemented inside Cyclone V. The delay is so small (unlike the 45 ps you mentioned for CIV) that you have to implement a longer delay chain, which would cause many problems for the logic design. Frankly speaking, I tried to implement a TDC inside Cyclone V and Stratix IV, and I found the delay line was not stable, so I gave up.
0 Kudos
Highlighted
Honored Contributor I
128 Views

 

--- Quote Start ---  

Although talking about TDC implementation in FPGA is not my purpose of this thread, i will input some more words as your concern. 

Yeah, as you said, external signal input are totally asynchronous. when a hit signal arrives, a carry signal will be generated and this carry signal will propagate along the tapped delay line which is created with carry chain (cascaded carry LEs in Cyclone-IV). for example, in Cyclone-IV, average carry delay is 45ps at room temperature. so in worst case, only the most front register where carry signal reaches may run into metastability(this may need some more thinking). since TDC implemented in FPGA is somewhat of statistical meaning, and on-line calibration is also employed to revise result, so it is ok for last bit instability.  

 

As to my test result, the same method is also used to measure carry delay in Cyclone-IV and the test result is 45ps under room temperature which is well fiitted into timequest result. I rechecked my method in Cyclone-V, and i did not find any problem with my method.  

 

i think i need a definitive answer, or maybe i need send a service request to Altera? 

 

Best Regards, 

ingdxdy 

--- Quote End ---  

 

 

I don't think you will get what you want from the Altera guys. The only information you can get from an SR will be like what kaz has given you.
0 Kudos
Highlighted
Honored Contributor I
128 Views

Thanks much for your reply, Jerry, and sorry for my delayed response. 

 

As you said, there are two dedicated adders in an ALM, and according to the TQ result, the second adder's delay is 0 ps, while the first adder contributes the main component of the TDL.

The granularity of the delay elements is not as uniform as in CIV, but this is not a problem. I have seen other people's work where they implement TDCs in 28 nm (or 20 nm?) Xilinx FPGAs; the delay elements there are also non-uniform, and there are many zero-width bins.

Generally, TDCs implemented in FPGAs on more advanced processes (28 nm and below) need bin realignment, because IC delays, including clock skew, play a more important role than before in determining bin positions. The intuition that bins are ordered according to their physical locations should be adjusted.

 

However, my purpose in posting this thread is not to discuss how to implement TDCs in CycloneV. It is that the real measured delay differs from the TQ result, which puzzles me.

 

B&W, 

ingdxdy
0 Kudos
Highlighted
Honored Contributor I
128 Views

 

--- Quote Start ---  

Thanks much for your reply, Jerry, and sorry for my delayed response. 

 

As you said, there are two dedicated adders in a ALM, and according to TQ result, the second adder delay is 0ps, while the first adder contributes the main component of TDL. 

the granularity of delay element is not as uniform as in CIV, but this is not problem. i have seen other peoples works where they implement TDCs in 28nm(or 20nm?) Xilinx FPGAs, the delay elements there are either non-uniform and there exist many zero-wide bins. 

Generally, TDCs implemented in more advanced process FPGAs (28nm and below) need bin realignment, for IC delays including clock skews play a more important role than before to decide bins positions. the tuition of bins are arranged according to their physical locations should be adjusted. 

 

However, the purpose of my posting this thread is not to discuss how to implement TDCs in CycloneV, but the real test delay value is different from TQ result which makes me puzzled. 

 

B&W, 

ingdxdy 

--- Quote End ---  

 

 

Did you mean re-assigning delay times to those 0 ps cells?

 

BTW, what is your test input hit? I can implement a PLL-generated HIT as the calibration input; however, this can't be realized in CV or SIVGX devices.
0 Kudos
Highlighted
Honored Contributor I
128 Views

 

--- Quote Start ---  

Thanks much for your reply, Jerry, and sorry for my delayed response. 

 

As you said, there are two dedicated adders in a ALM, and according to TQ result, the second adder delay is 0ps, while the first adder contributes the main component of TDL. 

the granularity of delay element is not as uniform as in CIV, but this is not problem. i have seen other peoples works where they implement TDCs in 28nm(or 20nm?) Xilinx FPGAs, the delay elements there are either non-uniform and there exist many zero-wide bins. 

Generally, TDCs implemented in more advanced process FPGAs (28nm and below) need bin realignment, for IC delays including clock skews play a more important role than before to decide bins positions. the tuition of bins are arranged according to their physical locations should be adjusted. 

 

However, the purpose of my posting this thread is not to discuss how to implement TDCs in CycloneV, but the real test delay value is different from TQ result which makes me puzzled. 

 

B&W, 

ingdxdy 

--- Quote End ---  

 

 

Hi ingdxdy, 

 

Recently, I implemented a TDC in my CycloneIVGX devices. I found there were several (9 to 12) zero-width bins. Did you run into this in your case? Also, I tried bin realignment, but it had no effect. I have attached my test result. The average bin width should be about 40 ps, but the largest bin width is about 100 ps.

 

 

https://alteraforum.com/forum/attachment.php?attachmentid=14704&stc=1  
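
For readers unfamiliar with how bin widths like these are usually obtained, a common approach is a code density test: feed in hits that are uncorrelated with the sampling clock, histogram the output codes, and scale each bin's share of the hits by the measured time window. The Python sketch below is a generic illustration with made-up hit counts; it is not the attached data, and the function name and the 320 ps window are hypothetical.

```python
def bin_widths_ps(hit_counts: list[int], window_ps: float) -> list[float]:
    """Code density test: with hits uniformly distributed over the window,
    a bin's width is proportional to the number of hits it collects."""
    total = sum(hit_counts)
    if total == 0:
        raise ValueError("no hits recorded")
    return [window_ps * n / total for n in hit_counts]


# Hypothetical histogram for an 8-bin delay line spanning a 320 ps window.
counts = [1000, 0, 2500, 900, 1100, 0, 1500, 1000]
widths = bin_widths_ps(counts, window_ps=320.0)
zero_bins = [i for i, w in enumerate(widths) if w == 0]

print("bin widths (ps):", widths)
print("zero-width bins:", zero_bins)
print("average width (ps):", sum(widths) / len(widths))
```

In this toy example the average width is 40 ps while individual bins range from 0 to 100 ps, which shows how a reasonable average can coexist with zero-width bins and a few very wide ones.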

 

 

Thanks 

 

Jerry
0 Kudos
Highlighted
Novice
112 Views

Hi there,

I noticed the same phenomenon and have the same question. Did you get an answer?

0 Kudos