Developing a high speed stopwatch

d3x0r
Beginner

I've been trying to make a high speed counter.   Simulation says it should work- but the simulator I have working doesn't really account for gate propagation delays.

When I compile this, and download it to this Arduino Vidor 4000 which has a Cyclone 10 FPGA on it, even from initialization I'm getting noise out... I've tried a few different approaches.  I was sort of thinking maybe the verilog compilation would have a method to know when a counter had completed counting, but in the netlist I don't really see anything like that. 

This last iteration uses a small counter (9 bits), of which 8 bits are used, and the 9th bit is a carry into a 17 bit counter, of which 16 bits are used, and the top bit carries into a 40 bit counter, for a total of 64 bits of counting.  The slower counters (above the 8 bit counter) should tick at a normal rate, but their output is just noise... 

I was able to get it to sort of work by just assigning the rCOUNTER value (a 40 bit counter at the time) directly to the output, bypassing the latching registers - but that will not work for my needs... I do need as close to the correct count of ticks latched as I can get.  Right now I can send a command to generate a latch - but that eventually will come from external hardware.

 

This is that version - it uses just a not-gate to drive the clock... I have had this working at various points somewhat better, but at some point something goes wrong and I just get noise out.

https://github.com/d3x0r/STFRPhysics/blob/master/hardware/fpga/new-counter2.v

 

This is a different version that just uses one 64 bit counter... 

https://github.com/d3x0r/STFRPhysics/blob/master/hardware/fpga/clock-module.v

 

I sort of figured that the ripple count wouldn't matter a lot how fast it is clocked, since higher value bits in the counter would just be ahead of a prior input; and I would be able to latch that counter into 1 of 2 registers to provide a stable output to send out.

The whole idea is that there's a high speed counter, and two signals that will latch the counter into a register (one for each signal), and hold the value until the next rising edge latch basically - I did add a reset signal, so really it will latch a new value with a new latch signal after the lock on the register is released with a reset signal.

Is this possible with any FPGA?  I have a requirement to count sub-nanosecond ticks, preferably 200ps or less.  I tried looking for a more specialized sort of high tick rate realtime clock, but didn't really find anything, and I'd like to have something that is already on a board with USB communication to it...

The following is some of the output - tl2d and t2 are one 64 register, which is getting latched.

The other 64 bit tl3d and t3 (tl3d is the low part) has never been latched, and really should be 0 from the rLatch2 variable in the program...


tl2d:DFF1F3FF t2:FFFFFFF4 tl3d:1FF1F3FF t3:FFFFFFF6 
tl2d:77F20909 t2:57539CD1 tl3d:1FF1F3FF t3:FFFFFFF6 
tl2d:6D82597B t2:57EFFC73 tl3d:1FF1F3FF t3:FFFFFFF6 

- this is another run, I delayed trying to do a latch for a few seconds, so this is what the board gives with nothing sent to it other than the FPGA code...

pins:0 tl2d:DFFF53FF t2:FFFFFFF6 tl3d:1FFF53FF t3:FFFFFFF6 

pins:1 tl2d:6D39846D t2:1BF5FC1 tl3d:1FFF53FF t3:FFFFFFF6   (this is the first latched value, in which the top 40 bits (t2 and the top byte of tl2d) should be 0)... I don't understand at all why everything ends up so bad.

 

btw - do #N fields in verilog programs matter when compiled for hardware? or are they only simulation tips?

 

 

_AK6DN_
Valued Contributor I

I decided to look thru your code and found it is not even close to synthesizable verilog.

I'm not even sure any simulator would handle it really well.

 

Generating a clock like this is just not going to work.

Do you use any timing constraints at all? If you did they likely all failed miserably.

 

You need to go back to the basics and understand what is and is not synthesizable verilog.

 

And #N delay parameters are ignored during synthesis in Quartus.
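As a rough illustration of where #N delays do belong: in a simulation-only testbench, the delay is what lets a free-running clock advance simulated time. The sketch below is not taken from the posted sources; the module name tb_clock_sketch and the 8 bit counter are hypothetical stand-ins, and the code is meant for a simulator only, not for synthesis.

```verilog
`timescale 1ns / 1ps

// Simulation-only testbench sketch (hypothetical names).
// The #delays are honored by a simulator and have no meaning in hardware.
module tb_clock_sketch;
  reg       clk   = 1'b0;
  reg [7:0] count = 8'd0;

  // Simulated 100 MHz clock: toggle every 5 ns of simulated time.
  // Without the #5, this block would loop forever at time 0.
  always #5 clk = ~clk;

  // Stand-in for the design under test: an ordinary clocked counter.
  always @(posedge clk)
    count <= count + 8'd1;

  initial begin
    #200;                          // let 200 ns of simulated time pass
    $display("count = %0d", count);
    $finish;
  end
endmodule
```

In real hardware the clock would instead come from a pin or a PLL, and the testbench would only exist on the simulator side.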

 

 

...

always // Start at time 0 and repeat the begin/end forever
  begin
    iCLK <= !iCLK;    
  end

...

always @(posedge iCLK)
begin
    clrA_ <= rstA;
    rCOUNTERa <= rCOUNTERa+1+(rstA_ ^ rstA)*256;
end

 

 

So have you looked at the tutorials in ...

https://docs.arduino.cc/hardware/mkr-vidor-4000

 

On github this would be a good top level starting point ...

https://github.com/vidor-libraries/VidorBitstream/blob/release/projects/MKRVIDOR4000_template_bare/rtl/MKRVIDOR4000_top.v

 

 

d3x0r
Beginner

Synthesizes just fine.
None of that mentions how to 'apply timing constraints'. 


That link is just the link to the counter sources (and simulation param sources)... but this is the repo and branch that has the complete Quartus and Arduino IDE sources.  (Sort of; you need to install the JTAG_interface lib separately so the Arduino uses a different place than this for the FPGA_bitstream.h, but still the resulting MKRVIDOR4000.ttf gets byte reversed and copied to that other place.)  And it's really just the FPGA part that we're talking about here; it downloads fine... the jtag interface it uses to communicate also works fine.  I've added all sorts of things like 'debug', which is commented out now, but exposed various internal flags that could be seen by the main chip on the arduino.


https://github.com/d3x0r/JTAG_Interface/tree/d3x0r-hispeed-counter

 

clock-module.v lives here also... 

https://github.com/d3x0r/JTAG_Interface/tree/d3x0r-hispeed-counter/FPGA/projects/example_simple

Looks like synthesis works to me...

d3x0r_2-1704009483257.png


I've tried to get simulation in Quartus to work - but examples use simulators that aren't an option anymore... and there's a lot of options I could pick that I don't know if they are all compatible, or what differences between the options are. So I've been using the simulator in Vivado....

This is first few latching level changes...  this only uses the SimulationParams.v and clock.v (filled with various versions - this is actually the clock-module.v source)  from the other repo's directory...

d3x0r_0-1704008373110.png

Single tick zoom level at first latch

d3x0r_1-1704008901326.png

I did stack a few nots together ... 
always // Start at time 0 and repeat the begin/end forever
begin
  //#1
  iCLK <= !iCLK;
  if( !iCLK ) begin
    iCLKb <= !iCLKb;
     if( !iCLKb ) begin
       iCLKc <= !iCLKc;
       if( !iCLKc ) 
         iCLKd <= !iCLKd;      
     end
  end
end

d3x0r_3-1704009576643.png

 

 

_AK6DN_
Valued Contributor I

Until you can show the timing constraints and analysis 'Synthesizes just fine' is pretty meaningless when running the design on real hardware.

The FPGA simulation tools are infinitely fast and zero delay.
Real FPGA hardware is not.
It has nonzero setup and hold times on registers, and there are non-zero signal routing delays.

You need to provide info on how you are specifying and constraining your designs clocks (in the .sdc and .qsf files).
And the final placement and routing timing report should indicate that all timing constraints were met (or not)
and what the estimated maximum clock frequency is predicted to be.

Until you can provide that validation, loading a design into real hardware is a crap shoot if it works or not.
Just getting it thru synthesis, place, and route, but with no timing constraints, is meaningless.

d3x0r
Beginner

I got it to nearly a working state by removing a bunch of inferred latches; at least the initial state has 0's.

 

These are the unconstrained clocks.  they are all outside of the counter.  The last iCLK isn't mine, but is at the top of the main module

d3x0r_0-1704033969923.png

module MKRVIDOR4000_top
(
  // system signals
  input         iCLK,
  input         iRESETn,

And that top module I haven't touched.

 

I renamed my clock to iCLK_ff, and it's not listed as a clock.

The other two bits are outputs from the jtag interface - they go to 'or' gates that should be the input latch signal, and the trigger from USB through jtag... they're not clocks either.   I searched for how to fix some of these clock signals, and some said I could use Assignment Editor - but none of those show up in the assignment editor to make not clock.

I did try to define a clock for iCLK_ff, but then it said there was a defined clock, and no reference to it.

 

create_clock -name {MyDesign:MyDesign_inst|COUNTER:inst6|iCLK_ff} -period 0.2 -waveform { 0 0.1 }

But even then, I can just put in arbitrary numbers - doesn't mean it would work any better.

 

https://github.com/d3x0r/STFRPhysics/blob/master/hardware/fpga/clock-module.4.v  This is the version that is almost working. without any inferred latches.

FvM
Valued Contributor III

Hi,
apart from the posted code that's obviously not working in hardware and will never work, can we get a brief description of the overall design purpose? We can probably help to answer the question of whether and how it's implementable in a Cyclone 10 FPGA.

Presently, the only substantial specification is "I have a requirement to count sub-nanosecond ticks, preferably 200ps or less".

Can you elaborate, sketching the expected input waveforms and clarifying which signal parameters have to be extracted?

Before going into HDL descriptions and timing constraints, I would discuss the problem referring to FPGA hardware capabilities.

As a simple starting point, Cyclone 10CL, speed class 8 has a maximal core clock of 400 MHz. That's (simplifying things a bit) the maximal clock speed of FPGA registers, resulting in a time resolution of 2.5 ns. Faster events can be processed under some circumstances using DDR and phase shifted clocks. There are nevertheless minimal pulse width requirements. A 200 ps pulse width is out of reach; 200 ps edge delay generation and measurement can be feasible.

d3x0r
Beginner

Timing diagram with 

d3x0r_2-1704035417762.png

 

Comments from the code...

input               iLatch1,      // signal event that latches the counter to register 1
input               iLatch2,      // signal event that latches the counter to register 2
input               iResetLatch1, // sent after the value in register 1 is read from (or sent to) the USB
input               iResetLatch2, // sent after the value in register 2 is read from (or sent to) the USB
output [31:0]       o1COUNTER,    // register 1 with latched counter value
output [31:0]       o1COUNTERHi,  // register 1 with latched counter value high bits
output [31:0]       o2COUNTER,    // register 2 with latched counter value
output [31:0]       o2COUNTERHi,  // register 2 with latched counter value high bits
output  oRdyCOUNTER,           // signals that data has been latched in register 1
output  oRdyCOUNTER2,           // signals that data has been latched in register 2

 So - a high speed clock.

2 registers to store the current clock, when triggered by iLatch1 or iLatch2
Once the value is latched, the internal latchLock1 and latchLock2 are set, which become the oRdyCOUNTER and oRdyCOUNTER2 signals out.
The values are read over a USB connection from the latched registers  (o1Counter, o2Counter in the simulation diagram). 
Then iResetLatch1 or iResetLatch2 is sent to reset latchLock1 or latchLock2, to allow the iLatch1 or iLatch2 triggers to latch a new value.


latchLock1    <= ( iLatch1 || latchLock1 ) && !( !iLatch1 && ( rstLatchLock1 || iResetLatch1 ) );
rstLatchLock1 <= ( iLatch1 && ( iResetLatch1 || rstLatchLock1 ) );

latchLock2    <= ( iLatch2 || latchLock2 ) && !( !iLatch2 && ( rstLatchLock2 || iResetLatch2 ) );
rstLatchLock2 <= ( iLatch2 && ( iResetLatch2 || rstLatchLock2 ) );

latchLock1 is set on posedge iLatch1, and not reset.
If iLatch1 is still high when the iResetLatch1 comes in, then an internal flag rstLatchLock1 is set until iLatch1 disappears.

Same for 2.


So [high speed counter]  ->on latch signal -> [register1] or [register2]  -> read values -> reset lock on registers so a new timer can be latched.

all of the latching/reset stuff is very slow (the latch signal from a pin will probably be about 1 millisecond, and from the test program is more like 10 milliseconds long)  the reset signal is also probably about 10 milliseconds long ... the time from latching until the signal is read should be less than 1 second, but delays of a few milliseconds overall are acceptable.  
The only thing that's time critical is 1) a high tick rate and 2) when the signal from a pin is recognized, latch the current clock into a register; and there's two registers which can be latched from the single time source, with separate latching signals.
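The lock and reset behaviour described above could also be expressed as one clocked process. The following is only a sketch under assumptions, not the code from the linked repository: the module name latch_lock_sketch and its port names are hypothetical, clk is assumed to be a properly generated clock, and latch_in and reset_in are assumed to be already synchronized to clk.

```verilog
// Sketch of capture-and-hold with a lock flag (hypothetical names).
// Assumes latch_in and reset_in are already synchronous to clk.
module latch_lock_sketch (
  input  wire        clk,
  input  wire        latch_in,   // latch request (level, may stay high a long time)
  input  wire        reset_in,   // sent after the captured value has been read
  input  wire [63:0] count,      // free-running counter in the clk domain
  output reg  [63:0] captured,   // held value
  output reg         locked      // high while captured holds unread data
);
  reg latch_prev;                // previous latch_in, for edge detection
  reg reset_pending;             // reset seen while latch_in was still high

  // Power-up values for simulation; many FPGA tools also honor these.
  initial begin
    captured      = 64'd0;
    locked        = 1'b0;
    latch_prev    = 1'b0;
    reset_pending = 1'b0;
  end

  always @(posedge clk) begin
    latch_prev <= latch_in;

    // Remember a reset that arrives before latch_in has dropped.
    reset_pending <= latch_in && (reset_in || reset_pending);

    if (!locked && latch_in && !latch_prev) begin
      captured <= count;         // rising edge while unlocked: capture
      locked   <= 1'b1;
    end else if (locked && !latch_in && (reset_in || reset_pending)) begin
      locked   <= 1'b0;          // release only once latch_in is low
    end
  end
endmodule
```

One instance per latch input would give the two independent registers; the capture resolution in this form is one period of clk.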

FvM
Valued Contributor III

Hi,

I don't feel that my previous questions have been answered.

Anyhow, let me ask additional questions related to your latest published clock-module.4.v.

1. Are you aware that in synthesizable Verilog the module clock iCLK_ff can't be internally generated but must be provided through module interface? What's the intended clock frequency?

2. What are the expected waveforms of iLatch1 and iLatch2? 

[Addendum]
Here's the RTL schematic of clock-module.4.v

FvM_0-1704037415260.png


And so the "clock" appears in post-mapping schematic

FvM_1-1704037798113.png

I'm a bit surprised that Quartus accepts it at all. Interestingly, it's not listed as a clock in the timing analyzer.

d3x0r
Beginner

1) "must be provided through module interface": I'm not convinced. While I understand I was generating an overly fast clock, as long as it's slow enough I'm sure I could use an internal clock, since the iCLK=!iCLK timer worked to increment the counter several times; but I guess I introduced a bunch of inferred latches between when it almost worked and when it didn't work at all.  (The following was tested after the rest of the message was written.) The following synthetic clock works from 1 to 5. 0 has a lot of jitter, but 1 triggers the counter about every 0.5-0.7ns; 2 triggers it every 3.7 to 3.8ns; 3 is about 7ns; 4 is about 15ns; then there's a big jump and 5 triggers the counter every 47ns, and 6 is about 60ns.  (Though this is brittle: I removed all the other code, and it stopped working entirely. I've put it back now, but only get 5ns for rSynthClock[1] and 10ns for rSynthClock[2].)

 

reg [6:0] rSynthClock = 0;
always begin
   #1
	rSynthClock[0] = !rSynthClock[0];
end

always @(posedge rSynthClock[0] ) rSynthClock[1] = !rSynthClock[1];
always @(posedge rSynthClock[1] ) rSynthClock[2] = !rSynthClock[2];
always @(posedge rSynthClock[2] ) rSynthClock[3] = !rSynthClock[3];
always @(posedge rSynthClock[3] ) rSynthClock[4] = !rSynthClock[4];
always @(posedge rSynthClock[4] ) rSynthClock[5] = !rSynthClock[5];
always @(posedge rSynthClock[5] ) rSynthClock[6] = !rSynthClock[6];

 

 

2)The latch signal waveform is slow - 1ms on with an off time of about 1 second.  which is forever by any of the clocks.  

 

I brought in the iCLK_MAIN that is given to the design from the _top file.  It's only a 120 MHz clock.

I then implemented a phase shift on that clock  https://stackoverflow.com/a/50172237/4619267
but that actually makes a lot of latches, which make the phases pretty slow.  The phase offset is only about 1ns; I get basically 8 phases on the 120 MHz clock, which gets me to 960 MHz effectively.  
The phase shifter is a long list of these... up to [25].

 

 

always @(posedge globalClock ) 	iCLK_ff_p[0] = !(iCLK_ff_n[0]);
always @(negedge globalClock) 	iCLK_ff_n[0] = (iCLK_ff_p[0]);
always begin  #5    iCLK_ff[0] = iCLK_ff_p[0] ^ iCLK_ff_n[0];  end


always @(posedge iCLK_ff[0] ) 	iCLK_ff_p[1] = !(iCLK_ff_n[1]);
always @(negedge iCLK_ff[0]) 	iCLK_ff_n[1] = (iCLK_ff_p[1]);
always begin  #5    iCLK_ff[1] = iCLK_ff_p[1] ^ iCLK_ff_n[1];  end

 

 

 

I could wish for a less latch intensive solution. Nothing changes faster than the main clock now; I just get 25 waveforms that are the same as the main clock, offset by an amount (which, since there's only 8 phases, actually tracks about 3 clock pulses, 4 on bits and 4 off bits), basically like this screenshot...

iCLK is the main clock (at the top) and iClk_ff[n] are phase shifted version - they composite into the top value of like 0000111100001111000011110,   0001111000011110000111100, etc.  

d3x0r_0-1704079329685.png

 

400 MHz would help, but not really because then I'd only get a couple phases on top of that.
The internal gates seem to be about 200ps, so I don't see it as entirely out of the ball park... (other than it seems to take at least 5 gates to form the phase shifted latches... ) but I can't think of a better way to do the phase shifting.  

 

This node has quite a few things in 'Equation' to make it... which is part of the phase shift logic

d3x0r_1-1704080074173.png



_AK6DN_
Valued Contributor I

Ok, I don't understand why you did not start here ....

https://github.com/vidor-libraries/VidorFPGA

 

It is a template project, with the top level verilog module defined, as well as supporting .qpf, .qsf, and .sdc files for a fully constrained compilation.
It appears that the iCLK coming into the FPGA is 8MHz, and the internal PLL is used to generate 24MHz and 120MHz clocks for on chip use.

FvM
Valued Contributor III

I don't want to argue about why your internal clock generation method isn't reliable. Just take it as given that it's beyond any FPGA specification and not supported by Quartus. And it's not necessary, because you can easily generate a wide range of legal clock frequencies with the built-in PLL.
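For illustration, a counter that takes its clock through the module interface might look like the sketch below. The module name counter_sketch, the WIDTH parameter and the rst_n input are hypothetical; clk is assumed to be driven by a PLL output or a dedicated clock pin elsewhere in the design.

```verilog
// Sketch: free-running counter whose clock arrives through a port
// (hypothetical names; not the code from the linked repository).
module counter_sketch #(
  parameter WIDTH = 64
) (
  input  wire             clk,    // assumed: PLL output or clock pin
  input  wire             rst_n,  // assumed: active-low synchronous reset
  output reg  [WIDTH-1:0] count
);
  always @(posedge clk) begin
    if (!rst_n)
      count <= {WIDTH{1'b0}};     // clear on reset
    else
      count <= count + 1'b1;      // one tick per clk period
  end
endmodule
```

Because every bit changes on the same clock edge, the timing analyzer can check the carry chain against the constrained clock period, which is not possible for a ripple or gate-loop clock.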

The Quartus timing analyzer reports that fmax of the basic 64 bit synchronous binary counter used in your design is about 150 MHz for Cyclone 10 speed class 8 and 200 MHz for the fastest speed class 6.

Another basic problem of your design is that iLatch1,2 are asynchronous to iCLK. As a result, it's not guaranteed that the counter value is latched consistently. In a regular design, iLatch has to be synchronized to iCLK, or the counter value Gray encoded.
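A minimal sketch of the first option, synchronizing the latch input to the counter clock before using it, could look as follows. The names sync_capture_sketch, latch_async and captured_valid are hypothetical, and clk is assumed to be the same clock that drives the counter.

```verilog
// Sketch: synchronize an asynchronous latch input, then capture the
// counter on its rising edge (hypothetical names).
module sync_capture_sketch (
  input  wire        clk,            // counter clock (assumed PLL output)
  input  wire        latch_async,    // external signal, asynchronous to clk
  input  wire [63:0] count,          // counter running on clk
  output reg  [63:0] captured,       // counter value at the detected edge
  output reg         captured_valid  // one-cycle pulse when captured updates
);
  // sync[0], sync[1]: two-flop synchronizer; sync[2]: previous value.
  reg [2:0] sync = 3'b000;

  always @(posedge clk) begin
    sync           <= {sync[1:0], latch_async};
    captured_valid <= 1'b0;
    if (sync[1] && !sync[2]) begin   // rising edge as seen in the clk domain
      captured       <= count;
      captured_valid <= 1'b1;
    end
  end
endmodule
```

Since the capture happens entirely inside the clk domain, the counter itself does not need Gray encoding in this variant. The price is a fixed latency of a few clock cycles plus an uncertainty of about one clock period on the captured value.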

Farabi
Employee

Hello,


I guess many factors will affect the high speed counting, especially the maximum PLL clock output speed, and the clock skew between the PLL counter and the package clock pin out.


regards,

Farabi


d3x0r
Beginner

https://github.com/d3x0r/JTAG_Interface/blob/stable_2Ghz/FPGA/projects/example_simple/clock-module.v

 

This ticks the counter at 300-600ps.  Depends on temperature.  Uses a synthetic clock (internal not gate driving a divide by 2 wire).  I also can't remove or change anything, or it stops behaving so well.  All the other parts must space out the clock from the counter enough that it works (or maybe the clock parts themselves?)  
But, if I put my finger on it (warm it up) it goes slower... if I blow on it I can get it down to 400ps pretty regularly.

This is a discontinued board though, so it's probably not the best choice anyway which is sad.  It also appears that the board didn't actually expose any header pins to the FPGA - though I guess the pci-e slot is attached to the FPGA, so I'd need to get a socket and hook into that in order to give a hardware signal to this.

 

The counter also resets at 40 bits instead of counting all 64 bits, which is a little odd, but would be fine: that's like 512 seconds, and I could manually count the upper bits when that wraps. It's a little strange, since I started the design with 40 bits for the clocks anyway, and ended up extending the counters to 64 just because it was easier to wire two 32 bit values to the ports.

 

I know what I see can't be seen by you, but it's not just running on a simulator, it's physically running here on my desk and clocking at that rate - so I don't understand why I should be convinced that what is happening isn't happening, and can't be done.

_AK6DN_
Valued Contributor I

"This ticks the counter at 300-600ps. Depends on temperature. "
"But, if I put my finger on it (warm it up) it goes slower... if I blow on it I can get it down to 400ps pretty regularly."
" I also can't remove or change anything, or it stops behaving so well. "

"Uses a synthetic clock (internal not gate driving a divide by 2 wire). "

 

So you are using a feedback loop of gates to generate an oscillator, which is neither very stable nor reproducible.

We have told you multiple times that this is an unreliable design approach.

That is why we have fixed clock sources and PLL blocks: to be able to generate stable clocks at a fixed frequency.

IDK what the utility of a 'stopwatch' is where the clock wanders all over the place based on temperature.

 

"I know what I see can't be seen by you, but it's not just running on a simulator, it's physically running here on my desk and clocking at that rate - so I don't understand why I should be convinced that what is happening isn't happening, and can't be done."

I believe you were able to kludge together a one off hack. But you can't change it nor predict its frequency. Congratulations.

I'm done here. You don't want to listen to experienced FPGA designers that your approach is flawed. Good luck to you.

 

 

TingJiangT_Intel
Employee

Maybe we can use the Timing Analyzer to see whether there is any unconstrained path or clock that may cause a problem. We can also check whether there are any timing violations.


TingJiangT_Intel
Employee

As we have not received any response from you on the previous reply, please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support, where community users will be able to help with your follow-up questions.


