Intel® Quartus® Prime Software

SignalTap: discontinuity length in conditional traces

Altera_Forum
Honored Contributor II

Is it possible to get SignalTap to show a true timeline when conditional storage is enabled?

 

I'm trying to track down some latency issues on PCIe requests (which can take 256+ clocks at the best of times) that I think are being stalled for several hundred microseconds.

I can suppress the storing of data while the PCIe request is in progress - but then I've no idea how long it took. 

 

I'm probably looking for something more like the 'transitional mode' of some HP logic analysers.
Altera_Forum
Honored Contributor II

I traced a 32bit clock counter. 

The discrepancy is 2,156,000 counts of the 62.5MHz 'application' clock.

I make that 34.5ms - far longer than I ever imagined. 

So I don't think they can be 'normal' latency issues. 

I can only imagine that something like a PCIe retrain has happened.
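As a sanity check on the arithmetic, here is a minimal C sketch of the counter-to-time conversion, assuming the 62.5MHz application clock above; the start/end values are just the numbers from this trace and the names are illustrative:

    /* Convert a 32-bit free-running counter delta into elapsed time. */
    #include <stdint.h>
    #include <stdio.h>

    #define APP_CLK_HZ 62500000.0          /* 62.5 MHz application clock */

    int main(void)
    {
        uint32_t start = 0;                /* counter at last stored sample before the gap  */
        uint32_t end   = 2156000u;         /* counter at first stored sample after the gap  */

        /* Unsigned subtraction handles a single 32-bit wrap correctly. */
        uint32_t delta = end - start;

        printf("gap = %u clocks = %.3f ms\n", delta, delta / APP_CLK_HZ * 1e3);
        return 0;                          /* prints ~34.496 ms for these values */
    }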
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

 

I traced a 32bit clock counter. 

 

--- Quote End ---  

 

 

That was what I was about to suggest. I did that for these PCIe-related traces (see p. 17):

 

http://www.ovro.caltech.edu/%7edwh/correlator/pdf/altera_pcie_analysis.pdf 

 

I was disappointed with the Altera PCIe core, so did not end up using it. Otherwise I might be more help :) 

 

I don't suppose you know someone with a PCIe bus analyzer ... I'm sure one of those would trace the transactions and tell you what is going wrong ... 

 

Worst case, how about a PCIe host that you can control, e.g., an ARM or PowerPC root-complex running U-Boot? You could step through the PCIe enumeration stages sequentially and see where it dies.

 

Cheers, 

Dave
Altera_Forum
Honored Contributor II

It all works 'most of the time'; the root-complex is an Intel Atom board.

PCIe read bursts usually take 128-256 clocks depending on the actual length. 

Writes are all posted and only have a delay of about 8 clocks + the data transfer clocks. 

I've had to write a simple multi-channel DMA controller to generate the burst transfers.
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

It all works 'most of the time'; the root-complex is an Intel Atom board.

 

--- Quote End ---  

 

Sorry to hear that, that makes it really hard to debug. 

 

 

--- Quote Start ---  

 

PCIe read bursts usually take 128-256 clocks depending on the actual length. 

Writes are all posted and only have a delay of about 8 clocks + the data transfer clocks. 

I've had to write a simple multi-channel DMA controller to generate the burst transfers.

--- Quote End ---  

 

If your PCIe setup allows you to plug in two of your end-point devices, then it might be easier to set up DMA transfers between the two end-points, rather than between the end-point and the root-complex.

 

I've used this scheme to debug PCI boards. E.g., I'll put the PowerPC processor on the PCI peripheral boards in reset, and then from the x86 host memory-map the PCI board registers, program the DMA controller of one board to transfer to the other, and then read/write the memory on the PowerPC board. This saves having to write a custom driver, and is "low-level" enough for me to know exactly what is happening.
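For what it's worth, here is a rough sketch of that scheme on Linux using the standard sysfs resource0 mapping; the bus/device/function and the DMA register layout (the SRC/DST/LEN/CTRL/STAT offsets) are made up and would have to match your own controller:

    /* Drive one end-point's DMA controller from the host by mapping its BAR0. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* BAR0 of the end-point acting as DMA master (the BDF is an example). */
        int fd = open("/sys/bus/pci/devices/0000:02:00.0/resource0", O_RDWR | O_SYNC);
        if (fd < 0) { perror("open"); return 1; }

        volatile uint32_t *bar = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                      MAP_SHARED, fd, 0);
        if (bar == MAP_FAILED) { perror("mmap"); return 1; }

        /* Hypothetical register layout: program a transfer towards the peer. */
        bar[0x00 / 4] = 0xd0000000;   /* DMA_SRC: local address on this board      */
        bar[0x04 / 4] = 0xd8000000;   /* DMA_DST: PCIe bus address of the peer BAR */
        bar[0x08 / 4] = 4096;         /* DMA_LEN: bytes to move                    */
        bar[0x0c / 4] = 1;            /* DMA_CTRL: go                              */

        while (bar[0x10 / 4] & 1)     /* DMA_STAT: busy bit (also hypothetical)    */
            ;

        munmap((void *)bar, 4096);
        close(fd);
        return 0;
    }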

 

You could do much the same thing ... 

 

Cheers, 

Dave
Altera_Forum
Honored Contributor II

In our case the Linux driver on the x86 host is simple enough anyway. 

It supports (roughly as sketched below): 

1) pread/pwrite through to the FPGA BAR space. 

2) mmap() of both BAR space and the FPGA's PCIe master window (allocating physical memory as needed).
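A minimal user-space sketch of that interface (the /dev/fpga0 node name and the 0x100 register offset are invented for illustration):

    /* Exercise the driver: pread from BAR space, then mmap it for direct access. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/fpga0", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        /* 1) pread/pwrite straight through to BAR space */
        uint32_t status;
        if (pread(fd, &status, sizeof(status), 0x100) == (ssize_t)sizeof(status))
            printf("status = 0x%08x\n", status);

        /* 2) mmap of BAR space for direct register access */
        volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, 0);
        if (regs != MAP_FAILED) {
            printf("reg[0] = 0x%08x\n", regs[0]);
            munmap((void *)regs, 4096);
        }

        close(fd);
        return 0;
    }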

 

There might be some suitable status info on the FPGA - but it is a bit thin!

 

Also I didn't think PCIe supported transfers between two endpoints.
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

In our case the Linux driver on the x86 host is simple enough anyway. 

It supports: 

1) pread/pwrite through to the FPGA BAR space. 

2) mmap() of both BAR space and the FPGA's PCIe master window (allocating physical memory as needed).

 

--- Quote End ---  

 

In the case of using DMA on the FPGA board, you need to reserve some memory to DMA into or from. I've written simple drivers for the x86 that allocate memory and print the physical address to the dmesg buffer. You can then manually program the DMA controller registers to move data to/from the board. I'm pretty sure the code is in here (for an older kernel, so it might need tweaking):

 

https://www.ovro.caltech.edu/~dwh/correlator/pdf/cobra_driver.pdf 

https://www.ovro.caltech.edu/~dwh/correlator/software/cobra_driver.tar.gz 
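A fragment of how that allocation might look in a PCI driver's probe() on a reasonably recent kernel (error unwinding and the rest of the driver are omitted; this is a sketch, not the code from the PDF above):

    #include <linux/pci.h>
    #include <linux/dma-mapping.h>

    #define DMA_BUF_SIZE (64 * 1024)

    static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
        dma_addr_t bus_addr;
        void *buf;

        if (pci_enable_device(pdev))
            return -ENODEV;
        pci_set_master(pdev);

        buf = dma_alloc_coherent(&pdev->dev, DMA_BUF_SIZE, &bus_addr, GFP_KERNEL);
        if (!buf)
            return -ENOMEM;

        /* This line ends up in dmesg; program the address into the board's
         * DMA controller registers by hand. */
        dev_info(&pdev->dev, "DMA buffer: %d bytes at bus address %pad\n",
                 DMA_BUF_SIZE, &bus_addr);
        return 0;
    }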

 

 

--- Quote Start ---  

 

Also I didn't think PCIe supported transfers between two endpoints. 

--- Quote End ---  

 

I'm pretty sure you can. The root-complex BIOS/bootloader sets up the PCIe bus addresses. If you program a PCIe end-point with host addresses that correspond to another PCIe end-point, the DMA will go from the end-point to the PCIe switch (or root-complex) over to the destination PCIe end-point. As far as software is concerned, it should work just like PCI. :) 
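To get the peer end-point's address from the host, its sysfs resource file can be read; a sketch (the BDF is an example, and note that behind an IOMMU the address a DMA engine must use can differ from this physical address):

    /* Print the start/end of the peer end-point's BAR0, to use as a DMA target. */
    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        /* Each line of the "resource" file is "start end flags" for one BAR. */
        FILE *f = fopen("/sys/bus/pci/devices/0000:03:00.0/resource", "r");
        if (!f) { perror("fopen"); return 1; }

        uint64_t start, end, flags;
        if (fscanf(f, "%" SCNx64 " %" SCNx64 " %" SCNx64, &start, &end, &flags) == 3)
            printf("peer BAR0: 0x%" PRIx64 " .. 0x%" PRIx64 "\n", start, end);

        fclose(f);
        return 0;
    }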

 

Cheers, 

Dave
Altera_Forum
Honored Contributor II

Yes, I've no problem writing Unix (various flavours) kernel code and device drivers. 

So verifying/exercising the DMA and PCIe is relatively easy. 

The failure rate seems to depend on the pattern of PCIe transfers - changing the buffer size of my 'reflect data' test (which actually reflects over my HDLC link as well) changes the error rate.

I'm going to see if the 'test_out' signals show anything interesting.
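For the record, a sketch of what such a buffer-size sweep might look like from user space - the /dev/fpga0 node and the simple write/read loopback interface are made up for illustration:

    /* Sweep the reflect-test buffer size and count bad transfers per size. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/fpga0", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        for (size_t len = 256; len <= 65536; len *= 2) {
            char *tx = malloc(len), *rx = malloc(len);
            unsigned errors = 0;

            for (int run = 0; run < 100; run++) {
                for (size_t i = 0; i < len; i++)
                    tx[i] = (char)(rand() & 0xff);

                /* Send the buffer out, read the reflected copy back, compare. */
                if (write(fd, tx, len) != (ssize_t)len ||
                    read(fd, rx, len) != (ssize_t)len ||
                    memcmp(tx, rx, len) != 0)
                    errors++;
            }
            printf("len %6zu: %u/100 bad transfers\n", len, errors);
            free(tx);
            free(rx);
        }
        close(fd);
        return 0;
    }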
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Yes, I've no problem writing Unix (various flavours) kernel code and device drivers. 

So verifying/exercising the DMA and PCIe is relatively easy.

 

--- Quote End ---  

 

Ok. 

 

 

--- Quote Start ---  

 

The failure rate seems to depend on the pattern of PCIe transfers - changing the buffer size of my 'reflect data' test (which actually reflects over my HDLC link as well) changes the error rate.

I'm going to see if the 'test_out' signals show anything interesting. 

--- Quote End ---  

 

If you were to use two PCIe end-points, one as the PCIe master and the other as the PCIe slave, you would be able to SignalTap II both ends of the PCIe transaction. If you had, say, an external 1PPS signal or a trigger signal, you could use that to start a counter synchronously on both boards. If that counter was part of your traces, then you'd be able to compare the transactions on the two boards. Not sure if it'll help, but at least you'd have better visibility into both ends of the design ...

 

The other option would be to buy one of the SoC kits that has a PCIe slot and put your FPGA board in that. In that case, you could trace the root-complex and end-point. 

 

Good luck with your debugging! :) 

 

Cheers, 

Dave
Altera_Forum
Honored Contributor II

It might all be down to some very dodgy-looking PCIe cabling. 

The board doesn't plug directly into a PCIe socket, but is connected by two 'SATA' cables to a header board plugged into the PCIe socket. 

And the 40-minute build times don't help the rate of debugging (on an i7 with 8GB of memory running Ubuntu).
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

It might all be down to some very dodgy-looking PCIe cabling. 

The board doesn't plug directly into a PCIe socket, but is connected by two 'SATA' cables to a header board plugged into the PCIe socket.

 

--- Quote End ---  

 

Sounds like my kind of hardware setup :) (I use PCIe to SMA breakout cables from SAMTEC) 

 

Can you try running a Transceiver Toolkit test? You do not need a PRBS pattern generator at the other end, just traffic ... I'm pretty sure you can run the Transceiver Toolkit over a valid PCIe link, but I have not tried to do that.

 

 

--- Quote Start ---  

 

And the 40-minute build times don't help the rate of debugging (on an i7 with 8GB of memory running Ubuntu).

--- Quote End ---  

 

Ha! I hear you on that one ... I script my long compiles so that I can walk away while my machine builds multiple designs ... but that does not help when debugging! 

 

Cheers, 

Dave