FPGA Intellectual Property
PCI Express*, Networking and Connectivity, Memory Interfaces, DSP IP, and Video IP
6400 Discussions

PC-to-FPGA AVMM DMA data transfer randomly can't complete in time using A10 PCIe AVMM IP

JET60200
New Contributor I
735 Views

Hello experts,

We are using the A10 HIP AVMM IP in a design that acts as a PCIe device plugged into an x86 Xeon server.

 

In the Linux driver, we program the FPGA AVMM DMA engine to transfer bulk data between FPGA SRAM and x86 DDR4. Each transfer has the same length (256 KB of IQ data per DMA transfer every 71 us), and these transfers run back-to-back continuously.

 

The whole system works as expected for 2-3 hours of continuous operation. We measure that each DMA transfer takes around 31 us, so under normal conditions every transfer completes well within its 71 us period.

 

But rarely and at random, a single DMA transfer takes about 3977 us, far longer than normal, and the system then runs into problems and crashes. Since it's a very rare exception, and it involves the AVMM DMA engine inside the A10 FPGA, we have no idea how to proceed with debugging.

 

Is there any way to debug this, or any debug status registers we can check in the FPGA HIP AVMM module? Thanks very much for your help and advice.

 

   

   

 

 

0 Kudos
6 Replies
JET60200
New Contributor I
715 Views

Continuing to dig into where the problem might be:

 

I ran lspci to check the AER status of the PCIe HIP. Before the DMA-stuck issue occurs, lspci doesn't show any errors, but after the problem occurs, lspci shows a few AER errors from the FPGA HIP core, such as the following:

 

"

[root@localhost ~]# lspci -s 0000:17:00.0 -vv
17:00.0 Non-VGA unclassified device: Altera Corporation Device 1001
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 355
NUMA node: 0
Region 0: Memory at 38007ff00000 (64-bit, prefetchable) [size=512]
Region 2: Memory at c5800000 (32-bit, non-prefetchable) [size=4M]
Capabilities: [50] MSI: Enable- Count=1/4 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #1, Speed 8GT/s, Width x8, ASPM not supported, Exit Latency L0s <4us, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Capabilities: [200 v1] Vendor Specific Information: ID=1172 Rev=0 Len=044 <?>
Capabilities: [300 v1] #19
Capabilities: [800 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-

CESta: RxErr+ BadTLP+ BadDLLP- Rollover- Timeout+ NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Kernel driver in use: nr_device_driver

"

 

This means some errors occur during the A10 AVMM DMA transfers. But here is what puzzles me: every DMA operation uses just 8 descriptor entries, and those 8 descriptor entries have the same content every time. Why does it run correctly for 3-4 hours and then suddenly get stuck in the PCIe HIP core? That's weird!

 

Does anyone have any idea? Thanks in advance.

0 Kudos
KhaiChein_Y_Intel
707 Views

Hi,


Could you provide a Signal Tap capture?


Thanks

Best regards,

KhaiY


0 Kudos
JET60200
New Contributor I
703 Views

Hi @KhaiChein_Y_Intel ,

 

1) Regarding the Signal Tap capture, which signal(s) do you want us to capture?

 

2) Also, regarding "RxErr+" and "BadTLP+ Timeout+", I believe these are reported by the PCIe physical layer and data link layer, correct? Since they are not related to PCIe application data, does that mean it may be a hardware-related issue?

 

Thanks for the feedback.

0 Kudos
JET60200
New Contributor I
688 Views

Hello @KhaiChein_Y_Intel ,

 

What signals should we capture to investigate this "stuck" issue? Is there any guidance describing this? Thanks in advance.

0 Kudos
KhaiChein_Y_Intel
682 Views

Hi,


Could you share the STP for the signals below, along with the .ip file? Please use the transitional storage qualifier setting.


 Txs

 dma_rd_master

 dma_wr_master

 wr_dts_slave

 rd_dts_slave

 wr_dcm_master

 rd_dcm_master

 Rxm_BAR*

 tx_out0[<n>-1:0]

rx_in0[<n>-1:0]

hip_reconfig_clk

hip_reconfig_rst_n

hip_reconfig_address[9:0]

hip_reconfig_read

hip_reconfig_readdata[15:0]

hip_reconfig_write

hip_reconfig_writedata[15:0]

hip_reconfig_byte_en[1:0]

ser_shift_load

interface_sel

npor

nreset_status

pin_perst

refclk

RdDmaWrite_o

RdDmaAddress_o[63:0]

RdDmaWriteData[<w>-1:0]

RdDmaBurstCount_o[<n>-1:0]

RdDmaByteEnable_o[<w>-1:0]

RdDmaWaitRequest_i

WrDmaRead_o

WrDmaAddress_o[63:0]

WrDmaReadData_i[<w>-1:0]

WrDmaBurstCount_o[<n>-1:0]

WrDmaWaitRequest_i

WrDmaReadDataValid_i

cfg_par_err

derr_cor_ext_rcv

derr_cor_ext_rpl

derr_rpl

dlup

dlup_exit

ev128ns

ev1us

hotrst_exit

ins_status[3:0]

ko_cpl_spc_data[11:0]

ko_cpl_spc_header[7:0]

l2_exit

lane_act[3:0]

ltssmstate[4:0]

rx_par_err

tx_par_err[1:0]

currentspeed[1:0]

Cra*


Thanks

Best regards,

KhaiY


0 Kudos
KhaiChein_Y_Intel
651 Views

Hi,


We have not received any response from you to the previous question/reply/answer that I provided. This thread will be transitioned to community support. If you have a new question, feel free to open a new thread to get support from Intel experts. Otherwise, the community users will continue to help you on this thread. Thank you.


Best regards,

KhaiY


0 Kudos
Reply