I have a super simple design right now. The Stratix 10M PCIe HIP (generated in Qsys) has its rxm_bar0 interface exported, and this IP is instantiated in a top-level wrapper. In Qsys, I've configured BAR0 with a 28-bit size parameter (I assume this is the address width), and it is set as 64-bit prefetchable.
In the wrapper, I've hard-coded the rxm_bar0 interface's signals as follows:
- read_waitrequest -> 0
- readdatavalid -> 1
- readdata -> 32'hdeadbeef
On my host machine, I have some simple UIO PCIe driver code that just reads from uio0. When I boot the machine, lspci shows PCIe device 1172:0000, but its BAR0 is reported with a size of only 256 bytes. See the lspci verbose output below:
04:00.0 Unassigned class [ff00]: Altera Corporation Device 0000 (rev 01)
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 28
        NUMA node: 0
        Region 0: Memory at f3100000 (64-bit, prefetchable) [size=256]
        Capabilities: <access denied>
        Kernel driver in use: uio_pci_generic
If I read 4 bytes, I get 0xdeadbeef, as expected. I can increase the read to 8 or 16 bytes with no issue. But once I go past 32 bytes, I get incorrect data starting from byte 0, then a bus error, and then BAR0 becomes disabled.
Does anyone know why this happens? Why isn't the size of BAR0 256 MB (2^28 bytes)?
I added an Avalon-MM Pipeline Bridge to the Qsys system and exported its master interface instead, and now the region size reported by lspci looks correct.
However, I'm still getting bus errors and driver failures if I write more than 16 bytes to BAR0. Can someone tell me if my overall approach is correct?
1. Set up the link to the FPGA endpoint using uio_pci_generic.
2. Open /sys/class/uio/uio0/device/resource0 (I know that this is the correct file). The flags passed to the open() call are O_RDWR | O_SYNC.
3. mmap the file descriptor returned from step #2. The protection args given to mmap are PROT_READ | PROT_WRITE, and the flag is MAP_SHARED.
4. memset N bytes at the pointer returned from step #3.
5. msync the pointer returned from step #3.
6. Read the pointer returned from step #3 32 bits at a time.
It's after step #6 that things start to behave strangely if I try to access more than 16 bytes.
Some more strange behavior: if I comment out the memset and msync calls, I can read entire kilobytes of data off the bus. HOWEVER, I can only read it 32 bits at a time. If I try to memcpy a large number of bytes from uio0, I get 0xffff_ffff for all the readback data, which is incorrect. If I read the values in a loop 32 bits at a time, I get 0xdead_beef, the expected result.
So, it looks like there is something wrong with *writing* more than 16 bytes at a time to uio0, but reading is fine...
My understanding is that when you use 64-bit prefetchable memory for the BAR, two contiguous BARs are combined to form the 64-bit prefetchable BAR. Did you disable BAR1 in your design? I believe that could be why you cannot read more than 32 bits at a time.
BAR1 is disabled.
I figured out that the problem was actually with my memset and msync calls, but I can't figure out why they cause a bus error. If I write through the pointer one 32-bit word at a time (u32_ptr[0], u32_ptr[1], etc.), there are no issues, but memset causes a bus error.
memset sets a large chunk of memory to a given byte value. msync flushes changes made to a mapped region back to the underlying storage. They are standard C library functions used in the Linux driver I am writing.