I have installed a Stratix 10 GX development kit FPGA card in a host running Ubuntu 18.04 and programmed the Avalon-MM+ Hard IP onto the card.
However, I ran into some issues when testing the DMA feature.
With the test application located at ~/avmm_bridge_512_0_example_design/software/user/example/intel_fpga_pcie_link_test, none of the test modes runs successfully.
For example, the first test, "Doing 100 writes and 100 reads", failed. The results of the first 10 tries are below.
At dword 0x0
Wrote 0x120aed54
Read 0xffffffff
At dword 0x1
Wrote 0x4db4d210
Read 0xffffffff
At dword 0x2
Wrote 0x616cd2bb
Read 0xffffffff
At dword 0x3
Wrote 0x29ad0aff
Read 0xffffffff
At dword 0x4
Wrote 0x31f54bf7
Read 0xffffffff
At dword 0x5
Wrote 0x7f08785d
Read 0xffffffff
At dword 0x6
Wrote 0x4fde1236
Read 0xffffffff
At dword 0x7
Wrote 0x401d6525
Read 0xffffffff
At dword 0x8
Wrote 0x44a0f03b
Read 0xffffffff
At dword 0x9
Wrote 0x49ebcbdc
Read 0xffffffff
In every test, the value read back from memory is 0xffffffff. The application reports no read or write errors; only the values do not match.
Number of write errors: 0
Number of read errors: 0
Number of dword mismatches: 100
Looking into the code, I cannot find where the device memory address is initialized. It is just set to NULL.
char *addr = NULL;
I am wondering why the test application does not use a memory map to get an address that maps to the FPGA card memory. I'm not sure whether the address is initialized in another function, or whether the NULL address is what causes the mismatches.
However, the other test modes do not work either, so there may be another cause.
The provided PCIe driver code is written for CentOS, so I modified some of it to get it to install on Ubuntu. I am not sure whether my modifications cause any issue.
I hope someone has ideas about these issues. Thank you, and I look forward to your reply.
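A note on the pattern above: every read returns exactly 0xffffffff while the application counts no read errors. On many hosts a PCIe read that gets no successful completion is handed back to software as all ones, so this pattern may mean that nothing answered the reads, rather than that the memory stored wrong data. The sketch below is only an illustration of that check. `summarize_readback` and `looks_like_no_responder` are hypothetical helper names and are not part of the Intel test application.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Summary of one write/read-back pass (hypothetical helper type).
struct ReadbackSummary {
    std::size_t mismatches = 0;  // read value differs from the written value
    std::size_t all_ones = 0;    // mismatching reads that are exactly 0xffffffff
};

// Compare written and read-back values dword by dword.
ReadbackSummary summarize_readback(const std::vector<uint32_t>& wrote,
                                   const std::vector<uint32_t>& read) {
    assert(wrote.size() == read.size());  // both vectors describe the same dwords
    ReadbackSummary s;
    for (std::size_t i = 0; i < wrote.size(); ++i) {
        if (read[i] != wrote[i]) {
            ++s.mismatches;
            if (read[i] == 0xffffffffu) ++s.all_ones;
        }
    }
    return s;
}

// True when every mismatch is an all-ones read. That hints (it does not prove)
// that no target responded to the reads.
bool looks_like_no_responder(const ReadbackSummary& s) {
    return s.mismatches > 0 && s.mismatches == s.all_ones;
}
```

Fed with the first values from the log above (0x120aed54, 0x4db4d210, ... written, 0xffffffff read each time), the helper reports that all mismatches are all-ones reads.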
The problem you are seeing might be related to the knowledge base article below:
Why does the Intel® Stratix® 10 Avalon®-MM Interface for PCIe* with DMA example design fail the link test and the DMA test when using the default selected BAR0?
When the internal DMA Descriptor Controller is enabled, the BAR0 Avalon®-MM master is not available for general-purpose usage. The DMA Descriptor Controller uses this BAR0 interface through which the host CPU programs in the descriptor table.
The intel_fpga_pcie_link_test user application selects BAR0 as default when it's initially executed. If the user forgets to change to BAR2, which is where the onchip memory is attached, then both the link test and the DMA test will fail.
The user must change to BAR2 before executing the link test and the DMA test.
See the execution transcript of the intel_fpga_pcie_link_test user application below for how to change to BAR2.
~$ sudo ./intel_fpga_pcie_link_test
*********************************************************
Intel FPGA PCIe Link Test
Version 2.0
0: Automatically select a device
1: Manually select a device
*********************************************************
> 0
Opened a handle to BAR 0 of a device with BDF 0x1300
*********************************************************
0: Link test - 100 writes and reads
1: Write memory space
2: Read memory space
3: Write configuration space
4: Read configuration space
5: Change BAR
6: Change device
7: Enable SRIOV
8: Do a link test for every enabled virtual function
belonging to the current device
9: Perform DMA
10: Quit program
*********************************************************
> 5
Changing BAR...
Enter BAR number (-1 for none):
> 2
Successfully changed BAR!
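For readers who want to look at BAR2 independently of the test application, Linux also exposes each PCI BAR as a `resourceN` file under sysfs that can be mapped with `mmap`. The sketch below is a minimal illustration under several assumptions: PCI domain `0000`, the standard bus/device/function packing of the 16-bit BDF value, root privileges, and a mapping length that does not exceed the BAR size. `bar_resource_path` and `map_bar` are hypothetical names. This is not how the Intel driver or test application accesses the card, and whether it coexists cleanly with the bound `intel_fpga_pcie_drv` driver is untested here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Build the sysfs path of a BAR resource file from a 16-bit BDF value.
// Assumes domain 0000 and the packing bus[15:8], device[7:3], function[2:0].
std::string bar_resource_path(uint16_t bdf, int bar) {
    assert(bar >= 0 && bar <= 5);  // a PCI function has at most six BARs
    char buf[64];
    std::snprintf(buf, sizeof buf,
                  "/sys/bus/pci/devices/0000:%02x:%02x.%x/resource%d",
                  static_cast<unsigned>((bdf >> 8) & 0xff),
                  static_cast<unsigned>((bdf >> 3) & 0x1f),
                  static_cast<unsigned>(bdf & 0x7), bar);
    return buf;
}

// Map `length` bytes of the BAR; returns nullptr on any failure.
// The caller is responsible for munmap() when done.
volatile uint32_t* map_bar(const std::string& path, std::size_t length) {
    int fd = open(path.c_str(), O_RDWR | O_SYNC);
    if (fd < 0) return nullptr;
    void* p = mmap(nullptr, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return nullptr;
    return static_cast<volatile uint32_t*>(p);
}
```

For the card in this thread (BDF 0x1a00), `bar_resource_path(0x1a00, 2)` yields the `resource2` file of device `0000:1a:00.0`.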
I have also tested a simple read and write operation.
//...
result = dev->write32(reinterpret_cast<void *>(addr), write_data);
if (result == 1) {
    cout << "Wrote successfully!" << endl;
} else {
    cout << "Write failed!" << endl;
}
uint32_t test_read = 0;
result = dev->read32(reinterpret_cast<void *>(addr), &test_read);
if (result == 1) {
    cout << "Read successfully!" << endl;
} else {
    cout << "Read failed!" << endl;
}
cout << "Read number : " << test_read << endl;
//...
I get the wrong result again.
> Enter address to write, in hex: 0000f000
> Enter 32-bit data to write, in hex: 12341234
> Writing 0x12341234 at BDF 0x1a00 BAR 0 offset 0xf000..
Wrote successfully!
Read successfully!
Read number : 0xffffffff
The value read is 0xffffffff again, as in the "Doing 100 writes and 100 reads" test, even though this time the address is set manually rather than left as NULL.
As I understand it, you are testing PCIe link-up on the Stratix 10 SX SoC Development Kit (DK-SOC-1SSX-L-D), described at the link below, on Ubuntu, using driver code that was provided for CentOS and that you modified.
https://www.intel.com/content/www/us/en/products/details/fpga/development-kits/stratix/10-sx.html
I'm not sure whether the issue was caused by the driver code modification itself.
First of all, can you check whether the PCIe link comes up, for example by using lspci?
Do you have a PCIe endpoint card plugged into this SoC dev kit, with the kit acting as root port? Or are you using the BTS (Board Test System) to test loopback?
Hi skbeh,
I am using the Stratix 10 GX FPGA Development Kit (DK-DEV-1SGX-L-A), and I have checked that the FPGA card is detected over PCIe.
$ lsmod | grep intel_fpga_pcie_drv
intel_fpga_pcie_drv 32768 2
$ lspci -d 1172:000 -v
1a:00.0 Unassigned class [ff00]: Altera Corporation Device 0000 (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: intel_fpga_pcie_drv
Kernel modules: altera_cvp
$ lspci -d 1172:000 -v | grep intel_fpga_pcie_drv
Kernel driver in use: intel_fpga_pcie_drv
The FPGA card is just plugged into a PCIe port on a host machine with Ubuntu 18.04 OS.
As the README for the driver for FPGA card says,
TESTING
-------
The driver was developed and tested on CentOS 7.0, 64-bit with
3.10.514 kernel compiled for x86_64 architecture.
When I built and installed the driver without modifications, it failed with errors. The errors seem to be caused by the Linux kernel version: 5.10 on my machine versus 3.10 in the README. I fixed them manually and the driver then installed successfully.
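One detail in the `lspci` output above may be worth a second look: `rev ff`, `prog-if ff` and `Unknown header type 7f` are what `lspci` typically prints when configuration-space reads themselves come back as all ones, which can happen when the device is listed by the OS but is not responding at that moment. This is an interpretation, not something confirmed in this thread. The sketch below checks the first bytes of a configuration header for that signature. The function names are hypothetical, and the offsets used (0x08 revision, 0x09 programming interface, 0x0e header type) are those of the standard PCI configuration header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Read up to the first 64 bytes of a config-space file such as
// /sys/bus/pci/devices/0000:1a:00.0/config. Returns fewer bytes
// (possibly none) if the file cannot be opened or is shorter.
std::vector<uint8_t> read_config_header(const std::string& path) {
    std::ifstream f(path, std::ios::binary);
    std::vector<uint8_t> buf(64);
    f.read(reinterpret_cast<char*>(buf.data()),
           static_cast<std::streamsize>(buf.size()));
    buf.resize(static_cast<std::size_t>(f.gcount()));
    return buf;
}

// True when revision, prog-if and header type all read back as ones,
// matching the "rev ff / prog-if ff / header type 7f" lines from lspci.
bool config_header_unresponsive(const std::vector<uint8_t>& cfg) {
    assert(cfg.size() >= 16);  // need at least the first 16 header bytes
    const uint8_t revision = cfg[0x08];
    const uint8_t prog_if = cfg[0x09];
    const uint8_t header_type = cfg[0x0e];
    return revision == 0xff && prog_if == 0xff && (header_type & 0x7f) == 0x7f;
}
```

If this check is true while the link test fails and false while it passes, that would suggest the failures follow the state of the PCIe link rather than the choice of address.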
Are you running the PCIe link test in manual or automatic mode?
- In automatic mode, the application automatically selects the device. The test selects the Intel Stratix 10 PCIe device with the lowest BDF by matching the Vendor ID. The test also selects the lowest available BAR (which is BAR0).
- In manual mode, the test queries you for the bus, device, and function number and BAR.
The other thing you can try is to select option 1 and choose the device manually, instead of option 0, which selects it automatically.
In manual mode, use the command below to determine the BDF.
$ lspci -d 1172
Then enter the BDF and use BAR 2 for the manual test. The screenshot shows the manual test steps.
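The bus, device, and function numbers that manual mode asks for are the `bb:dd.f` field at the start of each `lspci` line. As a small illustration, the sketch below parses that field and packs it into a 16-bit value, assuming the standard bus[15:8] / device[7:3] / function[2:0] layout, which is consistent with the `BDF 0x1a00` and `1a:00.0` pair seen earlier in this thread. `Bdf`, `parse_bdf` and `bdf_to_u16` are hypothetical names, not part of the Intel application.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Bus, device and function numbers of one PCI function.
struct Bdf {
    unsigned bus;
    unsigned dev;
    unsigned fn;
};

// Parse the leading "bb:dd.f" field (hex) of an lspci output line.
// Returns false if the text does not start with such a field.
bool parse_bdf(const std::string& text, Bdf& out) {
    unsigned bus = 0, dev = 0, fn = 0;
    if (std::sscanf(text.c_str(), "%2x:%2x.%1x", &bus, &dev, &fn) != 3) {
        return false;
    }
    if (dev > 0x1f || fn > 7) return false;  // device is 5 bits, function 3 bits
    out.bus = bus;
    out.dev = dev;
    out.fn = fn;
    return true;
}

// Pack into the conventional 16-bit form: bus[15:8], device[7:3], function[2:0].
uint16_t bdf_to_u16(const Bdf& b) {
    assert(b.bus <= 0xff && b.dev <= 0x1f && b.fn <= 7);
    return static_cast<uint16_t>((b.bus << 8) | (b.dev << 3) | b.fn);
}
```

For the line `1a:00.0 Unassigned class [ff00]: ...` this gives bus 0x1a, device 0, function 0, which packs to 0x1a00.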
Bad news. The method only worked the first time I switched to BAR2. On that first try, reads and writes worked very well.
When I retried, the same issue occurred again, whether I selected the device automatically or manually.
It's weird...
Your method is truly the solution.
However, I have tried several times. The card with the DMA feature works for a while, and then something goes wrong with it. In Quartus (Tools > Programmer), the USB blaster II should always be listed. When DMA stops working, I find that the USB blaster II has disappeared. Only powering off the host machine and the card with the shutdown command makes the USB blaster II appear again.
There are 0-9 application tests as shown below.
Can you confirm whether the USB blaster II disconnect issue happens only in "9: Perform DMA"?
Or does it happen in both "0: Link test - 100 writes and reads" and "9: Perform DMA"?
0: Link test - 100 writes and reads
1: Write memory space
2: Read memory space
3: Write configuration space
4: Read configuration space
5: Change BAR
6: Change device
7: Enable SRIOV
8: Do a link test for every enabled virtual function
belonging to the current device
9: Perform DMA
The issue affects both "0: Link test - 100 writes and reads" and "9: Perform DMA".
When the card works well, choosing "0: Link test - 100 writes and reads" gives a result like this:
Doing 100 writes and 100 reads..
Number of write errors: 0
Number of read errors: 0
Number of dword mismatches: 0
Choosing "9: Perform DMA" gives:
*********************************************************
Current DMA configurations
Run Read (card->system) ? 1
Run Write (system->card) ? 1
Run Simultaneous ? 1
Number of dwords/desc : 2048
Number of descriptors : 128
Total length of transfer : 1e+03 KiB
*********************************************************
0: Run DMA
1: Toggle read DMA
2: Toggle write DMA
3: Toggle simultaneous DMA
4: Set the number of dwords per descriptor
5: Set the number of descriptors per DMA
6: Return to main menu
*********************************************************
Then I choose "0: Run DMA":
Enter the number of DMA operations to initiate; enter 0 for infinite loop:
I enter "2" and get this result:
*********************************************************
Current DMA configurations
Run Read (card->system) ? 1
Run Write (system->card) ? 1
Run Simultaneous ? 1
Number of dwords/desc : 2048
Number of descriptors : 128
Total length of transfer : 1e+03 KiB
Current run #: 2
Current time : Wed Mar 23 13:49:54 2022
DMA throughputs, in GB/s (10^9B/s)
Current Read Throughput : 0.01
Average Read Throughput : 0.01
Current Write Throughput : 0.01
Average Write Throughput : 0.01
Current Simul Throughput : 0.01
Average Simul Throughput : 0.01
*********************************************************
When the USB blaster II disconnect issue happens, neither test works.
"0: Link test - 100 writes and reads" :
Number of write errors: 0
Number of read errors: 0
Number of dword mismatches: 100
"9: Perform DMA" :
Current run #: 1
Current time : Wed Mar 23 13:53:10 2022
DMA throughputs, in GB/s (10^9B/s)
Current Read Throughput : 0.00
Average Read Throughput : inf
Current Write Throughput : 0.00
Average Write Throughput : inf
Current Simul Throughput : 0.00
Average Simul Throughput : inf
*********************************************************
Stopping DMA run due to error..
Meanwhile, I notice that while the USB blaster II works, running the command
$ lsusb
lists the FPGA card:
lsusb : Bus 001 Device 002: ID 09fb:6810 Altera
When the USB blaster II is disconnected, this entry disappears.
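As a cross-check of the DMA figures above: 2048 dwords per descriptor at 4 bytes per dword, times 128 descriptors, is 1,048,576 bytes, or 1024 KiB, which the application prints rounded as `1e+03 KiB`. The `inf` averages in the failing run look like the result of a division by zero (for example zero elapsed time), but that is a guess about the application and not taken from its source. The sketch below shows the arithmetic with a guard against that case. `transfer_bytes` and `throughput_gbps` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Total bytes moved by one DMA run: each descriptor moves
// `dwords_per_desc` dwords of 4 bytes each.
uint64_t transfer_bytes(uint64_t dwords_per_desc, uint64_t num_desc) {
    return dwords_per_desc * 4u * num_desc;
}

// Throughput in GB/s (10^9 B/s), the unit used in the application's report.
// Returns 0.0 instead of inf when no time elapsed, e.g. because the run
// aborted before any data moved.
double throughput_gbps(uint64_t bytes, double seconds) {
    assert(seconds >= 0.0);  // a negative elapsed time is a caller bug
    if (seconds == 0.0) return 0.0;
    return static_cast<double>(bytes) / seconds / 1e9;
}
```

With the settings shown in the menu, `transfer_bytes(2048, 128)` is 1048576, matching the reported total length.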
I see. When the USB blaster II is connected, both tests pass ("0: Link test - 100 writes and reads" and "9: Perform DMA").
When the USB blaster II is disconnected, the tests will definitely fail because the connection has been lost.
Can you identify under what conditions the USB blaster II starts to lose its connection? Is it in the middle of "9: Perform DMA"? Or after both test #0 and test #9 have completed and you repeat another round of tests?
The issue usually happens after both test #0 and test #9 have completed and I am about to repeat another round of tests.
It seems that if I leave the test application running instead of killing it, the issue does not happen.
The USB blaster II seems to disconnect only after I kill the running test application and launch a new one. Usually I can run the test successfully only once or twice before DMA stops working.
What's more, when I run the command,
$ dmesg | grep usb
I find one entry like this:
[ 556.832823] usb 1-8: USB disconnect, device number 2
The device does disconnect from USB. I don't know why this happens.
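To correlate the disconnect with a particular test run, the timestamp, port, and device number can be pulled out of kernel log lines like the one above and compared with the time a run was started or killed. The sketch below parses that one line format with `std::regex`. `UsbDisconnect` and `parse_usb_disconnect` are hypothetical names, and the pattern is based only on the single line quoted here, so other kernels may format the message differently.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Fields of a kernel "USB disconnect" log line.
struct UsbDisconnect {
    double timestamp;   // seconds since boot, from the [ ... ] prefix
    std::string port;   // USB topology path, e.g. "1-8"
    int device_number;  // device number on that bus
};

// Parse a line such as
//   [  556.832823] usb 1-8: USB disconnect, device number 2
// Returns false for lines that are not disconnect messages.
bool parse_usb_disconnect(const std::string& line, UsbDisconnect& out) {
    static const std::regex re(
        R"(\[\s*([0-9]+\.[0-9]+)\]\s+usb\s+([-0-9.]+):\s+USB disconnect, device number ([0-9]+))");
    std::smatch m;
    if (!std::regex_search(line, m, re)) return false;
    assert(m.size() == 4);  // whole match plus three capture groups
    out.timestamp = std::stod(m[1].str());
    out.port = m[2].str();
    out.device_number = std::stoi(m[3].str());
    return true;
}
```

For the line quoted above this yields port `1-8` and device number 2, the same device number that `lsusb` showed for the `09fb:6810` entry on bus 001.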
I find that it also happens in the middle of "9: Perform DMA". I set the loop count to 1000, and the issue occurred during the 52nd loop.
Please try deselecting the "Enable burst capability for Avalon-MM BAR0 Master port" (HPRXM BAR0) option as shown below, then regenerate the RTL, recompile, and configure the example design again.
Let's see if disabling burst mode for BAR0 allows the DMA test to work, because it looks like the HPRXM (burst mode) module and the DMA module conflict on BAR0, which causes the problem.
Let me know your test result after this change.
Could you please resend the picture? It could not be loaded on my PC. Thanks.
Thank you. It can be loaded now. However, my Quartus window looks different from yours. Can I just disable BAR0?
If I just disable BAR0, the "0: Link test - 100 writes and reads" test passes while "9: Perform DMA" does not.
Are you using the Avalon-MM IP instead of Avalon-MM+? I can test the design for the Avalon-MM IP to see whether it works.
Please disable this bursting option under the 'Avalon-MM Setting' tab and try "9: Perform DMA" again.
BAR0 is required for the DMA test, so it cannot be disabled.