Issues when testing DMA feature provided by Avalon MM+ IP with S10 GX dev kit card on Ubuntu 18.04

He4Forum
Employee

I have installed an S10 GX dev kit FPGA card in a host running Ubuntu 18.04 and programmed the Avalon MM+ Hard IP onto the card.

However, I ran into some issues when testing the DMA feature.

In the test application located at ~/avmm_bridge_512_0_example_design/software/user/example/intel_fpga_pcie_link_test, none of the test modes runs successfully.

For example, the first test, "Doing 100 writes and 100 reads", failed. The results of the first 10 tries are below.

At dword 0x0
Wrote 0x120aed54
Read  0xffffffff
At dword 0x1
Wrote 0x4db4d210
Read  0xffffffff
At dword 0x2
Wrote 0x616cd2bb
Read  0xffffffff
At dword 0x3
Wrote 0x29ad0aff
Read  0xffffffff
At dword 0x4
Wrote 0x31f54bf7
Read  0xffffffff
At dword 0x5
Wrote 0x7f08785d
Read  0xffffffff
At dword 0x6
Wrote 0x4fde1236
Read  0xffffffff
At dword 0x7
Wrote 0x401d6525
Read  0xffffffff
At dword 0x8
Wrote 0x44a0f03b
Read  0xffffffff
At dword 0x9
Wrote 0x49ebcbdc
Read  0xffffffff

In every run, the value read back from memory is 0xffffffff. The summary reports no read or write errors; only the values do not match.

Number of write errors:       0
Number of read errors:        0
Number of dword mismatches: 100
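A side note on the 0xffffffff pattern: on most hosts an all-ones value is what software sees when a PCIe memory read is not completed successfully by the target. So 100 mismatches that are all 0xffffffff, with zero reported errors, hint that the accesses never reach working memory. Below is a minimal sketch for sorting read-backs into these cases. `classify_readback` and `count_all_ones` are hypothetical helpers, not part of intel_fpga_pcie_link_test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not from the Intel example design).
enum class Readback { Match, AllOnes, Mismatch };

// Classify one write/read-back pair from the link test output.
Readback classify_readback(std::uint32_t wrote, std::uint32_t read) {
    if (read == wrote) return Readback::Match;
    if (read == 0xffffffffu) return Readback::AllOnes;  // typical "no response" value
    return Readback::Mismatch;                          // real data, but wrong
}

// Count how many read-backs in a run came back as all-ones.
std::size_t count_all_ones(const std::vector<std::uint32_t>& wrote,
                           const std::vector<std::uint32_t>& read) {
    assert(wrote.size() == read.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < wrote.size(); ++i) {
        if (classify_readback(wrote[i], read[i]) == Readback::AllOnes) ++n;
    }
    return n;
}
```

If every mismatch falls into the `AllOnes` bucket, the BAR selection and the link state are more likely suspects than the data path itself.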

Looking into the code, I cannot find where the device memory address is initialized. It is just set to NULL.

char *addr = NULL;

I am wondering why the test application does not use a memory map to get an address mapped to the FPGA card memory. I am not sure whether it is initialized in another function, or whether the NULL address is what causes the mismatches.
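For what it is worth, the output prints "At dword 0x0", and the manual write test prints "BAR 0 offset 0xf000", which suggests `addr` is treated as an offset inside the selected BAR and not as a process virtual address. That is an inference from the output, not something checked against the driver source. For an independent check that bypasses the sample application, Linux exposes each memory BAR as a `resourceN` file in sysfs that can be mapped with `mmap`. A minimal sketch, assuming root privileges and a device at a known address such as `0000:1a:00.0`; `bar_resource_path` and `map_bar` are illustrative names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Build the sysfs path of a BAR resource file, for example
// /sys/bus/pci/devices/0000:1a:00.0/resource2
std::string bar_resource_path(const std::string& bdf, int bar) {
    assert(bar >= 0 && bar <= 5);  // a type 0 PCI header has BAR0..BAR5
    return "/sys/bus/pci/devices/" + bdf + "/resource" + std::to_string(bar);
}

// Map `len` bytes of the BAR into this process; returns nullptr on failure.
volatile std::uint32_t* map_bar(const std::string& bdf, int bar, std::size_t len) {
    int fd = open(bar_resource_path(bdf, bar).c_str(), O_RDWR | O_SYNC);
    if (fd < 0) return nullptr;
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : static_cast<volatile std::uint32_t*>(p);
}
```

With a valid mapping, `mem[0] = 0x12341234;` followed by reading `mem[0]` back gives a driver-independent view of whether that BAR holds working memory.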

However, the other test modes do not work either, so there may be other causes.

The provided PCIe driver code targets CentOS, so I modified some of it to get it to install on Ubuntu. I am not sure whether my modifications cause any of these issues.

I hope someone has ideas about these issues. Thank you, and I look forward to your reply.

He4Forum
Employee

I have also tested a simple read and write operation.

    //...
    result = dev->write32(reinterpret_cast<void *>(addr), write_data);
    if (result == 1) {
        cout << "Wrote successfully!" << endl;
    } else {
        cout << "Write failed!" << endl;
    }

    uint32_t test_read = 0;
    result = dev->read32(reinterpret_cast<void *>(addr), &test_read);
    if (result == 1) {
        cout << "Read successfully!" << endl;
    } else {
        cout << "Read failed!" << endl;
    }
    cout << "Read number : " << test_read << endl;
    //...

I get the wrong result again.

> Enter address to write, in hex: 0000f000
> Enter 32-bit data to write, in hex: 12341234
> Writing 0x12341234 at BDF 0x1a00 BAR 0 offset 0xf000..
Wrote successfully!
Read successfully!
Read number : 0xffffffff

The value read back is 0xffffffff, the same as in the "Doing 100 writes and 100 reads" test. This time the address was set manually, not NULL.

skbeh
Employee

As I understand it, you are testing PCIe link-up on the Stratix 10 SX SoC Development Kit (DK-SOC-1SSX-L-D), described at the link below, on Ubuntu, using driver code that was provided for CentOS and that you modified.
https://www.intel.com/content/www/us/en/products/details/fpga/development-kits/stratix/10-sx.html
I am not sure whether the issue is caused by the driver code modification itself.
First of all, can you check whether the PCIe link comes up, for example with lspci?
Do you have a PCIe endpoint card plugged into this SoC dev kit, with the kit acting as root port? Or are you using the BTS (Board Test System) to test loopback?

He4Forum
Employee

Hi skbeh,

 

I am using the Stratix 10 GX FPGA Development Kit (DK-DEV-1SGX-L-A), and I have checked that the FPGA card is detected via PCIe.

$ lsmod | grep intel_fpga_pcie_drv
intel_fpga_pcie_drv    32768  2

$ lspci -d 1172:000 -v
1a:00.0 Unassigned class [ff00]: Altera Corporation Device 0000 (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: intel_fpga_pcie_drv
        Kernel modules: altera_cvp

$ lspci -d 1172:000 -v | grep intel_fpga_pcie_drv
Kernel driver in use: intel_fpga_pcie_drv

 The FPGA card is just plugged into a PCIe port on a host machine with Ubuntu 18.04 OS.
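One detail in the lspci output above may be worth a closer look: `rev ff`, `prog-if ff` and `Unknown header type 7f` are what lspci prints when the config header bytes read back as 0xff (lspci masks the header type byte with 0x7f). That would mean the device was not answering config reads at that moment, even though the driver is bound. This is an interpretation of the output, not something confirmed in this thread. A minimal sketch that checks the same thing directly from the sysfs `config` file; the function names are illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Read up to `n` bytes of a PCI config space file such as
// /sys/bus/pci/devices/0000:1a:00.0/config. Returns fewer bytes on failure.
std::vector<std::uint8_t> read_config(const std::string& path, std::size_t n = 64) {
    std::vector<std::uint8_t> buf(n);
    std::ifstream f(path, std::ios::binary);
    f.read(reinterpret_cast<char*>(buf.data()), static_cast<std::streamsize>(n));
    buf.resize(f ? n : static_cast<std::size_t>(f.gcount()));
    return buf;
}

// True when the header is non-empty and every byte is 0xff.
bool header_is_all_ones(const std::vector<std::uint8_t>& hdr) {
    if (hdr.empty()) return false;
    for (std::uint8_t b : hdr) {
        if (b != 0xff) return false;
    }
    return true;
}

// Header type as lspci shows it: byte 0x0e with the multifunction bit masked off.
std::uint8_t header_type_field(const std::vector<std::uint8_t>& hdr) {
    assert(hdr.size() > 0x0e);
    return hdr[0x0e] & 0x7f;
}
```

A healthy endpoint should report header type 0x00 here. If the header is all-ones only after the tests start failing, that would line up with the device dropping off the bus.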

 

As the README for the driver for FPGA card says, 

TESTING
-------
The driver was developed and tested on CentOS 7.0, 64-bit with
3.10.514 kernel compiled for x86_64 architecture.

When I tried to install the driver without modifications, it failed with errors. The errors seem to be caused by the Linux kernel version: 5.10 on my machine versus 3.10 in the README. I fixed the errors manually and got the driver installed successfully.

skbeh
Employee

Are you running the PCIe link test in manual or automatic mode?
- In automatic mode, the application automatically selects the device. The test selects the Intel Stratix 10 PCIe device with the lowest BDF by matching the Vendor ID. The test also selects the lowest available BAR (which is BAR0).
- In manual mode, the test queries you for the bus, device, and function number and BAR.

The other thing you can try is to select option 1 and choose the device manually, instead of selecting option 0 to select it automatically.
In manual mode, use the command below to determine the BDF.
$ lspci -d 1172:

Then enter the BDF and use BAR 2 for the manual test. The screenshots below show the manual test steps.

skbeh_1-1647919603747.png

 

skbeh_0-1647919567301.png
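The link test prints the device as a 16-bit number (for example `BDF 0x1a00`), while lspci prints `1a:00.0`. The two agree if the application uses the conventional packing of bus in bits 15:8, device in bits 7:3 and function in bits 2:0. That is an assumption about the application, though it matches the output earlier in this thread. A small sketch of the conversion, with illustrative function names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Pack bus/device/function into the conventional 16-bit BDF layout:
// bus in bits 15:8, device in bits 7:3, function in bits 2:0.
std::uint16_t pack_bdf(unsigned bus, unsigned dev, unsigned fn) {
    assert(bus <= 0xff && dev <= 0x1f && fn <= 0x7);
    return static_cast<std::uint16_t>((bus << 8) | (dev << 3) | fn);
}

// Parse an lspci-style "bb:dd.f" string (hex fields).
// Returns false on malformed or out-of-range input and leaves `out` untouched.
bool parse_bdf(const char* s, std::uint16_t& out) {
    unsigned bus = 0, dev = 0, fn = 0;
    if (std::sscanf(s, "%x:%x.%x", &bus, &dev, &fn) != 3) return false;
    if (bus > 0xff || dev > 0x1f || fn > 0x7) return false;
    out = pack_bdf(bus, dev, fn);
    return true;
}
```

For the card in this thread, `parse_bdf("1a:00.0", v)` yields `0x1a00`, which matches what the application printed.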

 

skbeh
Employee

The problem you are seeing might be related to the KB article below.

https://www.intel.com/content/www/us/en/programmable/support/support-resources/knowledge-base/ip/201...


Why does the Intel® Stratix® 10 Avalon®-MM Interface for PCIe* with DMA example design fail the link test and the DMA test when using the default selected BAR0?

 

Description

When the internal DMA Descriptor Controller is enabled, the BAR0 Avalon®-MM master is not available for general-purpose usage. The DMA Descriptor Controller uses this BAR0 interface through which the host CPU programs in the descriptor table.

The intel_fpga_pcie_link_test user application selects BAR0 as default when it's initially executed. If the user forgets to change to BAR2, which is where the onchip memory is attached, then both the link test and the DMA test will fail.

Resolution

The user must change to BAR2 before executing the link test and the DMA test.

See the execution transcript of the intel_fpga_pcie_link_test user application below for how to change to BAR2.

 

~$ sudo ./intel_fpga_pcie_link_test
*********************************************************
Intel FPGA PCIe Link Test
Version 2.0
0: Automatically select a device
1: Manually select a device
*********************************************************
> 0
Opened a handle to BAR 0 of a device with BDF 0x1300

*********************************************************
0: Link test - 100 writes and reads
1: Write memory space
2: Read memory space
3: Write configuration space
4: Read configuration space
5: Change BAR
6: Change device
7: Enable SRIOV
8: Do a link test for every enabled virtual function
    belonging to the current device
9: Perform DMA
10: Quit program
*********************************************************
> 5
Changing BAR...
Enter BAR number (-1 for none):
> 2
Successfully changed BAR!
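The rule from the KB article can be written down as a small guard, for anyone scripting around the test application. This is only a sketch of what the article states (BAR0 belongs to the Descriptor Controller when it is enabled, and the on-chip memory sits on BAR2). The names `BarRole`, `bar_role` and `usable_for_link_test` are hypothetical, and other BARs are treated as unknown because the article does not describe them.

```cpp
#include <cassert>

// Hypothetical model of the example design's BAR layout, per the KB article.
enum class BarRole { DescriptorController, OnChipMemory, Unknown };

BarRole bar_role(int bar, bool internal_desc_ctrl_enabled) {
    assert(bar >= 0 && bar <= 5);
    if (bar == 0 && internal_desc_ctrl_enabled) {
        return BarRole::DescriptorController;  // not available for general use
    }
    if (bar == 2) {
        return BarRole::OnChipMemory;          // target for link test and DMA test
    }
    return BarRole::Unknown;                   // not covered by the article
}

// Only the BAR with on-chip memory behind it is a valid link test target.
bool usable_for_link_test(int bar, bool internal_desc_ctrl_enabled) {
    return bar_role(bar, internal_desc_ctrl_enabled) == BarRole::OnChipMemory;
}
```

A wrapper script could refuse to start the link test or DMA test until the selected BAR passes this check.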

He4Forum
Employee

You are right! This does solve the problem. Thanks!

He4Forum
Employee

Bad news. The method only works the first time I switch to BAR2. During the first try, the read and write operations worked very well.

When I retried, the same issue occurred again, whether I selected the device automatically or manually.

It's weird...

He4Forum
Employee

Your method is truly the solution.

However, I have tried it several times. The card with the DMA feature works for a while, and then something goes wrong with the card. In Quartus, under Tools > Programmer, the USB blaster II should always be listed. When DMA stops working, I find that the USB blaster II has disappeared. Only powering off the host machine and the card with the shutdown command makes the USB blaster II appear again.

He4Forum_0-1647940619309.png

skbeh
Employee

The application offers tests 0-9, as shown below.

Can you confirm whether the USB blaster II disconnect issue happens only in "9: Perform DMA"?

Or does it happen in both "0: Link test - 100 writes and reads" and "9: Perform DMA"?

0: Link test - 100 writes and reads
1: Write memory space
2: Read memory space
3: Write configuration space
4: Read configuration space
5: Change BAR
6: Change device
7: Enable SRIOV
8: Do a link test for every enabled virtual function
    belonging to the current device
9: Perform DMA


He4Forum
Employee

The issue affects both "0: Link test - 100 writes and reads" and "9: Perform DMA".

When the card works well, choosing "0: Link test - 100 writes and reads" gives a result like this.

 

Doing 100 writes and 100 reads..
Number of write errors:       0
Number of read errors:        0
Number of dword mismatches:   0

 

Choosing "9: Perform DMA" gives:

 

*********************************************************
Current DMA configurations
    Run Read  (card->system)  ? 1
    Run Write (system->card)  ? 1
    Run Simultaneous          ? 1
    Number of dwords/desc     : 2048
    Number of descriptors     : 128
    Total length of transfer  : 1e+03 KiB
*********************************************************
 0: Run DMA
 1: Toggle read DMA
 2: Toggle write DMA
 3: Toggle simultaneous DMA
 4: Set the number of dwords per descriptor
 5: Set the number of descriptors per DMA
 6: Return to main menu
*********************************************************

 

Then I choose "0: Run DMA":

 

Enter the number of DMA operations to initiate; enter 0 for infinite loop:

 

I enter "2" and get this result:

 


*********************************************************
Current DMA configurations
    Run Read  (card->system)  ? 1
    Run Write (system->card)  ? 1
    Run Simultaneous          ? 1
    Number of dwords/desc     : 2048
    Number of descriptors     : 128
    Total length of transfer  : 1e+03 KiB

Current run #: 2
Current time : Wed Mar 23 13:49:54 2022

DMA throughputs, in GB/s (10^9B/s)
    Current Read Throughput   :  0.01
    Average Read Throughput   :  0.01
    Current Write Throughput  :  0.01
    Average Write Throughput  :  0.01
    Current Simul Throughput  :  0.01
    Average Simul Throughput  :  0.01
*********************************************************
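The "Total length of transfer" line can be cross-checked by hand: 2048 dwords per descriptor x 4 bytes x 128 descriptors = 1,048,576 bytes = 1024 KiB, which the application prints rounded as "1e+03 KiB". A small sketch of that arithmetic follows. The function names are illustrative only, and the 4-byte dword is the usual PCIe definition.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBytesPerDword = 4;  // a PCIe dword is 32 bits

// Total bytes moved by one DMA run with the given descriptor settings.
std::uint64_t transfer_bytes(std::uint64_t dwords_per_desc, std::uint64_t num_desc) {
    assert(dwords_per_desc > 0 && num_desc > 0);
    return dwords_per_desc * kBytesPerDword * num_desc;
}

// Convert bytes to KiB (1 KiB = 1024 bytes).
double to_kib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / 1024.0;
}
```

With only 1 MiB per run, a reported 0.01 GB/s may say more about per-run overhead than about the link. That is a general observation, not a measurement from this setup.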

 

 

When the USB blaster II disconnect issue occurs, neither test works.

"0: Link test - 100 writes and reads":

 

Number of write errors:       0
Number of read errors:        0
Number of dword mismatches: 100

 

"9: Perform DMA" : 


Current run #: 1
Current time : Wed Mar 23 13:53:10 2022

DMA throughputs, in GB/s (10^9B/s)
    Current Read Throughput   :  0.00
    Average Read Throughput   :   inf
    Current Write Throughput  :  0.00
    Average Write Throughput  :   inf
    Current Simul Throughput  :  0.00
    Average Simul Throughput  :   inf
*********************************************************
Stopping DMA run due to error..
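About the "inf" averages: on IEEE 754 platforms a positive value divided by zero yields infinity, so one possible explanation is that the average is computed over a zero elapsed time or zero completed runs once the DMA stops with an error. This is only a guess, since I have not checked the application source. The sketch below shows the effect and a guarded variant. Both functions are hypothetical and use the GB/s (10^9 B/s) unit from the output.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Unguarded throughput in GB/s (10^9 B/s): yields inf when seconds == 0
// on IEEE 754 platforms.
double unguarded_gbps(std::uint64_t bytes, double seconds) {
    return static_cast<double>(bytes) / seconds / 1e9;
}

// Guarded variant: report 0 when no time has been accumulated.
double throughput_gbps(std::uint64_t bytes, double seconds) {
    assert(seconds >= 0.0);
    if (seconds == 0.0) return 0.0;
    return static_cast<double>(bytes) / seconds / 1e9;
}
```

Either way, the "inf" is a symptom of the failed run and not a throughput figure to interpret.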

I have also noticed that when the USB blaster II works, running the command

$ lsusb

shows the FPGA card:

lsusb : Bus 001 Device 002: ID 09fb:6810 Altera

When the USB blaster II is disconnected, this entry disappears.
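To catch the exact moment the entry disappears, the same check can be done programmatically by scanning sysfs for the vendor and product IDs that lsusb printed (09fb:6810). A minimal sketch, assuming the standard Linux layout where each entry under /sys/bus/usb/devices has `idVendor` and `idProduct` files; `read_attr` and `usb_device_present` are illustrative names:

```cpp
#include <cassert>
#include <dirent.h>
#include <fstream>
#include <string>

// Read the first whitespace-delimited token of a small sysfs attribute file.
// Returns an empty string if the file cannot be read.
static std::string read_attr(const std::string& path) {
    std::ifstream f(path);
    std::string v;
    f >> v;
    return v;
}

// Return true if any entry under `root` reports the given vendor and
// product IDs (4-digit lowercase hex, as printed by lsusb).
bool usb_device_present(const std::string& vid, const std::string& pid,
                        const std::string& root = "/sys/bus/usb/devices") {
    assert(vid.size() == 4 && pid.size() == 4);
    DIR* d = opendir(root.c_str());
    if (!d) return false;
    bool found = false;
    while (dirent* e = readdir(d)) {
        const std::string base = root + "/" + e->d_name;
        if (read_attr(base + "/idVendor") == vid &&
            read_attr(base + "/idProduct") == pid) {
            found = true;
            break;
        }
    }
    closedir(d);
    return found;
}
```

Polling `usb_device_present("09fb", "6810")` once a second while the DMA loop runs, and logging the time it first returns false, would give a timestamp to compare against dmesg.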

 

skbeh
Employee

I see. When the USB blaster II is connected, both tests pass ("0: Link test - 100 writes and reads" and "9: Perform DMA").

When the USB blaster II is disconnected, the tests will certainly fail since the connection has been lost.

Can you identify under what condition the USB blaster II starts to lose its connection? Is it in the middle of "9: Perform DMA"? Or is it after both test #0 and test #9 have completed and you start another round of tests?

 

He4Forum
Employee

The issue usually happens after both test #0 and test #9 have completed and I am about to start another round of tests.

It seems that if I do not kill the test application and keep it running, the issue does not happen.

The USB blaster II seems to disconnect only after I kill the running test application and launch a new one. Usually I can run the tests successfully only once or twice before DMA stops working.

He4Forum
Employee

What's more, when I run the command

$ dmesg | grep usb

I find an entry like this:

[  556.832823] usb 1-8: USB disconnect, device number 2

So the device really does disconnect from USB. I don't know why this happens.

He4Forum
Employee

I find that it can also happen in the middle of "9: Perform DMA". I set the loop count to 1000, and the issue occurred during the 52nd loop.

skbeh
Employee

Please try deselecting the "Enable burst capability for Avalon-MM BAR0 Master port" (HPRXM BAR0) option as shown below, then regenerate the RTL, recompile, and configure the example design again.

Let's see whether disabling burst mode for BAR0 allows the DMA test to work. It looks like the HPRXM (burst mode) module and the DMA module conflict on BAR0, which may be causing the problem.

Let me know your test result after this change.


He4Forum
Employee

Could you please resend the picture? It could not be loaded on my PC. Thanks.

skbeh
Employee

I have resent the picture. I hope you can see it now.

Disable burst.png

He4Forum
Employee

Thank you. It can be loaded now. However, my Quartus window looks different from yours. Can I just disable BAR0?

He4Forum_0-1648103277488.png

 

He4Forum
Employee

If I just disable BAR0, the "0: Link test - 100 writes and reads" test passes, while "9: Perform DMA" does not.

Are you using the Avalon MM IP instead of MM+? I can test the design with the Avalon MM IP to see whether it works.

skbeh
Employee

Please disable this bursting option under the 'Avalon-MM Setting' tab and try "9: Perform DMA" again.

BAR0 is required for the DMA test and cannot be disabled.

ReDisable_bursting.png
