Hi,
I am trying to enable Error Detection and Correction (EDAC) for EMAC1 on my Arria 10 SoC device.
I found the device tree entries for EMAC0 in the Linux kernel sources:
emac0-rx-ecc@ff8c0800 {
    compatible = "altr,socfpga-eth-mac-ecc";
    reg = <0xff8c0800 0x400>;
    altr,ecc-parent = <&gmac0>;
    interrupts = <4 IRQ_TYPE_LEVEL_HIGH>,
                 <36 IRQ_TYPE_LEVEL_HIGH>;
};
emac0-tx-ecc@ff8c0c00 {
    compatible = "altr,socfpga-eth-mac-ecc";
    reg = <0xff8c0c00 0x400>;
    altr,ecc-parent = <&gmac0>;
    interrupts = <5 IRQ_TYPE_LEVEL_HIGH>,
                 <37 IRQ_TYPE_LEVEL_HIGH>;
};
Where do I find the interrupts used for EMAC1? Is there any documentation available?
Does this also require changes in my Quartus project, or is it fully covered by the Linux drivers?
Hi Silvan,
There are no dedicated interrupts for ECC. Please review the ECC chapter in the Arria 10 Technical Reference Manual.
Please let us know if you have further questions.
Sue
@SueC_Altera wrote: Hi Silvan,
There are no dedicated interrupts for ECC. Please review the ECC chapter in the Arria 10 Technical Reference Manual.
Please let us know if you have further questions.
Sue
Hi Sue,
I followed this documentation to enable the ECC: https://www.rocketboards.org/foswiki/Documentation/EnableL2CacheECCInLinux
Unfortunately, after following these steps, the kernel no longer boots. It prints the following error messages on the console:
[ 21.099148] rcu: INFO: rcu_sched self-detected stall on CPU
[ 21.104715] rcu: 0-...!: (2100 ticks this GP) idle=003c/1/0x40000004 softirq=9/9 fqs=0
[ 21.112697] rcu: (t=2100 jiffies g=-1183 q=36 ncpus=2)
[ 21.117903] rcu: rcu_sched kthread timer wakeup didn't happen for 2099 jiffies! g-1183 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[ 21.129073] rcu: Possible timer handling issue on cpu=0 timer-softirq=4
[ 21.135744] rcu: rcu_sched kthread starved for 2100 jiffies! g-1183 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[ 21.145962] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 21.154879] rcu: RCU grace-period kthread stack dump:
[ 21.160143] rcu: Stack dump where RCU GP kthread last ran:
Can you share how to enable the ECC in Linux on the Arria 10 SoC device?
That is my follow-up question, and I would be happy if you could support me with it, in addition to the meaning of the interrupt numbers in the dtb (see the other reply: https://community.intel.com/t5/Intel-SoC-FPGA-Embedded/Interrup-numbers-of-Arria-10-EDAC/m-p/1703150#M3178 )
Thank you and best regards, Silvan
Hi Sue,
Thank you for the quick reply.
In that case I do not understand the interrupts entries in the device tree, where interrupts 4 and 36 are declared for emac0 rx and 5 and 37 for emac0 tx.
Can you explain these four numbers to me? And what is the valid configuration for EMAC1?
I see a configured interrupt number for every ECC entry in the device tree: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/arm/boot/dts/intel/socfpga/socfpga_arria10.dtsi?h=linux-6.6.y#n723
How do these numbers map to the hardware? Or am I misunderstanding something?
Are the numbers somehow related to the System Manager? I ask because the documentation says that
"The ECC controller has the ability to generate single- and double-bit error interrupts to the System Manager."
See here: https://www.intel.com/content/www/us/en/docs/programmable/683711/22-3/ecc-controller-interrupts.html
Thank you,
Silvan
Hello,
Looking at the following example section of the device tree, we can state the following:
intc: interrupt-controller@ffffd000 {
    compatible = "arm,cortex-a9-gic";
    #interrupt-cells = <3>;
    interrupt-controller;
    reg = <0xffffd000 0x1000>,
          <0xffffc100 0x100>;
};

soc {
    #address-cells = <1>;
    #size-cells = <1>;
    compatible = "simple-bus";
    device_type = "soc";
    interrupt-parent = <&intc>;
    ranges;
    :
    eccmgr: eccmgr {
        compatible = "altr,socfpga-a10-ecc-manager";
        altr,sysmgr-syscon = <&sysmgr>;
        #address-cells = <1>;
        #size-cells = <1>;
        interrupts = <0 2 IRQ_TYPE_LEVEL_HIGH>,
                     <0 0 IRQ_TYPE_LEVEL_HIGH>;
        interrupt-controller;
        #interrupt-cells = <2>;

        sdramedac {
            compatible = "altr,sdram-edac-a10";
            altr,sdr-syscon = <&sdr>;
            interrupts = <17 IRQ_TYPE_LEVEL_HIGH>,
                         <49 IRQ_TYPE_LEVEL_HIGH>;
        };

        l2-ecc@ffd06010 {
            compatible = "altr,socfpga-a10-l2-ecc";
            reg = <0xffd06010 0x4>;
            interrupts = <0 IRQ_TYPE_LEVEL_HIGH>,
                         <32 IRQ_TYPE_LEVEL_HIGH>;
        };
        :
    };
};
- The intc node corresponds to the GIC in Arria 10 devices. The interrupt-controller; property indicates that this is an interrupt controller, and #interrupt-cells indicates that interrupts defined against it take 3 parameters: the interrupt type, the interrupt number and the interrupt trigger. The possible values for the interrupt type are 0 for SPI (shared peripheral interrupt) and 1 for PPI (private peripheral interrupt).
The interrupt number parameter describes the peripheral ID that causes the interrupt, but since the 1st parameter (interrupt type) already defines the type, the number is counted from 0 within that type instead of including the offset of 32 (SPI) or 16 (PPI). Table 88 in the TRM at https://www.intel.com/content/www/us/en/docs/programmable/683711/22-3/gic-interrupt-map-for-the-arria-10-soc-hps.html defines the interrupt mappings for the SPI interrupt type, so the interrupt number to use is (interrupt ID - 32). The last parameter (interrupt trigger) defines how the interrupt is triggered, such as rising/falling edge or high/low level.
- The soc node is an interrupt child of the intc node, as indicated by its interrupt-parent property, so any node defined under it that does not include its own interrupt-parent property inherits this interrupt controller.
- The eccmgr node is a child of the soc node and inherits intc as its interrupt controller. This node defines 2 SPI interrupts with interrupt numbers 0 and 2 (IDs 32 and 34). According to table 88 in the TRM, these are System Manager interrupts used to capture DERR/SERR, presumably derived from ECC errors coming from the different devices. Also observe in the last column of table 88 that the trigger of these interrupts is Level, which matches IRQ_TYPE_LEVEL_HIGH in the interrupts property. The node also carries the interrupt-controller property, which means it is itself treated as an interrupt controller; in this case it seems to be used to distinguish the different sources that can raise the interrupt. Such controllers normally have a status register that indicates which source triggered the interrupt. It also means that any child of this node inherits this interrupt controller and, per #interrupt-cells, uses 2 parameters: the interrupt ID (a bit in a mask that indicates the source of the interrupt) and the trigger type of the interrupt. For the ECC manager, these bits seem to be the ones defined in the ecc_intmask_value register of the System Manager: https://www.intel.com/content/www/us/en/programmable/hps/arria-10/hps.html#topic/sfo1429890615426.html
- The sdramedac, l2-ecc and other nodes are children of the eccmgr node and therefore have eccmgr as their interrupt parent. Each defines its own interrupts with 2 parameters. For example, sdramedac defines interrupt numbers 17 and 49. The 17 seems to correspond to ddr0 in the ecc_intmask_value register. I am not sure about the 2nd interrupt, but its value is the 1st value + 32. I am not sure whether the 1st one targets CPU0 and the 2nd one targets CPU1. The same holds for the l2-ecc node, which defines interrupt numbers 0 and 32: bit 0 corresponds to l2 in the ecc_intmask_value register, and the 2nd interrupt is offset by 32 from the 1st. This pattern seems to hold for all the nodes under the eccmgr node.
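The numbering described above can be sketched in a few lines of Python. This is only an illustration, not driver code: the function names are made up for this post, the offsets of 32 (SPI) and 16 (PPI) are the usual GIC ID offsets, and the "first + 32" rule is simply the pattern observed in the dtsi. As far as I can tell from the mainline altera_edac driver, the first entry is used for single-bit errors and the second for double-bit errors rather than for different CPUs, but please treat that as an assumption to verify against your kernel sources.

```python
# Illustrative only: helper names are hypothetical, not part of any kernel API.

GIC_SPI = 0  # first cell of a 3-cell GIC specifier: shared peripheral interrupt
GIC_PPI = 1  # first cell of a 3-cell GIC specifier: private peripheral interrupt


def gic_interrupt_id(irq_type: int, number: int) -> int:
    """Map the first two cells of a GIC specifier to the GIC interrupt ID."""
    if irq_type == GIC_SPI:
        return number + 32  # SPIs start at ID 32
    if irq_type == GIC_PPI:
        return number + 16  # PPIs start at ID 16
    raise ValueError(f"unknown interrupt type {irq_type}")


def eccmgr_child_interrupts(mask_bit: int) -> tuple[int, int]:
    """Return the two interrupt numbers used by a child of the eccmgr node.

    mask_bit is the peripheral's bit position in ecc_intmask_value (0..31);
    the second number follows the 'first + 32' pattern seen in the dtsi.
    """
    if not 0 <= mask_bit < 32:
        raise ValueError("mask bit must be in 0..31")
    return mask_bit, mask_bit + 32


# eccmgr itself: <0 2 ...> and <0 0 ...> are GIC IDs 34 and 32
print(gic_interrupt_id(GIC_SPI, 2), gic_interrupt_id(GIC_SPI, 0))
# sdramedac (bit 17) and l2-ecc (bit 0)
print(eccmgr_child_interrupts(17))
print(eccmgr_child_interrupts(0))
```

With the bit position of a peripheral taken from the ecc_intmask_value register description, the second helper gives the pair to put into that peripheral's interrupts property.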
I hope this helps.
Fawaz.
Hi Fawaz,
Thank you for your detailed explanation! The hint about the bit field helped me understand the entries.
I updated the device tree according to your description and added the ECC nodes for EMAC1:
emac1-rx-ecc@ff8c1000 {
    compatible = "altr,socfpga-eth-mac-ecc";
    reg = <0xff8c1000 0x400>;
    altr,ecc-parent = <&gmac1>;
    interrupts = <6 IRQ_TYPE_LEVEL_HIGH>,
                 <38 IRQ_TYPE_LEVEL_HIGH>;
};
emac1-tx-ecc@ff8c1400 {
    compatible = "altr,socfpga-eth-mac-ecc";
    reg = <0xff8c1400 0x400>;
    altr,ecc-parent = <&gmac1>;
    interrupts = <7 IRQ_TYPE_LEVEL_HIGH>,
                 <39 IRQ_TYPE_LEVEL_HIGH>;
};
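A quick way to sanity-check hand-written entries like these is to confirm that each node follows the same pattern as the existing emac0 nodes (second interrupt = first + 32). The sketch below is a hypothetical helper, not an official tool. It only understands the simple two-entry interrupts form used in these nodes, and it says nothing about whether the bit positions themselves are the right ones for the peripheral.

```python
import re

# Matches: interrupts = <6 IRQ_TYPE_LEVEL_HIGH>, <38 IRQ_TYPE_LEVEL_HIGH>;
# Whitespace (including a line break) between the two entries is allowed.
PAIR_RE = re.compile(r"interrupts\s*=\s*<(\d+)\s+\w+>\s*,\s*<(\d+)\s+\w+>\s*;")


def check_ecc_pairs(dts_text: str) -> list[tuple[int, int, bool]]:
    """Return (first, second, follows_pattern) for every interrupts pair found."""
    results = []
    for match in PAIR_RE.finditer(dts_text):
        first, second = int(match.group(1)), int(match.group(2))
        results.append((first, second, second == first + 32))
    return results


fragment = """
emac1-rx-ecc@ff8c1000 {
    interrupts = <6 IRQ_TYPE_LEVEL_HIGH>,
                 <38 IRQ_TYPE_LEVEL_HIGH>;
};
emac1-tx-ecc@ff8c1400 {
    interrupts = <7 IRQ_TYPE_LEVEL_HIGH>,
                 <39 IRQ_TYPE_LEVEL_HIGH>;
};
"""

for first, second, ok in check_ecc_pairs(fragment):
    print(first, second, "ok" if ok else "unexpected offset")
```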
However, for some reason the kernel still does not boot with the active EMAC driver. As mentioned, I followed this documentation: https://www.rocketboards.org/foswiki/Documentation/EnableL2CacheECCInLinux?erpm_id=6579622_ts1753093952709 which results in the failure.
I tried two things. First, I followed the documentation exactly. In a second test, I enabled only the Altera SOCFPGA ECC driver (without any of the sub-drivers). In both cases the kernel no longer boots.
What is wrong with the settings? Are there any other changes required?
Hello Silvan,
Can you share the complete boot log with me? This will help me debug the boot procedure.
In the meantime, I will try to replicate the scenario on my dev kit for further debugging.
Thank you,
Fawaz.
Hi Fawaz,
Thank you for your support and time. I also ran some additional tests on my side and found a solution:
It seems that the ECC/EDAC driver does not get along with the kernel's power management. In short, in addition to the steps in the documentation, I also disabled the following power management options in the kernel configuration (both are enabled by default):
Power management options -> [ ] Suspend to RAM and standby
Power management options -> [ ] Device power management core functionality
Now the kernel boots as expected and the ECC driver is loaded correctly.
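For anyone reproducing this: as far as I know, the two menu entries correspond to the Kconfig symbols CONFIG_SUSPEND and CONFIG_PM, but please confirm that in the menuconfig help text for your kernel version. A minimal sketch (a hypothetical helper, assuming the usual .config syntax) that reports whether they are still enabled:

```python
# Assumed symbol names; verify them via the help text in menuconfig.
SYMBOLS = ("CONFIG_SUSPEND", "CONFIG_PM")


def enabled_symbols(config_text: str, symbols=SYMBOLS) -> list[str]:
    """Return the symbols from `symbols` that are set to y in a .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips comments such as '# CONFIG_PM is not set'
        name, value = line.split("=", 1)
        if name in symbols and value == "y":
            enabled.add(name)
    return [s for s in symbols if s in enabled]


sample = """\
CONFIG_EDAC=y
CONFIG_EDAC_ALTERA=y
# CONFIG_SUSPEND is not set
# CONFIG_PM is not set
"""
print(enabled_symbols(sample))  # prints []
```

To check a real build, pass the contents of the kernel's .config file instead of the sample string; an empty list means both options are disabled.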
The reason I enabled EDAC in the first place is another issue on the EMAC or on the Ethernet connection. Unfortunately, EDAC did not report any errors and I still observe the issue. I started a new thread here: https://community.intel.com/t5/Intel-SoC-FPGA-Embedded/High-Latency-Ethernet-on-Arria-10-SoC-device-using-HPS-EMAC-and/m-p/1704797#M3220
@FawazJ_Altera maybe you have some information about this topic?
For completeness, here is the boot log from the failure case (maybe it helps someone else):
Starting kernel ...
Deasserting all peripheral resets
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 6.6.97-00006-g68629dbe4aeb-dirty (silvan@heldsksm1) (arm-linux-gnueabi-gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.42.0.20240723) #34 SMP Tue Jul 22 10:27:04 CEST 2025
[ 0.000000] CPU: ARMv7 Processor [414fc091] revision 1 (ARMv7), cr=10c5387d
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[ 0.000000] OF: fdt: Machine model: ABC
[ 0.000000] Memory policy: Data cache writealloc
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000000000000-0x000000002fffffff]
[ 0.000000] HighMem [mem 0x0000000030000000-0x000000003fffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000000000-0x000000003fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]
[ 0.000000] percpu: Embedded 15 pages/cpu s31764 r8192 d21484 u61440
[ 0.000000] Kernel command line: root=/dev/mmcblk0p5 ro rootwait console=ttyS0,115200 rootfs=ext4
[ 0.000000] Unknown kernel command line parameters "rootfs=ext4", will be passed to user space.
[ 0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 260608
[ 0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
[ 0.000000] Memory: 1025036K/1048576K available (9216K kernel code, 849K rwdata, 2172K rodata, 1024K init, 159K bss, 23540K reserved, 0K cma-reserved, 262144K highmem)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.000000] ftrace: allocating 33000 entries in 97 pages
[ 0.000000] ftrace: allocated 97 pages with 3 groups
[ 0.000000] rcu: Hierarchical RCU implementation.
[ 0.000000] rcu: RCU event tracing is enabled.
[ 0.000000] Rude variant of Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[ 0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[ 0.000000] L2C-310 erratum 769419 enabled
[ 0.000000] L2C-310 enabling early BRESP for Cortex-A9
[ 0.000000] L2C-310: enabling full line of zeros but not enabled in Cortex-A9
[ 0.000000] L2C-310 ID prefetch enabled, offset 1 lines
[ 0.000000] L2C-310 dynamic clock gating enabled, standby mode enabled
[ 0.000000] L2C-310 cache controller enabled, 8 ways, 512 kB
[ 0.000000] L2C-310: CACHE_ID 0x410030c9, AUX_CTRL 0x76560001
[ 0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 0.000000] clocksource: timer1: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[ 0.000001] sched_clock: 32 bits at 100MHz, resolution 10ns, wraps every 21474836475ns
[ 0.000015] Switching to timer-based delay loop, resolution 10ns
[ 0.000453] Console: colour dummy device 80x30
[ 0.000501] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=1000000)
[ 0.000516] CPU: Testing write buffer coherency: ok
[ 0.000556] CPU0: Spectre v2: using BPIALL workaround
[ 0.000563] pid_max: default: 32768 minimum: 301
[ 0.000716] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
[ 0.000730] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
[ 0.001563] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[ 0.002631] RCU Tasks Rude: Setting shift to 1 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=2.
[ 0.002781] Setting up static identity map for 0x100000 - 0x100060
[ 0.002975] rcu: Hierarchical SRCU implementation.
[ 0.002982] rcu: Max phase no-delay instances is 1000.
[ 0.003561] smp: Bringing up secondary CPUs ...
[ 0.004397] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
[ 0.004417] CPU1: Spectre v2: using BPIALL workaround
[ 0.004573] smp: Brought up 1 node, 2 CPUs
[ 0.004586] SMP: Total of 2 processors activated (400.00 BogoMIPS).
[ 0.004595] CPU: All CPU(s) started in SVC mode.
[ 0.005541] devtmpfs: initialized
[ 0.010215] VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 4
[ 0.010447] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.010469] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[ 0.011733] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 0.012589] DMA: preallocated 256 KiB pool for atomic coherent allocations
[ 0.013632] hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
[ 0.013645] hw-breakpoint: maximum watchpoint size is 4 bytes.
[ 0.029627] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
[ 0.035917] SCSI subsystem initialized
[ 0.036102] usbcore: registered new interface driver usbfs
[ 0.036150] usbcore: registered new interface driver hub
[ 0.036193] usbcore: registered new device driver usb
[ 0.036541] pps_core: LinuxPPS API ver. 1 registered
[ 0.036549] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 0.036575] PTP clock support registered
[ 0.036592] EDAC MC: Ver: 3.0.0
[ 0.037164] FPGA manager framework
[ 0.038055] vgaarb: loaded
[ 0.039190] clocksource: Switched to clocksource timer1
[ 0.052140] NET: Registered PF_INET protocol family
[ 0.052411] IP idents hash table entries: 16384 (order: 5, 131072 bytes, linear)
[ 0.054264] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
[ 0.054299] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[ 0.054314] TCP established hash table entries: 8192 (order: 3, 32768 bytes, linear)
[ 0.054390] TCP bind hash table entries: 8192 (order: 5, 131072 bytes, linear)
[ 0.054625] TCP: Hash tables configured (established 8192 bind 8192)
[ 0.054770] UDP hash table entries: 512 (order: 2, 16384 bytes, linear)
[ 0.054837] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes, linear)
[ 0.055053] NET: Registered PF_UNIX/PF_LOCAL protocol family
[ 0.059380] RPC: Registered named UNIX socket transport module.
[ 0.059392] RPC: Registered udp transport module.
[ 0.059397] RPC: Registered tcp transport module.
[ 0.059401] RPC: Registered tcp-with-tls transport module.
[ 0.059405] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 0.059422] PCI: CLS 0 bytes, default 64
[ 0.060611] hw perfevents: enabled with armv7_cortex_a9 PMU driver, 7 counters available
[ 0.062115] workingset: timestamp_bits=30 max_order=18 bucket_order=0
[ 0.062787] NFS: Registering the id_resolver key type
[ 0.062866] Key type id_resolver registered
[ 0.062871] Key type id_legacy registered
[ 0.063236] ntfs: driver 2.1.32 [Flags: R/W].
[ 0.063265] jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc.
[ 0.063755] bounce: pool size: 64 pages
[ 0.063791] io scheduler mq-deadline registered
[ 0.063798] io scheduler kyber registered
[ 0.063824] io scheduler bfq registered
[ 0.069021] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 0.071605] ffc02000.serial: ttyS1 at MMIO 0xffc02000 (irq = 30, base_baud = 6250000) is a 16550A
[ 0.072729] ffc02100.serial: ttyS0 at MMIO 0xffc02100 (irq = 31, base_baud = 6250000) is a 16550A
[ 0.072780] printk: console [ttyS0] enabled
[ 0.738034] ff251000.serial: ttyAL0 at MMIO 0xff251000 (irq = 32, base_baud = 12500000) is a Altera UART
[ 0.750168] brd: module loaded
[ 0.758992] loop: module loaded
[ 0.764253] spi_altera ff250000.spi: regoff 0, irq 33
[ 0.770775] spi_altera ff250020.spi: regoff 0, irq 34
[ 0.776592] spi_altera ff250040.spi: regoff 0, irq 35
[ 0.784886] CAN device driver interface
[ 0.789110] socfpga-dwmac ff802000.ethernet: IRQ eth_wake_irq not found
[ 0.798917] socfpga-dwmac ff802000.ethernet: IRQ eth_lpi not found
[ 0.805159] socfpga-dwmac ff802000.ethernet: Deprecated MDIO bus assumption used
[ 0.813139] socfpga-dwmac ff802000.ethernet: User ID: 0x10, Synopsys ID: 0x37
[ 0.823158] socfpga-dwmac ff802000.ethernet: DWMAC1000
[ 0.828369] socfpga-dwmac ff802000.ethernet: DMA HW capability register supported
[ 0.835849] socfpga-dwmac ff802000.ethernet: RX Checksum Offload Engine supported
[ 0.843322] socfpga-dwmac ff802000.ethernet: COE Type 2
[ 0.848535] socfpga-dwmac ff802000.ethernet: TX Checksum insertion supported
[ 0.855562] socfpga-dwmac ff802000.ethernet: Enhanced/Alternate descriptors
[ 0.862503] socfpga-dwmac ff802000.ethernet: Enabled extended descriptors
[ 0.869270] socfpga-dwmac ff802000.ethernet: Ring mode enabled
[ 0.884449] Micrel KSZ9031 Gigabit PHY stmmac-0:07: attached PHY driver (mii_bus:phy_addr=stmmac-0:07, irq=POLL)
[ 0.898358] loaded device eth0, tx_queue_len 1000, features: 71468255824179
[ 0.899428] usbcore: registered new interface driver usb-storage
[ 0.912595] i2c_dev: i2c /dev entries driver
[ 0.920654] EDAC DEVICE0: Giving out device to module l2-ecc controller Altera ECC Manager: DEV l2-ecc (INTERRUPT)
Thanks for sharing your solution, Silvan!
I’m glad that your question has been addressed. I will now transition this thread to community support. If you have a new question, please log in to ‘https://supporttickets.intel.com/s/?language=en_US’, view the details of the desired request, and post a feed/response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support, and the community users will be able to help you with your follow-up questions.
