jmcla3
New Contributor I
3,564 Views

ttyS0 input overruns

We recently tried the new 1.2.0 BSP for the Quark which gives us kernel version 3.14.28 instead of the old 3.8.7.

The problem is that with the new kernel we are seeing large numbers of ttyS0 input overruns. We weren't seeing this problem with 3.8.7.

For example:

Dec 15 02:22:41 SN01291 kernel: [ 640.831598] ttyS0: 1 input overrun(s)

Dec 15 02:22:43 SN01291 kernel: [ 643.228205] ttyS0: 1 input overrun(s)

Dec 15 02:22:53 SN01291 kernel: [ 653.291803] ttyS0: 2 input overrun(s)

Dec 15 02:23:19 SN01291 kernel: [ 679.421426] ttyS0: 1 input overrun(s)

Dec 15 02:23:29 SN01291 kernel: [ 689.429202] ttyS0: 2 input overrun(s)

We tend to see higher numbers of overruns when the processor is busier.

We tried to add error correction to the data being passed over the serial port, but that only helps a little bit. When we get too much data lost we can't recover.

Is there a setting we can change to adjust the FIFO trigger level of the 16550? It doesn't appear that we can make that change through setserial. Was there a change made to the serial drivers between 3.8.7 and 3.14.28? Is there another buffer that can have its size adjusted?

16 Replies
Pablo_M_Intel
Employee
730 Views

Hi dfwJones,

Apart from BSP 1.0.1, which was developed with the Galileo board in mind, the later BSPs are designed for a general Quark environment and not specifically for the Galileo platform. So the BSP that you're using now may show this kind of issue in this environment. We'll investigate your issue and let you know when we have more updates.

Could you please tell us what processes you are running? We would like to know under what conditions you get these "overrun" messages.

Regards,

PabloM_Intel

jmcla3
New Contributor I
730 Views

The faults are happening on our own board, not on a Galileo board. Our board's design leverages the Galileo design.

The faults happen when the processor is busy. During normal operation our board has a number of things running almost all the time.

We have a sensor that streams data in bursts over the ttyS0 port. The bursts can be many per second.

The software on the Quark takes the sensor data and does limited processing before sending it over the Ethernet connection.

The board also has a USB webcam which enumerates as /dev/video0. A low frame rate video stream is sent over the Ethernet connection.

Pablo_M_Intel
Employee
730 Views

Hi dfwJones,

I think I understand your problem now. Have you checked this thread before? Another user had a similar request and needed to configure the UART driver, and xbolshe provided a possible solution. I would suggest checking that thread; you might find some useful information.

Regards,

PabloM_Intel

DStok1
Beginner
730 Views

Hi PabloM,

I am also seeing this same issue, with exactly the same symptoms as dfwJones. I reviewed the link you suggested, but I don't see the relevance to the ttyS0 overrun issue. In our application, we need both the serial and Ethernet interfaces active. The behavior I'm seeing is exactly what you might see if the serial buffers were too shallow or interrupts were being disabled by some other process for too long.

Best regards,

John

asss
Valued Contributor II
730 Views

Hi,

The Intel Quark processor has only one core and one thread (see the Intel® Quark™ SoC X1000 (16K Cache, 400 MHz) specifications: http://ark.intel.com/products/79084/Intel-Quark-SoC-X1000-16K-Cache-400-MHz).

If a heavy task prevents the kernel from switching to the Linux serial driver in time, tty overruns are expected.

The Intel Quark UART has a 16-byte hardware FIFO buffer, and there is no way to increase it.

For now the driver's internal buffer is 4095 bytes long. It is possible to increase it.

But I guess a larger internal buffer will not remove the requirement to read the data out of the FIFO buffer in time.

dfwJones,

by the way, could you provide several snapshots of the output of the command below, taken while a heavy task is running, on both the 3.8.7 and 3.14.28 kernels?

cat /proc/interrupts

Could you also provide more information about the serial port speed and the actual data rate?
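
One way to produce comparable snapshots is to read /proc/interrupts before and after the heavy task and subtract the counts. The following is a minimal Python sketch that was not posted in this thread; the function names are illustrative, and it reads only the first count column, which assumes a single CPU (consistent with the single-core Quark).

```python
def parse_counts(text):
    """Map each interrupt label (e.g. '17', 'LOC') to its count in the first CPU column."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip the 'CPU0' header and any line without a numeric count.
        if not sep or not fields or not fields[0].isdigit():
            continue
        counts[label.strip()] = int(fields[0])
    return counts


def count_deltas(before_text, after_text):
    """Return how much each interrupt count grew between two snapshots."""
    before = parse_counts(before_text)
    after = parse_counts(after_text)
    return {irq: n - before.get(irq, 0) for irq, n in after.items()}


def read_snapshot(path="/proc/interrupts"):
    """Read the current interrupt table as text."""
    with open(path) as f:
        return f.read()
```

Calling read_snapshot() once before and once after the test, then passing both results to count_deltas(), gives the per-interrupt increase over the test period.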

BR,

xbolshe

jmcla3
New Contributor I
730 Views

I've spent a lot of time digging deeper into the problem. I have three separate versions of the kernel: 3.8.7, 3.14.28, and 3.19.8. The problem happens on both 3.14.28 and 3.19.8, but does not happen on 3.8.7.

When we see the problem, we get system messages like this:

[ 334.896442] ttyS0: 10 input overrun(s)

[ 336.293599] ttyS0: 13 input overrun(s)

[ 337.328057] ttyS0: 16 input overrun(s)

[ 338.591951] ttyS0: 11 input overrun(s)

[ 340.215313] ttyS0: 9 input overrun(s)

[ 341.360737] ttyS0: 14 input overrun(s)

[ 342.553417] ttyS0: 15 input overrun(s)

[ 343.600646] ttyS0: 6 input overrun(s)

As you can see, we are seeing very large numbers of overruns every second.
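
To put a number on log excerpts like the one above, the counts can be summed over the reported time span. This is a minimal Python sketch, not part of the original post; the regular expression assumes the message format shown in this thread, and the function name is illustrative.

```python
import re

# Matches kernel lines such as "[ 334.896442] ttyS0: 10 input overrun(s)".
OVERRUN_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(\w+): (\d+) input overrun\(s\)")


def summarize_overruns(log_text, port="ttyS0"):
    """Return (total_overruns, first_timestamp, last_timestamp) for one port."""
    total = 0
    stamps = []
    for match in OVERRUN_RE.finditer(log_text):
        stamp, name, count = match.groups()
        if name == port:
            total += int(count)
            stamps.append(float(stamp))
    if not stamps:
        return 0, None, None
    return total, stamps[0], stamps[-1]


# The eight messages quoted in the post above.
sample = """\
[ 334.896442] ttyS0: 10 input overrun(s)
[ 336.293599] ttyS0: 13 input overrun(s)
[ 337.328057] ttyS0: 16 input overrun(s)
[ 338.591951] ttyS0: 11 input overrun(s)
[ 340.215313] ttyS0: 9 input overrun(s)
[ 341.360737] ttyS0: 14 input overrun(s)
[ 342.553417] ttyS0: 15 input overrun(s)
[ 343.600646] ttyS0: 6 input overrun(s)
"""

total, first, last = summarize_overruns(sample)
print(total, "overruns reported between", first, "and", last, "seconds of uptime")
```

For the excerpt above this comes to 94 overruns over roughly 8.7 seconds, so on the order of ten lost characters per second. The same pattern also matches the syslog-prefixed lines from the first post.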

I dumped the contents of /proc/interrupts before and after running the tests. We are seeing very large increases in the counts in all cases. Assuming I'm reading the output correctly, it looks like the serial port is set to the same interrupt in all 3 kernels, but in the case of 3.8.7, it doesn't share the interrupt with anything else. The other two kernels appear to have multiple peripherals sharing the same interrupt.

3.8.7:

17: 2255 IO-APIC-fasteoi serial

3.14.28:

17: 162316 IO-APIC-fasteoi dw_dmac, dw_dmac, pxa2xx-spi.1, serial

3.19.8:

17: 795 IO-APIC 17-fasteoi INTEL_MID_DMAC2, intel_quark_uart, INTEL_MID_DMAC2, intel_quark_uart, pxa2xx-spi.1

Assuming that the sharing is taking place, how do we move those other peripherals to other interrupts?

If you need the full output of /proc/interrupts, let me know and I can post it.
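
The sharing described above can be checked mechanically. Below is a minimal Python sketch (not from the thread) that lists which handlers sit on the same interrupt line as the serial driver. The function names are illustrative, and the parsing is a heuristic that assumes the line layout shown in this thread: a count, an interrupt-controller description, then a comma-separated handler list. It only reports sharing and does not address how to move devices to other interrupts.

```python
def devices_by_irq(text):
    """Map numeric IRQ labels to the list of handler names on that line."""
    table = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # header, NMI/LOC-style summary rows, blank lines
        fields = rest.split(None, 1)
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        chunks = [part.split() for part in fields[1].split(",")]
        # Heuristic: the first comma-separated chunk starts with the
        # controller description; its last word is the first handler.
        first = chunks[0]
        devices = [first[-1]] if len(first) > 1 else []
        devices += [" ".join(c) for c in chunks[1:] if c]
        table[label] = devices
    return table


def sharing_with(text, name_fragment):
    """Return {irq: other handlers} for each IRQ with a handler containing name_fragment."""
    result = {}
    for irq, devices in devices_by_irq(text).items():
        if any(name_fragment in d for d in devices):
            result[irq] = [d for d in devices if name_fragment not in d]
    return result


# The 3.14.28 line quoted above.
line_314 = "17: 162316 IO-APIC-fasteoi dw_dmac, dw_dmac, pxa2xx-spi.1, serial"
print(sharing_with(line_314, "serial"))
```

An empty list for an IRQ means the serial handler has that line to itself, as in the 3.8.7 output.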

asss
Valued Contributor II
730 Views

Hi,

may I ask you to test how it works with this image? https://relvarsoft.com/galileo/galileo_xbolshe_iot_1.2.0_kernel_v3.19.8_201512301.zip

It has UARTs on different interrupts:

24: 72 PCI-MSI-edge INTEL_MID_DMAC2, intel_quark_uart

25: 2319 PCI-MSI-edge INTEL_MID_DMAC2, intel_quark_uart

And please post all output of /proc/interrupts after a heavy load.

BR,

xbolshe

jmcla3
New Contributor I
730 Views

Thank you for producing a new kernel. The kernel as you packaged it boots, but it lacks our product's environment. I tried merging the kernel with all of our environment, but it didn't go well. It looks like there are a number of devices (/sys/proc/gpio, eth0, etc.) that aren't loading which prevent our stuff from running.

Is it possible for you to tell us how you managed to move the other devices away from the interrupt that the serial port is using? That way I can make the change and rebuild the kernel here. At the moment, I think we'd prefer to try and continue using the 3.19 we are building from here:

https://github.com/xbolshe/galileo-sources/tree/master/iot_1.2.0_kernel_3.19.8

asss
Valued Contributor II
730 Views

Hi dfwJones,

the repository you mentioned above now has an update.

It separates the UART interrupts.

I guess you may try it.

Now it looks like:

root@quark:~# cat /proc/interrupts

CPU0

0: 29 IO-APIC-edge timer

7: 2 IO-APIC-edge

8: 1 IO-APIC-edge rtc0

9: 2 IO-APIC-fasteoi acpi, gpio_sch

16: 91 IO-APIC 16-fasteoi pxa2xx-spi.0, ohci_hcd:usb2

17: 0 IO-APIC 17-fasteoi pxa2xx-spi.1

19: 4 IO-APIC 19-fasteoi ehci_hcd:usb1

24: 0 PCI-MSI-edge INTEL_MID_DMAC2, intel_quark_uart

25: 9098 PCI-MSI-edge INTEL_MID_DMAC2, intel_quark_uart

26: 3948 PCI-MSI-edge mmc0

35: 287 PCI-MSI-edge intel_qrk_gip

36: 1 PCI-MSI-edge pch_udc

37: 4157 PCI-MSI-edge enp0s20f6

40: 2 gsi-sch_gpio_irq 0-0020

46: 29 PCI-MSI-edge iwlwifi

100: 2 cy8c9540a-irq gpiolib

NMI: 0 Non-maskable interrupts

LOC: 16500 Local timer interrupts

SPU: 0 Spurious interrupts

PMI: 0 Performance monitoring interrupts

IWI: 1 IRQ work interrupts

RTR: 0 APIC ICR read retries

TRM: 0 Thermal event interrupts

THR: 0 Threshold APIC interrupts

MCE: 0 Machine check exceptions

MCP: 0 Machine check polls

ERR: 2

MIS: 0

BR,

xbolshe

Carlos_M_Intel
Employee
730 Views

Hi dfwJones,

Do you have updates on this?

Have you tried with the suggestion from xbolshe?

Regards,

Charlie

jmcla3
New Contributor I
730 Views

Sorry for the delays, we've encountered a few other issues with 3.19. I may start separate threads for them.

We can't yet fully test the new build. For some reason we aren't getting the /dev/video0 device to show up like it used to with the 3.14 in the official BSP. I've tried everything I can think of to enable it with menuconfig.

Without the streaming video, we were still seeing the overrun errors under the original 3.19. With the new version using the interrupt fixes, we haven't yet seen any overruns. This is a very good sign so far. We will keep testing as soon as we can figure out the video0 problem.

I can't yet call it fixed, but it is looking good.

Thanks.

Andriy_S_Intel
Employee
730 Views

What kind of interrupt fixes are you talking about?

asss
Valued Contributor II
730 Views

To understand the difference, just compare the interrupt list for kernel 3.19.8 shown above with the one from the original Intel BSP 1.2.0 below:

root@quark:~# cat /proc/interrupts

CPU0

0: 46 IO-APIC-edge timer

7: 1 IO-APIC-edge

8: 1 IO-APIC-edge rtc0

9: 1 IO-APIC-fasteoi acpi, gpio_sch

16: 3554 IO-APIC-fasteoi mmc0, pxa2xx-spi.0, ohci_hcd:usb2

17: 887 IO-APIC-fasteoi dw_dmac, dw_dmac, pxa2xx-spi.1, serial

19: 79 IO-APIC-fasteoi ehci_hcd:usb1

32: 1 --sch_gpio_irq_chip 0-0020

40: 7373 PCI-MSI-edge intel_qrk_gip

41: 1 PCI-MSI-edge pch_udc

42: 0 PCI-MSI-edge enp0s20f6

100: 1 cy8c9540a-irq gpiolib

NMI: 0 Non-maskable interrupts

LOC: 3370 Local timer interrupts

SPU: 0 Spurious interrupts

PMI: 0 Performance monitoring interrupts

IWI: 0 IRQ work interrupts

RTR: 0 APIC ICR read retries

TRM: 0 Thermal event interrupts

THR: 0 Threshold APIC interrupts

MCE: 0 Machine check exceptions

MCP: 0 Machine check polls

ERR: 1

MIS: 0

As you can see, several devices are located on the same shared interrupt:

17: 887 IO-APIC-fasteoi dw_dmac, dw_dmac, pxa2xx-spi.1, serial

The interrupt fixes separate them.

BR,

xbolshe

asss
Valued Contributor II
730 Views

BTW, I have /dev/video0 with kernel 3.19.8.

BR,

xbolshe

jmcla3
New Contributor I
730 Views

I was able to finally get the /dev/video0 device to load. It only took 10 tries at various configurations using menuconfig and rebuilding. I'm not sure which setting fixed it, but it was probably in the USB settings.

Once I got that working I was able to do more testing with the serial port and the video camera, and I haven't seen any more of the overrun errors in the log. I'm still getting some strange data over the serial port from our sensor. It is almost as if it is no longer overflowing, but is getting corrupted. We've been running the serial port at 460800 baud, which is faster than the recommended 115200. I was really hoping the interrupt changes would fix the errata issue.
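
For context on why the port speed matters, the time the driver has to service the receive FIFO scales inversely with the baud rate. This is a rough Python sketch of that arithmetic, not something posted in the thread. It assumes 8N1 framing (10 bits on the wire per byte) and the 16-byte FIFO mentioned earlier; the function names are illustrative.

```python
def byte_time_us(baud, bits_per_byte=10):
    """Time on the wire for one byte, in microseconds (10 bits assumes 8N1 framing)."""
    return bits_per_byte * 1_000_000 / baud


def fifo_fill_time_us(baud, fifo_depth=16, bits_per_byte=10):
    """Time for a continuous burst to fill an empty receive FIFO, in microseconds."""
    return fifo_depth * byte_time_us(baud, bits_per_byte)


for baud in (115200, 460800):
    print(baud, round(byte_time_us(baud), 1), round(fifo_fill_time_us(baud), 1))
```

Under these assumptions a continuous burst fills the FIFO in about 1389 µs at 115200 and about 347 µs at 460800. The real budget is shorter still, because the receive interrupt is raised at a trigger level below the full depth and the handler has to run before the remaining slots fill.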

For my edification, where is the bitbake equivalent of the .config file which is generated after running menuconfig? I'd like to diff the 3.14 and 3.19 versions to see what changes have happened for the defaults in each version. It might help me figure out what specifically needs to be turned on to get the video0 device working.
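
Once both .config files are in hand, comparing them option by option tends to be easier to read than a plain text diff. The following is a minimal Python sketch, not from the thread; the function names are illustrative, and it assumes the usual .config syntax of `CONFIG_X=value` lines and `# CONFIG_X is not set` comments.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_ symbol in a kernel .config to its value ('n' if 'is not set')."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(old_text, new_text):
    """Return {symbol: (old_value, new_value)} for symbols that differ; None means absent."""
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }
```

Reading the 3.14 and 3.19 files into strings and passing them to diff_configs() returns only the options whose values differ, which narrows the search for whatever enables the video device. The kernel source tree also ships a scripts/diffconfig helper that serves a similar purpose.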

asss
Valued Contributor II
730 Views

The .config file is located here:

//tmp/work/quark-poky-linux/linux-yocto-quark/3.19-r0/linux-quark-standard-build

BR,

xbolshe
