EMGD hard locking system occasionally on X server shutdown

GMcCo1
Beginner

My hardware is a VL-EPM-24EU (VersaLogic board w/ Z520PT). I've been experiencing system hard locks on X server shutdown with EMGD. I'm using a custom embedded Linux running 2.6.33.9-rt31, but I've also reproduced the problem on a stock installation of Fedora 14. Initially I was using EMGD 1.5.15, but I upgraded to 1.6.0.1922 hoping the problem would go away (it didn't). The system hard locks on X server shutdown roughly 1 out of 200 times when running Fedora 14. When running my custom distro with PREEMPT_RT turned on, the problem seems to happen roughly 10 times more frequently.

I enabled debug prints in emgd-dkms and redirected console output to a serial port. I found that the hard lock always happens when emgd-dkms is in the "Restore the GTT entries" for loop in reg_restore_gtt_plb().

 

I have a script which automates replication of the problem, and I've replicated it at least 5 times on Fedora 14 and probably 50 times on my custom Linux distro. On my custom Linux distro I tried enabling the NMI watchdog as well as sending SysRq commands. As far as I can tell, the system is completely dead at this point.

I even went as far as disabling interrupts with raw_local_irq_save/restore around the Restore GTT entries for loop. This didn't seem to help at all.

Here is the console output from one of the occasions that the problem occurred on Fedora 14:

[ 1545.896729] [EMGD_DEBUG] emgd_driver_lastclose ENTER
[ 1545.901739] [EMGD_DEBUG] emgd_driver_lastclose Need to restore the console's saved register state
[ 1545.910665] [EMGD_DEBUG] reg_alloc_plb Entry - reg_alloc
[ 1545.916381] [EMGD_DEBUG] reg_save_plb ENTER
[ 1545.920682] [EMGD_DEBUG] reg_save_plb Saving VGA registers
[ 1545.926252] [EMGD_DEBUG] reg_save_vga_plb 0x3c4 (0x4): 0x6
[ 1545.931800] [EMGD_DEBUG] reg_save_vga_plb 0x3ce (0x5): 0x0
[ 1545.937347] [EMGD_DEBUG] reg_save_vga_plb 0x3ce (0x6): 0x5
[ 1545.942891] [EMGD_DEBUG] reg_save_vga_plb 0x3ce (0x4): 0x0
[ 1545.948438] [EMGD_DEBUG] reg_save_vga_plb 0x3ce (0x4): 0x1
[ 1545.953992] [EMGD_DEBUG] reg_save_vga_plb 0x3ce (0x4): 0x2
[ 1545.959539] [EMGD_DEBUG] reg_save_vga_plb 0x3ce (0x4): 0x3
[ 1545.965109] [EMGD_DEBUG] reg_save_vga_plb 0x3ce (0x6): 0x5
[ 1545.970661] [EMGD_DEBUG] reg_save_vga_plb 0x3ce (0x5): 0x10
[ 1545.976300] [EMGD_DEBUG] reg_save_vga_plb 0x3ce (0x4): 0x0
[ 1545.981846] [EMGD_DEBUG] reg_save_vga_plb 0x3c4 (0x4): 0x2
[ 1545.987487] [EMGD_DEBUG] reg_save_plb Saving MMIO registers
[ 1546.015689] [EMGD_DEBUG] reg_save_plb Saving DAC registers
[ 1546.021434] [EMGD_DEBUG] reg_save_plb Saving mode state
[ 1546.026728] [EMGD_DEBUG] mode_save ENTER
[ 1546.030722] [EMGD_DEBUG] mode_save saving Internal LVDS Port Driver
[ 1546.037117] [EMGD_DEBUG] mode_save mode_save: saved 1 port driver states.
[ 1546.043963] [EMGD_DEBUG] mode_save EXIT
[ 1546.047843] [EMGD_DEBUG] reset_plane_pipe_ports_plb ENTER
[ 1546.053303] [EMGD_DEBUG] disable_vga_plb ENTER
[ 1546.057809] [EMGD_DEBUG] disable_vga_plb VGA Palette workaround
[ 1546.063788] [EMGD_DEBUG] disable_vga_plb EXIT
[ 1546.068193] lvds_set_power state = 3
[ 1546.071802] lvds_set_power lvds_set_power() to state = 3
[ 1546.327394] [EMGD_DEBUG] dsp_get_next_plane Entry, dsp_get_next_plane
[ 1546.333900] [EMGD_DEBUG] dsp_get_next_plane Entry, dsp_get_next_plane
[ 1546.340419] [EMGD_DEBUG] dsp_get_next_plane Entry, dsp_get_next_plane
[ 1546.346921] [EMGD_DEBUG] dsp_get_next_plane Entry, dsp_get_next_plane
[ 1546.353443] [EMGD_DEBUG] dsp_get_next_plane Entry, dsp_get_next_plane
[ 1546.359943] [EMGD_DEBUG] dsp_get_next_plane Entry, dsp_get_next_plane
[ 1546.366459] [EMGD_DEBUG] dsp_get_next_plane Entry, dsp_get_next_plane
[ 1546.372956] [EMGD_DEBUG] dsp_get_next_plane Entry, dsp_get_next_plane
[ 1546.379464] [EMGD_DEBUG] dsp_get_next_pipe Entry, dsp_get_next_pipe
[ 1546.385789] [EMGD_DEBUG] wait_for_vblank_timeout_plb ENTER
[ 1546.391331] [EMGD_DEBUG] wait_for_vblank_timeout_plb Parameters: MMIO = f7a00000, pipe_reg = 70008, time_interval = 64
[ 1546.402077] [EMGD_DEBUG] wait_for_vblank_timeout_plb Pipe disabled/Off
[ 1546.408660] [EMGD_DEBUG] wait_for_vblank_timeout_plb EXIT
[ 1546.414096] [EMGD_DEBUG] dsp_get_next_pipe Entry, dsp_get_next_pipe
[ 1546.420441] [EMGD_DEBUG] wait_for_vblank_timeout_plb ENTER
[ 1546.425966] [EMGD_DEBUG] wait_for_vblank_timeout_plb Parameters: MMIO = f7a00000, pipe_reg = 71008, time_interval = 64
[ 1546.436716] [EMGD_DEBUG] request_vblanks_plb ENTER
[ 1546.441569] [EMGD_DEBUG] request_vblanks_plb Parameters: request_for=0x2, mmio=0xf7a00000
[ 1546.449808] [EMGD_DEBUG] request_vblanks_plb Registering interrupt_handler_plb()
[ 1546.458324] [EMGD_DEBUG] request_vblanks_plb Successfully registered interrupt_handler_plb()
[ 1546.466890] [EMGD_DEBUG] request_vblanks_plb
[ 1546.467079] [EMGD_DEBUG] interrupt_handler_plb ENTER
[ 1546.467093] [EMGD_DEBUG] interrupt_handler_plb EXIT--IRQ_HANDLED
<9>[ 1546.482219] EXIT
[ 1546.483122] [EMGD_DEBUG] interrupt_handler_plb ENTER
[ 1546.483135] [EMGD_DEBUG] interrupt_handler_plb EXIT--IRQ_HANDLED
[ 1546.495287] [EMGD_DEBUG] end_request_plb
[ 1546.499138] [EMGD_DEBUG] interrupt_handler_plb ENTER
[ 1546.499147] [EMGD_DEBUG] end_request_plb Parameters: request_for=0x2, mmio=0xf7a00000
[ 1546.499166] [EMGD_DEBUG] end_request_plb Unregistering interrupt_handler_plb()
<9>[ 1546.500128] ENTER
[ 1546.500128] [EMGD_DEBUG] interrupt_handler_plb EXIT--IRQ_NONE
[ 1546.527061] [EMGD_DEBUG] interrupt_handler_plb ENTER
[ 1546.527259] [EMGD_DEBUG] interrupt_handler_plb EXIT--IRQ_NONE
[ 1546.537831] [EMGD_DEBUG] end_request_plb Successfully unregistered interrupt_handler_plb()
[ 1546.546152] [EMGD_DEBUG] end_request_plb EXIT
[ 1546.550573] [EMGD_DEBUG] wait_for_vblank_timeout_plb EXIT
[ 1546.556009] [EMGD_DEBUG] dsp_get_next_pipe Entry, dsp_get_next_pipe
[ 1546.562356] [EMGD_DEBUG] reset_plane_...
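
When comparing many captured serial logs, it can help to pull out the last EMGD function that printed before the hang. The following is only a minimal sketch: the helper name last_emgd_function is hypothetical, and it assumes the console output was saved to a file in the "[EMGD_DEBUG] function message" layout shown above.

```shell
#!/bin/bash
# Hypothetical helper (not part of EMGD): print the name of the last function
# that emitted an [EMGD_DEBUG] line in a captured console log.
# Assumption: the token after "[EMGD_DEBUG]" is the function name, as in the
# log excerpt in this thread.
last_emgd_function() {
    local logfile="$1"
    awk '
        # Scan every field so that several entries fused on one line still work.
        { for (i = 1; i < NF; i++) if ($i == "[EMGD_DEBUG]") last = $(i + 1) }
        END { if (last != "") print last }
    ' "$logfile"
}
```

Running it over each captured log (for example `last_emgd_function serial-capture.log`, where the file name is a placeholder) gives a quick way to see whether every hang ends in the same function.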
GMcCo1
Beginner

I just upgraded my custom distro to use EMGD 1.10.0.2209 and the problem is still occurring.

As a test I rebuilt my kernel with some different settings.

I changed:

preemption from RT to Voluntary (CONFIG_PREEMPT_RT to CONFIG_PREEMPT_VOLUNTARY)

I also disabled PREEMPT_SOFTIRQS and PREEMPT_HARDIRQS.

With this kernel the X server has started and stopped over 500 times without issue.

I can't use the kernel in this state, but it's interesting that it seems to fix the problem. I'll run it overnight and see if it ever crashes...

Kirk_B_Intel
Employee

To begin with, Happy New Year - I hope we can figure out your issue to ensure that it is a happy one.

All known issues (at the time of shipping the release) are documented in the Errata that should be available with the release (where you downloaded the release). I am not aware of such an issue being reported. With the nature of your issue, this is the sort of thing that would get a lot of attention so that makes me think it is unique to your situation.

We are not intimately familiar with the board you are using. It might be interesting for you to try to reproduce this issue on one of our Customer Reference Boards (Crown Beach) or maybe another board known to be implemented like the CRB (in addition to the Crown Beach CRB, we use Portwell NANO-8044 and 8045 in our development and validation). This would help us identify any deviations from our recommendations your board vendor may have made.

It is "interesting" that your issue seems to be related to resetting the graphics memory (GTTs) and to interrupts. It is almost as if an interrupt is firing while we are reassigning memory that was in use by the graphics driver, which could cause very undesirable results. Have you verified your BIOS settings for the graphics aperture and stolen memory?

I am also a bit concerned with the kernel version of your distro as that does not match what we test against. The 1.10 release is checked against:

- MeeGo* IVI 1.2, kernel version 2.6.37, Xorg 1.9.X, Libva 1.0.12, Mesa 7.9

- Fedora 14* (Timesys Fedora Remix), kernel version 2.6.35, Xorg 1.9, Libva 1.0.12, Mesa 7.9

Generally, earlier kernels work fine with code written for slightly later kernel versions, but sometimes you do get weird problems like this one.

I would also like to verify that when you checked FC14 you used the Timesys remix version, which is supposed to be available by following the steps in the EMGD User's Guide.

I am hoping we can figure out the root cause in order to concentrate on a solution for you. If it is something "odd" in the hardware, we may need to work with the board developer to figure out a solution. If it is the distro, it may be an easy port tweak. If the driver has an issue, then we need to be able to describe how our developers can see the issue to fix it.

GMcCo1
Beginner

Kirk,

I wasn't using the Timesys Fedora Remix, just the vanilla Fedora-14-i386. I'm currently downloading the Remix and I'll try it out. I'm also going to try to get my hands on a NANO-8045. The BIOS on the VersaLogic board has an option for "Video Frame Buffer Size"; we have this set to 8MB. There is also an option for "IGD MSI", which is disabled. I have the Period SMI disabled. They have something else in the BIOS called Firebase Technology; I'll try disabling that as well.

I did quite a lot of debugging on the emgd-dkms driver yesterday. It doesn't seem to disable interrupts when doing reg_restore_plb; I was seeing many interrupts being serviced by SystemISRWrapper. I wrote a kprobe module that attaches to some key kernel functions such as do_IRQ, irq_enter, irq_exit, wake_up_idle_cpu, resched_task and early_fault, and sends out a trace code over GPIO. I have the GPIO hooked up to a logic analyzer. I found that many times the final trace code sent out was the code sent in reg_restore_gtt_plb. It does seem like it is getting an interrupt in the middle of doing something sensitive. Strangely, the problem seems to happen much more frequently when I boot the kernel with maxcpus=1.

I'm not sure it'll be of any insight, but I've attached my xorg.conf just in case something stands out.

George

Kirk_B_Intel
Employee

Some other things to check:

Try a bigger aperture. 8MB is generally too small to do anything; 256MB is the minimum recommended.

Is there something running when you shut down X? Is there something using the Overlay plane like a video decode, or a 3D app that is page flipping?

Can you try IEGD 10.x instead of EMGD? The architecture of the driver is different between IEGD and EMGD and there may be a clue there as to what is going on.

The Linux kernel version is unlikely to be the issue, but at least if you try it, we will be on the same page.

Have not looked at the XORG.CONF yet (that is on my TO DO list).

Hope this helps.

GMcCo1
Beginner

The only options for "Video Frame Buffer Size" in the VersaLogic VL-EPM-24 (Tiger) BIOS are 1MB, 4MB and 8MB.

I ordered a NANO-8045, it should be here tomorrow.

I installed the Timesys Fedora Remix on the VersaLogic VL-EPM-24 and I'm having interesting results. Following the instructions in the EMGD User Guide, I did a minimal install, then installed enough packages off of the DVD to build the emgd kernel module and get a simple GNOME session to come up (desktop and panel). I then ssh into the machine and run this script:

#!/bin/bash
for a in `seq 400`
do
    echo "Pass $a"
    startx &
    sleep 15
    killall X
    sleep 5
done

It ran about 50 passes before it hard locked, but this time I was watching it and it looks like it leaked itself dry of memory. From now on I'm going to monitor the memory usage during all of the tests. It looks like I might be chasing several problems here.
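
One way to monitor memory during the test runs is to log free memory on each pass. This is just a minimal sketch, not what was actually run here: the helper name mem_free_kb is made up for illustration, and it assumes the standard "MemFree:" line in /proc/meminfo.

```shell
#!/bin/bash
# Hypothetical helper: print the MemFree value (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo; a different path can be passed for offline analysis.
mem_free_kb() {
    local meminfo="${1:-/proc/meminfo}"
    awk '$1 == "MemFree:" { print $2 }' "$meminfo"
}
```

Inside the start/stop loop, a line such as `echo "Pass $a MemFree: $(mem_free_kb) kB"` would record the trend, so a steady decline shows up well before the machine runs dry.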

From /proc/vmallocinfo:

Pass 1:

...

0xf84f4000-0xf89f5000 5246976 gmm_map+0x96/0xb5 [emgd] vmap

0xf9380000-0xf9401000 528384 os_map_io_to_mem_nocache+0xd/0xf [emgd] phys=dfd80000 ioremap

0xf9904000-0xf9e05000 5246976 gmm_map+0x96/0xb5 [emgd] vmap

0xf9e06000-0xfa307000 5246976 gmm_map+0x96/0xb5 [emgd] vmap

0xfe800000-0xfec00000 4194304 pcpu_get_vm_areas+0x0/0x47c vmalloc

Pass 2:

...

0xf84f4000-0xf89f5000 5246976 gmm_map+0x96/0xb5 [emgd] vmap

0xf89f6000-0xf8ef7000 5246976 gmm_map+0x96/0xb5 [emgd] vmap

0xf8f00000-0xf8f81000 528384 os_map_io_to_mem_nocache+0xd/0xf [emgd] phys=dfd80000 ioremap

0xf9986000-0xf9e87000 5246976 gmm_map+0x96/0xb5 [emgd] vmap

0xf9e88000-0xfa389000 5246976 gmm_map+0x96/0xb5 [emgd] vmap

0xfe800000-0xfec00000 4194304 pcpu_get_vm_areas+0x0/0x47c vmalloc

Pass 3:

...

0xf84f4000-0xf89f5000 5246976 gmm_map+0x96/0xb5 [emgd] vmap

0xf89f6000-0xf8ef7000 5246976 gmm_map+0x96/0xb5 [emgd] vmap

0xf8ef8000-0xf93f9000 5246976 gmm_map+0x96/0xb5 [emgd] vmap

0xf9dfe000-0xfa2ff000 5246976 gmm_map+0x96/0xb5 [emgd] vmap

0xfa3c0000-0xfa3e1000 135168 config_plb+0x318/0x3bc [emgd] phys=dfd60000 ioremap

0xfa400000-0xfa481000 528384 os_map_io_to_mem_nocache+0xd/0xf [emgd] phys=dfd80000 ioremap

0xfa482000-0xfa983000 5246976 gmm_map+0x96/0xb5 [emgd] vmap

0xfe800000-0xfec00000 4194304 pcpu_get_vm_areas+0x0/0x47c vmalloc

I seem to get a new "5246976 gmm_map+0x96/0xb5 [emgd] vmap" entry each pass...
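
To track this per pass without reading the listing by eye, the gmm_map entries can be counted. The snippet below is a rough sketch under a few assumptions: the helper name count_gmm_maps is hypothetical, the line layout is the one shown in the /proc/vmallocinfo excerpts above (third column is the caller symbol), and reading /proc/vmallocinfo normally requires root.

```shell
#!/bin/bash
# Hypothetical helper: count vmalloc areas created by gmm_map in the emgd module.
# Reads /proc/vmallocinfo by default; pass a saved copy to analyse it offline.
count_gmm_maps() {
    local src="${1:-/proc/vmallocinfo}"
    awk '
        # Column 3 is the caller, e.g. "gmm_map+0x96/0xb5"; column 4 the module.
        index($3, "gmm_map+") == 1 && $4 == "[emgd]" { n++ }
        END { print n + 0 }
    ' "$src"
}
```

If the count printed after each X server shutdown keeps growing by one, that would be consistent with a mapping that is never released; it does not by itself show where the leak is.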

I don't think IEGD will work with this distro, but maybe I can find something to run it on.

I'll post back again when I know more.

GMcCo1
Beginner

I have my custom distro running on the VersaLogic board and the Portwell NANO 8045, and I'm seeing the same behavior on both. When my kernel is configured with CONFIG_PREEMPT=y && CONFIG_PREEMPT_RT=y and I boot the kernel with maxcpus=1, the system dies while in the "Restore the GTT entries" loop on the first or second shutdown of the X server (just logging in via gdm and logging out is usually enough to kill it). When I set my kernel to CONFIG_PREEMPT_VOLUNTARY=y (which is how the Timesys Fedora Remix kernel is configured), the problem all but goes away (it's hard to rule out other causes when the X server has been started and stopped 400+ times).

Has anyone run EMGD on top of a Linux kernel configured with CONFIG_PREEMPT=y? Do you notice stability problems when starting and stopping the X server? Have you tried booting your kernel with maxcpus=1, and does this affect the stability?

Would it be informative if I were to rebuild the Timesys Fedora Remix kernel with everything configured the same except with CONFIG_PREEMPT instead of CONFIG_PREEMPT_VOLUNTARY, then re-evaluate stability?

If everyone is testing with kernels with CONFIG_PREEMPT_VOLUNTARY=y the problem might never appear.
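
For anyone who wants to report which preemption model their kernel was built with, here is a minimal sketch. The helper name preempt_model is hypothetical, and the path to the kernel config is an assumption: distributions such as Fedora usually install it as /boot/config-$(uname -r), while a custom kernel may only have the .config in its build tree.

```shell
#!/bin/bash
# Hypothetical helper: report the preemption model from a kernel config file.
# CONFIG_PREEMPT_RT is checked first because RT kernels like the one in this
# thread set both CONFIG_PREEMPT=y and CONFIG_PREEMPT_RT=y.
preempt_model() {
    local config="$1"
    if grep -q '^CONFIG_PREEMPT_RT=y' "$config"; then
        echo "rt"
    elif grep -q '^CONFIG_PREEMPT=y' "$config"; then
        echo "preempt"
    elif grep -q '^CONFIG_PREEMPT_VOLUNTARY=y' "$config"; then
        echo "voluntary"
    elif grep -q '^CONFIG_PREEMPT_NONE=y' "$config"; then
        echo "none"
    else
        echo "unknown"
    fi
}
```

For example, `preempt_model "/boot/config-$(uname -r)"` on a stock install, together with the contents of /proc/cmdline (to show whether maxcpus=1 was passed), would make stability reports easier to compare.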
