DMart12
Beginner
1,987 Views

i7-6500U computer crash/random segfault involving video and cpu frequency changes

Hi,

I'm experiencing weird crashes (screen freezes, CPU soft lockups, random application segfaults) on Linux, mainly when the workload involves video decoding. My best reproducer at the moment is playing two videos with `mpv` in parallel: these crash in about 10 minutes and often bring the whole computer down with them.

This is not just a graphics/i915 bug, despite what I first thought: playing the videos with `-vo null` (no video output, so no OpenGL/graphics involved) crashes mpv as well. This seems less likely to kill everything else, but at this point that might be down to luck.

I've also experienced crashes with a single video when multitasking, or just with firefox. Others using the same computer have reported the problem here: https://forums.puri.sm/t/is-anyone-else-experiencing-freezing-issues-with-librem-15-v3/1233

I think this is related to CPU frequency changes, because setting the CPU governor to performance works around the issue perfectly: I have never been able to trigger a crash with this setting on. Usage has also been stable with only one CPU online (offlining the other 3 cores), even with the ondemand governor.
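
For anyone wanting to try the same workaround, here is a minimal sketch of reading and switching the governor through the cpufreq sysfs files. The function names are made up for illustration, and the path layout (`cpuN/cpufreq/scaling_governor`) is assumed to be the usual one; writing requires root.

```python
from pathlib import Path

# Assumption: the standard cpufreq sysfs layout; adjust if your kernel differs.
CPU_ROOT = Path("/sys/devices/system/cpu")


def governor_files(root=CPU_ROOT):
    """Return the scaling_governor file of every CPU that exposes cpufreq."""
    return sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor"))


def read_governors(root=CPU_ROOT):
    """Map each CPU directory name (e.g. 'cpu0') to its current governor."""
    return {f.parent.parent.name: f.read_text().strip() for f in governor_files(root)}


def set_governor(name, root=CPU_ROOT):
    """Write the governor name to every CPU (needs root on a real system)."""
    for f in governor_files(root):
        f.write_text(name)
```

Calling `set_governor("performance")` as root should have roughly the same effect as `cpufreq-set -g performance` on each core, and `read_governors()` can be used to check the result.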

For what it's worth, the "BIOS" is coreboot. It should not be "locking" anything, so linux is free to activate features as it finds them.

Basic checks:

- I have run the Intel® Processor Diagnostic Tool (64-bit), which passed (I ran it multiple times to be sure)

- I have run memtest86, because random crashes can be due to faulty RAM; it did not find any defect in 10 hours (enough time for multiple passes)

- I have attached the output of the Intel® System Support Utility script, please note that the kernel there is old but I have reproduced the behavior with multiple kernels: 4.4.88, fedora 25's 4.8.6-300.fc25.x86_64, debian's 4.12.0-2-amd64, upstream 4.14.0-rc2

- I should have the latest available microcode (20170707 release, /sys/devices/system/cpu/cpu0/microcode/version tells me 0xba)
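
As a side note, the microcode check can be scripted. The sketch below reads the same sysfs file for every core and compares the revisions; the helper names are hypothetical, and checking that all cores agree is an extra sanity check rather than something done above.

```python
from pathlib import Path

# Assumption: each core exposes microcode/version like cpu0 does above.
CPU_ROOT = Path("/sys/devices/system/cpu")


def parse_revision(text):
    """Parse a hexadecimal revision string such as '0xba' into an int."""
    return int(text.strip(), 16)


def microcode_revisions(root=CPU_ROOT):
    """Map each CPU directory name to its reported microcode revision."""
    return {
        f.parent.parent.name: parse_revision(f.read_text())
        for f in sorted(root.glob("cpu[0-9]*/microcode/version"))
    }


def all_same(revisions):
    """True when every core reports the same revision (or none were found)."""
    return len(set(revisions.values())) <= 1
```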

I am honestly out of ideas on what to try next. For now at least my computer is usable if I restrict myself to the performance governor when plugged in and 1 core when on battery, but this is not a proper solution and I'd like to understand what's happening.

I'm obviously willing to test more things or help further diagnose the issue if possible, guidance is welcome though!

Some more info,

mpv version (debian testing's):

```
mpv 0.26.0 (C) 2000-2017 mpv/MPlayer/mplayer2 projects
built on UNKNOWN
ffmpeg library versions:
libavutil 55.58.100
libavcodec 57.89.100
libavformat 57.71.100
libswscale 4.6.100
libavfilter 6.82.100
libswresample 2.7.100
ffmpeg version: 3.3.4-1
```

Example of crash, logs from this morning:

```
Sep 29 08:03:58 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000246
Sep 29 08:03:58 kernel: IP: __list_del_entry_valid+0x29/0x90
Sep 29 08:03:58 kernel: PGD 0 P4D 0
Sep 29 08:03:58 kernel: Oops: 0000 [#1] SMP
Sep 29 08:03:58 kernel: Modules linked in: ctr ccm fuse cpufreq_powersave cpufreq_userspace cpufreq_conservative snd_hda_codec_hdmi ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt arc4 nf_conntrack_ipv6 ath9k nf_defrag_ipv6 ath9k_common ipt_REJECT nf_reject_ipv4 ath9k_hw nf_log_ipv4 nf_log_common xt_LOG xt_recent ath xt_limit xt_tcpudp snd_soc_skl mac80211 xt_addrtype snd_soc_skl_ipc snd_hda_codec_realtek snd_hda_codec_generic snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_sst_match snd_soc_core intel_rapl snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_hda_codec coretemp kvm_intel snd_hwdep snd_hda_core kvm cfg80211 snd_pcm snd_timer irqbypass snd intel_cstate intel_uncore joydev intel_rapl_perf pcspkr serio_raw sg iTCO_wdt iTCO_vendor_support soundcore rfkill nf_conntrack_ipv4 nf_defrag_ipv4
Sep 29 08:03:58 kernel: xt_conntrack shpchp intel_pch_thermal battery ac topstar_laptop sparse_keymap processor_thermal_device evdev intel_soc_dts_iosf int340x_thermal_zone ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack libcrc32c crc32c_generic parport_pc ppdev lp parport iptable_filter ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto btrfs xor zstd_decompress zstd_compress xxhash raid6_pq algif_skcipher af_alg dm_crypt dm_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel i915 ghash_clmulni_intel pcbc video i2c_algo_bit drm_kms_helper i2c_i801 psmouse aesni_intel prime_numbers ahci xhci_pci aes_x86_64 crypto_simd cryptd glue_helper libahci nvme xhci_hcd drm libata nvme_core usbcore scsi_mod button
Sep 29 08:03:58 kernel: CPU: 1 PID: 5781 Comm: mpv/ao Tainted: G W 4.14.0-rc2 #14
Sep 29 08:03:58 kernel: Hardware name: Purism Librem 15 v3/Librem 15 v3, BIOS 4.6-a86d1b-Purism-5 07/27/2017
Sep 29 08:03:58 kernel: task: ffff924822b20040 task.stack: ffffa4fa83880000
Sep 29 08:03:58 kernel: RIP: 0010:__list_del_entry_valid+0x29/0x90
Sep 29 08:03:58 kernel: RSP: 0018:ffffa4fa83883cb0 EFLAGS: 00010203
Sep 29 08:03:58 kernel: RAX: 0000000000000000 RBX: ffffa4fa837fbd58 RCX: dead000000000200
Sep 29 08:03:58 kernel: RDX: 0000000000000246 RSI: ffffa4fa80d88448 RDI: ffffa4fa837fbd60
Sep 29 08:03:58 kernel: RBP: ffffa4fa83883cb0 R08: ffffa4fa837fbdb8 R09: ffffa4fa80d88448
Sep 29 08:03:58 kernel: R10: 0000000000000001 R11: 000000007fffffff R12: ffffa4fa837fbd60
Sep 29 08:03:58 kernel: R13: ffffa4fa837fbdd0 R14: ffffa4fa837fbdc0 R15: ffffa4fa80d88448
Sep 29 08:03:58 kernel: FS: 00007f54175c0700(0000) GS:ffff92483ec80000(0000) knlGS:0000000000000000
Sep 29 08:03:58 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 29 08:03:58 kernel: CR2: 0000000000000246 CR3: 000000026c196004 CR4: 00000000003606e0
Sep 29 08:03:58 kernel: Call Trace:
Sep 29 08:03:58 kernel: plist_del+0x3b/0xc0
Sep 29 08:03:58 kernel: __unqueue_futex+0x2f/0x40
Sep 29 08:03:58 kernel: mark_wake_futex+0x3d/0x50
Sep 29 08:03:58 kernel: futex_requeue+0x8a9/0xa40
Sep 29 08:03:58 kernel: do_futex+0x2ae/0xb10
Sep 29 08:03:58 kernel: SyS_futex+0x13b/0x180
Sep 29 08:03:58 kernel: ? SyS_write+0x79/0xc0
Sep 29 08:03:58 kernel: entry_SYSCALL_64_fastpath+0x1e/0xa9
Sep 29 08:03:58 kernel: RIP: 0033:0x7f5454a2d91d
Sep 29 08:03:58 kernel: RSP: 002b:00007f54175bf8e8 EFLAGS: 00000283 ORIG_RAX: 00000000000000ca
Sep 29 08:03:58 kernel: RAX: ffffffffffffffda RBX: 0000560690f967a0 RCX: 00007f5454a2d91d
Sep 29 08:03:58 kernel: RDX: 0000000000000001 RSI: 0000000000000084 RDI: 0000560690847fbc
Sep 29 08:03:58 kernel: RBP: 0000560690f96938 R08: 0000560690847f90 R09: 000000000001a394
Sep 29 08:03:58 kernel: R10: 000000007fffffff R11: 0000000000000283 R12: 0000000000000e50
Sep 29 08:03:58 kernel: R13: 0000560690f95a78 R14: 0000560690f95a70 R15: 0000560690f625c0
Sep 29 08:03:58 kernel: Code: 00 00 55 48 8b 07 48 b9 00 01 00 00 00 00 ad de 48 8b 57 08 48 89 e5 48 39 c8 74 27 48 b9 00 02 00 00 00 00 ad de 48 39 ca 74 2c <48> 8b 32 48 39 fe 75 35 48 8b 50 08 48 39 f2 75 40 b8 01 00 00
Sep 29 08:03:58 kernel: RIP: __list_del_entry_valid+0x29/0x90 RSP: ffffa4fa83883cb0
```

14 Replies
idata
Community Manager
250 Views

Hello Asmadeus,

Thank you for using the Intel(R) Communities.

I understand you are facing system crashes while having two videos playing and some other scenarios.

In this case, it would be recommended to have this inquiry handled by the Linux*/Graphics support team to have their expertise handling your problem.

This would be the place to check support for this matter: https://01.org/linuxgraphics/support

Thank you,

Esteban C
DMart12
Beginner
250 Views

Hi Esteban,

That was my first thought as well, but since I was able to reproduce the crash while playing without any output, I do not believe that video output is involved; `mpv -vo null` really just reads the file, decodes it and throws the output away. There is no graphics acceleration, no OpenGL... The buffer is just not displayed.

I do not think it is their time to shine here

Decoding videos (in this case h264) is a very complex operation; mpv uses ffmpeg, which has optimized the process a lot.

Part of the code is written in assembly with SSE vector instructions, the kind of thing that has been known to trigger "complex loads" which have led to crashes in the past (cf. the well-known prime95 freeze).

Without help I will keep trying to minimize the reproducer. I might try to take the code out of ffmpeg and run it in a loop, but this is a lot of work. I'm especially perplexed by the apparent requirement for CPU frequency changes here.

Thank you,

--

Dominique Martinet | Asmadeus

DMart12
Beginner
250 Views

By the way, I said two videos, but it actually depends on the media being played: the point is to make the load big enough to force the CPU frequency to increase a bit, but small enough that it drops back down "often". Getting it to switch back and forth between ~1GHz and ~2.5GHz is what has been giving me the best results.

To give concrete examples, using a slightly more intensive video (60fps@1080p, for example http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_60fps_normal.mp4 ), I can get it to crash with just one player.

With a smaller one (e.g. older 24fps@720p http://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_h264.mov ), I actually needed to start three instances of mpv to get the CPU frequency to oscillate and crash "quickly".
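
A minimal sketch of scripting this reproducer, for anyone who wants to try it. It assumes `mpv` is on the PATH; `--vo=null` is the long form of the `-vo null` used above, `--loop-file=inf` is added here only to keep the load running, and the function names are hypothetical.

```python
import subprocess


def mpv_command(video, loop=True):
    """Build the argv for one decode-only mpv instance (no video output)."""
    cmd = ["mpv", "--vo=null"]
    if loop:
        # Assumption: looping is only there so the test runs until a crash.
        cmd.append("--loop-file=inf")
    cmd.append(video)
    return cmd


def launch(video, instances=2):
    """Start several mpv instances in parallel and return their handles."""
    return [subprocess.Popen(mpv_command(video)) for _ in range(instances)]
```

The number of instances would be tuned per file (one for the 1080p60 file, three for the 720p one) until the frequency starts oscillating.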

These files are pretty well known and have been playing fine for over half a day/a whole night with either 3 of the 4 CPUs turned offline (writing 0 to `/sys/devices/system/cpu/cpu[1-3]/online`) or with `cpufreq-set -g performance`, so I am confident that both the media and this version of mpv/ffmpeg are fine by themselves.
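
The CPU offlining step can be sketched as follows. The helper names are hypothetical, the default of 4 logical CPUs matches this machine only, and writing to these files requires root; cpu0 is left alone since it usually cannot be offlined on x86.

```python
from pathlib import Path

# Assumption: the sysfs CPU hotplug layout mentioned above (cpuN/online).
CPU_ROOT = Path("/sys/devices/system/cpu")


def online_file(cpu, root=CPU_ROOT):
    """Path of the hotplug control file for one logical CPU."""
    return root / f"cpu{cpu}" / "online"


def set_online(cpu, online, root=CPU_ROOT):
    """Write 1 (online) or 0 (offline); needs root on a real system."""
    online_file(cpu, root).write_text("1" if online else "0")


def offline_all_but_first(cpu_count=4, root=CPU_ROOT):
    """Take cpu1..cpu(N-1) offline, leaving only cpu0 running."""
    for cpu in range(1, cpu_count):
        set_online(cpu, False, root)
```

Calling `set_online(cpu, True)` for the same CPUs brings them back without a reboot.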

It is only when using multiple cores at varying frequencies that I get frequent crashes.

Thank you,

--

Dominique Martinet

idata
Community Manager
250 Views

Thank you for the answer provided.

Please proceed to place your question, with that additional information at the Linux/Intel(R) support forums: https://01.org/linuxgraphics/support

Thanks,

Esteban C
DMart12
Beginner
250 Views

Once again, that link is for the Intel *graphics* component, whereas I do not use the graphics driver at all. I could blacklist all drm/i915 modules and still reproduce crashes. (hm, I need to actually try, ok. I will report back tonight)

If you have an actual overall linux support I would not mind switching, but the linuxgraphics support "forum" has nothing to do with the CPU itself. They will just laugh at me if I complain about a non-graphics issue there.

What do you actually need to take this report seriously? I'd honestly rather not have to install windows on this laptop, I guess there are trial versions that do not need a license but it is as much a matter of principle...

Thanks,

--

Dominique Martinet

DMart12
Beginner
250 Views

> I could blacklist all drm/i915 modules and still reproduce crashes

I can confirm this part: I removed all graphics kernel modules (drm.ko, drm_kms_helper.ko and i915.ko), rebuilt the initrd, rebooted in single-user mode (X wouldn't start anymore) and reproduced just fine.

There is no graphics involved in this bug. It is about h264 decoding instructions and CPU frequency changes.

Thanks,

--

Dominique Martinet

idata
Community Manager
250 Views

Hello Asmadeus,

Thank you for the answers provided.

I have proceeded to perform a test with the same configuration you have but within a Windows 10* environment.

These would be the results:


Hardware used:
  • Surface Pro 4
  • CPU: Intel® Core™ i5-6300U Processor
  • GPU: Intel(R) HD Graphics 520
  • Graphics driver: 22.20.16.4771 (latest available at this moment 10/04/2017)
  • 8GB of RAM
  • Windowed and fullscreen

Tests performed:

  • Downloaded same player used (mpv) from https://mpv.io/
  • Downloaded video from the link provided above by you
  • Started video playback
  • Monitored CPU workload or % of use (was normal, no more than 20%)
  • Checked if there were graphics issues or lag (not present)
  • The system did not crash

At this point I would like to check, are there certain video configurations within the player that have been changed? I can certainly reproduce them to see if the problem happens with the official driver from our site.

Some screenshots have been attached.

 

Thanks,

 

Esteban C
DMart12
Beginner
250 Views

Hello,

Thank you very much for giving it a try.

I have not changed any option with mpv, I believe that even for windows it should use ffmpeg and similar acceleration instructions (sse or similar).

What I did notice as being important to reproduce, though, is that the CPU frequently changes frequency during playback. I am not sure how to check under Windows, but Linux has an interface file called `/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq` (one per core) that gives the current CPU frequency, and I could only get crashes when the frequency was alternating regularly between about 1GHz and 2.8+GHz. I believe Windows should exhibit similar frequency swings if the laptop is configured in power saving mode (it should actually be done in hardware, through dynamic voltage and frequency scaling (DVFS)), but you might need slightly more than the 20% usage you were observing. Have you tried opening the video multiple times in parallel?
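
To make "frequency swings" measurable, here is a rough sketch that samples `scaling_cur_freq` (which reports kHz) and counts how often the frequency moves between a low band and a high band. The thresholds, sampling interval and function names are arbitrary assumptions for illustration, not values taken from the tests above.

```python
import time
from pathlib import Path

# Assumption: the standard cpufreq sysfs layout.
CPU_ROOT = Path("/sys/devices/system/cpu")


def sample(cpu=0, count=100, interval=0.1, root=CPU_ROOT):
    """Read scaling_cur_freq (in kHz) `count` times, `interval` seconds apart."""
    path = root / f"cpu{cpu}" / "cpufreq" / "scaling_cur_freq"
    samples = []
    for _ in range(count):
        samples.append(int(path.read_text()))
        time.sleep(interval)
    return samples


def count_swings(samples_khz, low_khz=1_200_000, high_khz=2_500_000):
    """Count transitions between the low band and the high band."""
    swings = 0
    state = None  # last band seen: "low" or "high"
    for khz in samples_khz:
        if khz <= low_khz:
            band = "low"
        elif khz >= high_khz:
            band = "high"
        else:
            continue  # in between: keep the previous band (hysteresis)
        if state is not None and band != state:
            swings += 1
        state = band
    return swings
```

Running `count_swings(sample())` during playback gives a number that can be compared between a crashing setup and a stable one.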

The CPU model is also different, I am not sure how specific this issue might be. I have other laptops/NUCs with intel CPUs and have never experienced such an issue.

For information, I have asked the laptop makers to see if they can reproduce the issue on a wider range of models to check if this would be a bad series as well (there are only a handful of reports at this point). I will report when I hear back from them.

I am also starting to think it could be related to the motherboard itself, for example if the CPU input voltage is not steady enough. I believe a sudden frequency increase combined with power-hungry instructions could also cause unwanted voltage fluctuations, which might cause this kind of behavior. I do not know how to confirm or rule that out, though. I will try to look with an oscilloscope towards the end of the month if I can find a suitable pin to probe close enough to the CPU (do not hold your breath).

Thanks,

--

Dominique Martinet | Asmadeus

idata
Community Manager
250 Views

Thank you for the reply, Asmadeus

Would you please clarify what do you mean by: "I believe that even for windows it should use ffmpeg and similar acceleration instructions (sse or similar)" Is that something you believe should be added or is it something that is used by mpv already?

About the CPU frequency changes

The frequency faced during the tests performed (new tests) did change, but that is a regular behavior, depending on the tasks performed by the CPU, the frequency can change.

These frequency changes happened, and no crashes or performance problems were faced in a windows 10 environment.

Related to the power plan

The system used has Balanced mode only and this is intended for the Surface Pro 4*.
  • Have you tested the system with different power plans? If available

Three instances of the video were played while the tests were performed, with mpv and with VLC players.

About the CPU model used

Are you confirming that other systems with the same OS configuration do not face the problem?

It is great to hear you have reached the manufacturer of the system to get this tested. Please do report back when results are present.

There could be a possibility of the motherboard affecting in that way you mentioned (CPU voltage management)

Screenshots attached:

Thanks,

Esteban C
DMart12
Beginner
250 Views

Hello,

replying in order:

mpv and ffmpeg/sse instructions: It's something mpv/vlc already should do by default so I do not think anything will change.

Frequency changes: OK. Frequency changes do appear to work here; it just looks like it needs a lot of them to exhibit crashes.

Power plans: I'm not sure if the question was for me, but using a "performance" power plan (CPU always stays close to the maximum frequency) I have not experienced any crash, which is why I believe these "frequency swings" are important.

CPU models: Yes, I run the same system on another Skylake CPU (an Intel NUC with an i5-6260U) as well as on another older laptop (not Skylake though), both with the same software and no issue.

I'll update here when I have heard back from the manufacturer.

Thanks,

--

Dominique Martinet | Asmadeus

idata
Community Manager
250 Views

Thank you for reporting back, Asmadeus

Let's see what the manufacturer says, since there is a possibility that this is related to a single configuration; then we can proceed accordingly.

 

Thanks,

 

Esteban C
idata
Community Manager
250 Views

Asmadeus,

This is to do a follow up to your inquiry and find out if you have further questions or if the system's manufacturer has provided some details.

Thanks,

Esteban C
DMart12
Beginner
250 Views

Thanks for the follow up.

I haven't had many replies yet, and I'm still traveling, so I can't investigate as much as I would like right now.

I'll hopefully have more details around the end of the month.

--

Dominique Martinet | Asmadeus

idata
Community Manager
250 Views

I understand, thank you for reporting back.

Feel free to reply when possible.

Thanks,

Esteban C