- Hardware: Gigabyte Brix GB-BXi7-4770R
- CPU info: Intel(R) Core(TM) i7-4770R CPU @ 3.20GHz (Gigabyte Brix)
- GPU: Intel Iris Pro 5200 (integrated)
- OS: Linux CentOS 7.1
- Kernel: 3.10.0-229.el7.centos.intel.sr1.x86_64 (patched with the patch 'kernel-3.10.0-229.patch' included in 'intel-opencl-1.2-1.0-47971.tar.gz' following the instructions in 'intel-opencl-1.2-installation-external.pdf')
- Compiled the Intel OpenCL samples using gcc 4.8.3
- The first sample 'CapBasic' runs without errors and generates the following output:
    Number of available platforms: 1
    Platform names:
        [0] Intel(R) OpenCL [Selected]
    Number of devices available for each type:
        CL_DEVICE_TYPE_CPU: 0
        CL_DEVICE_TYPE_GPU: 1
        CL_DEVICE_TYPE_ACCELERATOR: 0
    *** Detailed information for each device ***
    CL_DEVICE_TYPE_GPU[0]
        CL_DEVICE_NAME: Intel(R) HD Graphics
        CL_DEVICE_AVAILABLE: 1
        CL_DEVICE_VENDOR: Intel(R) Corporation
        CL_DEVICE_PROFILE: FULL_PROFILE
        CL_DEVICE_VERSION: OpenCL 1.2
        CL_DRIVER_VERSION: 1.0.47971
        CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2
        CL_DEVICE_MAX_COMPUTE_UNITS: 40
        CL_DEVICE_MAX_CLOCK_FREQUENCY: 1300
        CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
        CL_DEVICE_ADDRESS_BITS: 64
        CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
        CL_DEVICE_MAX_MEM_ALLOC_SIZE: 427399577
        CL_DEVICE_GLOBAL_MEM_SIZE: 1709598311
        CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 427399577
        CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 524288
        CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 64
        CL_DEVICE_LOCAL_MEM_SIZE: 65536
        CL_DEVICE_PROFILING_TIMER_RESOLUTION: 80
        CL_DEVICE_IMAGE_SUPPORT: 1
        CL_DEVICE_ERROR_CORRECTION_SUPPORT: 0
        CL_DEVICE_HOST_UNIFIED_MEMORY: 1
        CL_DEVICE_EXTENSIONS: cl_intel_accelerator cl_intel_advanced_motion_estimation cl_intel_motion_estimation cl_intel_subgroups cl_intel_va_api_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
        CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 0
        CL_DEVICE_NATIVE_VECTOR_WIDTH_INT: 1
        CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG: 1
        CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT: 1
        CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE: 0
- However, when I try to run the 'GEMM' sample, it hangs after the following few lines:
    Platforms (1):
        [0] Intel(R) OpenCL [Selected]
    Devices (1):
        [0] Intel(R) HD Graphics [Selected]
- At the same time in '/var/log/messages', I see the following:
    Jan 17 21:45:40 centos71 kernel: [drm] GPU HANG: ecode 0:0x8fd0ffff, in gemm [25 56], reason: Ring hung, action: reset
    Jan 17 21:45:42 centos71 kernel: [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
    Jan 17 21:45:42 centos71 kernel: ------------[ cut here ]------------
    Jan 17 21:45:42 centos71 kernel: WARNING: at drivers/gpu/drm/i915/intel_pm.c:3432 gen6_enable_rps_interrupts+0xa3/0xb0 [i915]()
    Jan 17 21:45:42 centos71 kernel: Modules linked in: ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables snd_hda_codec_realtek intel_powerclamp snd_hda_codec_hdmi snd_hda_codec_generic coretemp kvm_intel kvm snd_hda_intel snd_hda_controller snd_hda_codec btusb crct10dif_pclmul crc32_pclmul crc32c_intel snd_hwdep bluetooth i915 ghash_clmulni_intel snd_seq aesni_intel snd_seq_device r8169 lrw gf128mul glue_helper rfkill snd_pcm ablk_helper cryptd mii i2c_algo_bit snd_timer iTCO_wdt drm_kms_helper
    Jan 17 21:45:42 centos71 kernel: snd iTCO_vendor_support sdhci_acpi drm soundcore sdhci mei_me mmc_core shpchp mei lpc_ich video mfd_core i2c_i801 i2c_hid i2c_core pcspkr nls_utf8 isofs loop xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ahci libahci libata dm_mirror dm_region_hash dm_log dm_mod
    Jan 17 21:45:42 centos71 kernel: CPU: 0 PID: 71 Comm: kworker/0:1 Not tainted 3.10.0-229.el7.centos.intel.sr1.x86_64 #1
    Jan 17 21:45:42 centos71 kernel: Hardware name: GIGABYTE M4HM87P-00/M4HM87P-00, BIOS F5 06/23/2014
    Jan 17 21:45:42 centos71 kernel: Workqueue: events intel_gen6_powersave_work [i915]
    Jan 17 21:45:42 centos71 kernel: 0000000000000000 000000005b114aaa ffff880407ecbd58 ffffffff81603f36
    Jan 17 21:45:42 centos71 kernel: ffff880407ecbd90 ffffffff8106e28b ffff880406430000 ffff880406437108
    Jan 17 21:45:42 centos71 kernel: 0000000000040000 ffff880406435820 ffff880406430000 ffff880407ecbda0
    Jan 17 21:45:42 centos71 kernel: Call Trace:
    Jan 17 21:45:42 centos71 kernel: [<ffffffff81603f36>] dump_stack+0x19/0x1b
    Jan 17 21:45:42 centos71 kernel: [<ffffffff8106e28b>] warn_slowpath_common+0x6b/0xb0
    Jan 17 21:45:42 centos71 kernel: [<ffffffff8106e3da>] warn_slowpath_null+0x1a/0x20
    Jan 17 21:45:42 centos71 kernel: [<ffffffffa03e5263>] gen6_enable_rps_interrupts+0xa3/0xb0 [i915]
    Jan 17 21:45:42 centos71 kernel: [<ffffffffa03ea32e>] intel_gen6_powersave_work+0x39e/0xd80 [i915]
    Jan 17 21:45:42 centos71 kernel: [<ffffffff8108f0ab>] process_one_work+0x17b/0x470
    Jan 17 21:45:42 centos71 kernel: [<ffffffff8108fe8b>] worker_thread+0x11b/0x400
    Jan 17 21:45:42 centos71 kernel: [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400
    Jan 17 21:45:42 centos71 kernel: [<ffffffff8109726f>] kthread+0xcf/0xe0
    Jan 17 21:45:42 centos71 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
    Jan 17 21:45:42 centos71 kernel: [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0
    Jan 17 21:45:42 centos71 kernel: [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140
    Jan 17 21:45:42 centos71 kernel: ---[ end trace fb836458f742c30c ]---
- I saved the GPU crash dump from '/sys/class/drm/card0/error' in case you are interested in it.
- I also tried compiling the stock 4.1 Linux kernel and using the patch provided for the 4.1 kernel, but the results are similar.
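For anyone reproducing this, here is a minimal sketch of how the two pieces of evidence mentioned above (the "GPU HANG" lines in '/var/log/messages' and the error state in '/sys/class/drm/card0/error') could be saved to one directory. The function name `collect_gpu_hang_info` and the output file names are made up for illustration; only the two default paths come from this post. Reading both files normally requires root.

```shell
#!/usr/bin/env bash
# Sketch only: collect_gpu_hang_info is a hypothetical helper, not an Intel tool.
# Arguments: output directory, then optional log file and error-state file
# (defaults are the paths quoted in this post).
collect_gpu_hang_info() {
    local outdir="$1"
    local log="${2:-/var/log/messages}"
    local error_state="${3:-/sys/class/drm/card0/error}"

    mkdir -p "$outdir" || return 1

    # Keep only the hang reports, e.g. "[drm] GPU HANG: ecode ...".
    # grep exits non-zero when nothing matches; that is not treated as an error.
    grep -F 'GPU HANG' "$log" > "$outdir/gpu-hang-lines.txt" || true

    # Copy the i915 error state as plain text.
    cat "$error_state" > "$outdir/i915-error-state.txt" || return 1
}
```

After sourcing the file as root, something like `collect_gpu_hang_info ./gemm-hang-report` would write both files into `./gemm-hang-report`.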
Any help or idea of what's going on here is appreciated.
Thanks in advance,
Franco Venturi
Hi Franco,
Here is the response from our Linux developer:
GEMM on my system has always been one of those long-running applications that require the hang check to be disabled. All of our release notes talk about this to some degree. Our latest in SRB1 is pretty much unchanged from previous versions. I do not see the call trace, but I suspect this will resolve the issue for them:
- For workloads that take longer than 1.5 seconds the i915 hang check
will reset the GPU, output a kernel message for logging, and clear
any pending work items. When necessary, the i915 hang check can be
disabled on demand with
$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'
Although the GPU will no longer reset when executing with hang
checks disabled, sufficiently large workloads may stall other GPU
tasks such as screen updates. These situations can be recovered from
by manually resetting the GPU with
$ sudo bash -c 'echo 1 > /sys/kernel/debug/dri/0/i915_wedged'
We also describe this in our release notes.
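The two commands above can be combined into a small wrapper so the hang check is only off while the long-running workload executes. This is a rough sketch under a few assumptions: the helper names `with_hangcheck_disabled` and `reset_gpu` are hypothetical, the `HANGCHECK_PARAM` and `WEDGED_PARAM` variables exist only so the paths can be overridden, the script runs as root, and debugfs is mounted at /sys/kernel/debug.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the two override variables are illustrative.
# The default paths are the ones given in the release-note excerpt above.
HANGCHECK_PARAM="${HANGCHECK_PARAM:-/sys/module/i915/parameters/enable_hangcheck}"
WEDGED_PARAM="${WEDGED_PARAM:-/sys/kernel/debug/dri/0/i915_wedged}"

# Run the given command with the i915 hang check disabled, then restore
# whatever setting was in place before. Returns the command's exit status.
with_hangcheck_disabled() {
    local previous status
    previous="$(cat "$HANGCHECK_PARAM")" || return 1   # remember current value
    echo N > "$HANGCHECK_PARAM" || return 1            # disable the hang check
    "$@"                                               # run the GPU workload
    status=$?
    echo "$previous" > "$HANGCHECK_PARAM"              # put the old value back
    return "$status"
}

# Manually reset the GPU if a workload stalls it while hang checks are off.
reset_gpu() {
    echo 1 > "$WEDGED_PARAM"
}
```

With the file sourced in a root shell, `with_hangcheck_disabled ./gemm` would run the sample and restore the previous setting afterwards; `reset_gpu` is there for the case where the display stays stalled.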
Thanks for the reply, Robert.
I disabled the hang check, and this time 'gemm' ran to completion without warnings or errors (the screen seemed to be frozen while 'gemm' was running, but I think that is to be expected).
I apologize for not having seen that important information in the release notes (I'll read them tonight in case I missed anything else important).
Thanks again,
Franco
Is it possible to set hangcheck on OS X? What would the command look like?
