Tommi_T_
New Contributor I
95 Views

Kernel panic on el6

Hi,

A user tries to run a memory-access analysis, and the compute node panics as soon as the job starts:
srun amplxe-cl -collect memory-access -knob analyze-mem-objects=true -knob analyze-openmp=true ./Elmfire-Dev

(It's a hybrid MPI/OpenMP app; the resource manager is Slurm.)

BUG: unable to handle kernel paging request at 000000000000100c
IP: [<ffffffffa0c898a6>] OUTPUT_Reserve_Buffer_Space+0x26/0x190 [sep4_0]
PGD 1017096067 PUD 1017095067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_cur_freq
CPU 1
Modules linked in: vtsspp(U) sep4_0(U) socperf2_0(U) pax(U) lmv(U) fld(U) mgc(U) lustre(U) lov(U) osc(U) mdc(U) fid(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) sha512_generic crc32c_intel libcfs(U) cpufreq_ondemand freq_table pcc_cpufreq rdma_ucm(U) ib_ucm(U) rdma_cm(U) iw_cm(U) configfs ib_uverbs(U) ib_umad(U) mlx5_ib(U) mlx5_core(U) mlx4_en(U) ipmi_devintf iTCO_wdt iTCO_vendor_support power_meter acpi_ipmi ipmi_si ipmi_msghandler serio_raw sg sb_edac edac_core i2c_i801 lpc_ich mfd_core hpilo hpwdt ioatdma igb dca i2c_algo_bit i2c_core ptp pps_core ib_ipoib(U) ib_cm(U) mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U) ib_addr(U) ib_netlink(U) ipv6 mlx4_core(U) mlx_compat(U) ext4 jbd2 mbcache sd_mod crc_t10dif hpsa(U) scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 4790, comm: amplxe-runss Not tainted 2.6.32-642.15.1.el6.x86_64 #1 HP ProLiant XL230a Gen9/ProLiant XL230a Gen9
RIP: 0010:[<ffffffffa0c898a6>]  [<ffffffffa0c898a6>] OUTPUT_Reserve_Buffer_Space+0x26/0x190 [sep4_0]
RSP: 0018:ffff88101968b838  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88101602e440 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000000000000c0 RDI: ffff88101602e440
RBP: ffff88101968b858 R08: 0000000000000000 R09: 00000000000011d6
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00000000000000c0 R15: ffff88101968b8b8
FS:  00007ff33a385700(0000) GS:ffff880028220000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000100c CR3: 0000001017093000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process amplxe-runss (pid: 4790, threadinfo ffff881019688000, task ffff88101d5a6040)
Stack:
 ffff88101968b858 ffff88101602e458 0000000000000000 00000000000000c0
<d> ffff88101968b898 ffffffffa0c89a62 00000000000002f8 ffff88101968b8b8
<d> 0000000000000003 ffff88101968b908 00000000000012b6 ffff88101968bb39
Call Trace:
 [<ffffffffa0c89a62>] OUTPUT_Module_Fill+0x52/0x90 [sep4_0]
 [<ffffffffa0c88544>] linuxos_Load_Image_Notify_Routine+0x174/0x220 [sep4_0]
 [<ffffffffa0c886fe>] linuxos_VMA_For_Process+0x10e/0x1a0 [sep4_0]
 [<ffffffff810097cc>] ? __switch_to+0x1ac/0x340
 [<ffffffffa0c887f4>] linuxos_Enum_Modules_For_Process+0x64/0xc0 [sep4_0]
 [<ffffffffa0c888ba>] linuxos_Exit_Task_Notify+0x6a/0x70 [sep4_0]
 [<ffffffff8154f385>] notifier_call_chain+0x55/0x80
 [<ffffffff810acf2a>] __blocking_notifier_call_chain+0x5a/0x80
 [<ffffffff810acf66>] blocking_notifier_call_chain+0x16/0x20
 [<ffffffff810b0e8a>] profile_task_exit+0x1a/0x20
 [<ffffffff8108175b>] do_exit+0x2b/0x870
 [<ffffffff81081ff8>] do_group_exit+0x58/0xd0
 [<ffffffff81097e06>] get_signal_to_deliver+0x1f6/0x460
 [<ffffffff8100a285>] do_signal+0x75/0x870
 [<ffffffff810abc82>] ? hrtimer_cancel+0x22/0x30
 [<ffffffff8154b2b3>] ? do_nanosleep+0x93/0xc0
 [<ffffffff810abd54>] ? hrtimer_nanosleep+0xc4/0x180
 [<ffffffff810bd99b>] ? sys_futex+0x7b/0x170
 [<ffffffff8100ab10>] do_notify_resume+0x90/0xc0
 [<ffffffff8100b3a1>] int_signal+0x12/0x17
Code: 00 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 0f 1f 44 00 00 48 8b 05 a8 16 01 00 48 89 fb 41 89 d5 <44> 8b 80 0c 10 00 00 45 85 c0 0f 85 3a 01 00 00 8b 43 1c 39 f0
RIP  [<ffffffffa0c898a6>] OUTPUT_Reserve_Buffer_Space+0x26/0x190 [sep4_0]
 RSP <ffff88101968b838>
CR2: 000000000000100c
BUG: unable to handle kernel
---[ end trace c34e52112c8c3565 ]---
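
When reporting a trace like this, it can help to pull out the faulting frame and the module it belongs to. Below is a minimal Python sketch, assuming the el6-style frame format shown above (`[<address>] symbol+0xoffset/0xsize [module]`). The names `parse_frame` and `faulting_frame` are made up for illustration and are not part of VTune or any kernel tooling.

```python
import re

# Matches el6-style frames such as:
#   [<ffffffffa0c898a6>] OUTPUT_Reserve_Buffer_Space+0x26/0x190 [sep4_0]
# A leading "?" marks a frame the kernel's unwinder was not sure about.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<maybe>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frame(line):
    """Return symbol, offset and module for one frame line, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "module": m.group("module"),        # None for core-kernel symbols
        "reliable": m.group("maybe") is None,
    }


def faulting_frame(oops_text):
    """Return the parsed frame from the first 'IP:' line of an oops."""
    for line in oops_text.splitlines():
        if line.startswith("IP:"):
            return parse_frame(line)
    return None
```

For the trace above, `faulting_frame` would report the symbol `OUTPUT_Reserve_Buffer_Space` in module `sep4_0`, which is the information worth putting at the top of a support request.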

 


Hello,

A problem with a similar stack trace was addressed at the end of last year. Could you please check the 2017 Update 2 release of VTune Amplifier XE (build #499904)? If the problem persists, please submit a Premier Support issue with details of the system hardware, OS, and any kernel patches.

Regards, Katya

Woo__Julia
Beginner
95 Views

Hello Katya,

If I am running into a similar issue on VTune Amplifier XE 2016 (not 2017), what steps would resolve it?
