Dennis_F_
Beginner
84 Views

CentOS 7 kernel oops when running VTune Amplifier XE

While evaluating vtune_amplifier_xe_2015.1.0.367959 on Linux, I hit a kernel oops in the VTune kernel modules. I was running the microarchitecture -> general exploration -> bandwidth test. The system is a default CentOS 7 x86 install, updated with all patches. The code was running on a Sandy Bridge (SNB) machine, with VTune installed via CLI_install as described in the manual.

(CLI_install has another issue: on RHEL/CentOS the kernel sources are not in /usr/src/linux, and the installer does not pick up their actual location automatically.)
(The manual says the power sampler should be installed, but I read that it was removed earlier. Should the docs be updated?)

Any ideas, besides "it's open source, please submit a patch"? :)

code under test

compiled as user_loop (gcc 4.8.2 -g)

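int main(void)
{
        /* volatile forces a load and store of i on every iteration,
           so the compiler cannot optimize the loop away */
        volatile unsigned long i=0;
        while(i<1000000000)
        {
                ++i;
        }
        return 0;
}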

crash summary

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2015.01.04-12:44:17/vmcore  [PARTIAL DUMP]
        CPUS: 16
        DATE: Sun Jan  4 12:43:16 2015
      UPTIME: 01:49:45
LOAD AVERAGE: 0.10, 0.07, 0.06
       TASKS: 367
     RELEASE: 3.10.0-123.el7.x86_64
     VERSION: #1 SMP Mon Jun 30 12:09:22 UTC 2014
      MEMORY: 32 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)
         PID: 29144
     COMMAND: "user_loop"
        TASK: ffff8805fc9571c0  [THREAD_INFO: ffff8805fca2e000]
         CPU: 8
       STATE: TASK_RUNNING (PANIC)

 

log:

[ 4291.860357] PAX: PMU arbitration service v1.0.1 has been started.
[ 4292.902500] sep3_15: PMU collection driver v3.15.5 (EMON) has been loaded.
[ 4292.934677] sep3_15: Chipset support is enabled.
[ 4292.956584] sep3_15: IDT vector 0x21 will be used for handling PMU interrupts.
[ 4295.038257] vtss++ kernel module ("v1.4.4-367959 Intel(R) VTune(TM) Amplifier XE 2013") registered
[ 6584.773197] BUG: unable to handle kernel paging request at ffffc900183f2000
[ 6584.805419] IP: [<ffffffffa05adeab>] UNC_COMMON_PCI_Read_Counts+0x6b/0x1b0 [sep3_15]
[ 6584.841380] PGD 42f405067 PUD 83f403067 PMD 2aa331067 PTE 0
[ 6584.867465] Oops: 0002 [#1] SMP

 

bt
PID: 29144  TASK: ffff8805fc9571c0  CPU: 8   COMMAND: "user_loop"
 #0 [ffff8805fca2fa90] machine_kexec at ffffffff81041181
 #1 [ffff8805fca2fae8] crash_kexec at ffffffff810cf0e2
 #2 [ffff8805fca2fbb8] oops_end at ffffffff815ea548
 #3 [ffff8805fca2fbe0] no_context at ffffffff815daf63
 #4 [ffff8805fca2fc30] __bad_area_nosemaphore at ffffffff815daff9
 #5 [ffff8805fca2fc78] bad_area_nosemaphore at ffffffff815db163
 #6 [ffff8805fca2fc88] __do_page_fault at ffffffff815ed36e
 #7 [ffff8805fca2fd88] do_page_fault at ffffffff815ed58a
 #8 [ffff8805fca2fdb0] page_fault at ffffffff815e97c8
    [exception RIP: UNC_COMMON_PCI_Read_Counts+107]
    RIP: ffffffffa05adeab  RSP: ffff8805fca2fe60  RFLAGS: 00010002
    RAX: 0000000000000058  RBX: 0000000000000001  RCX: 0000000000000080
    RDX: 0000000000000001  RSI: ffffc900183f1f80  RDI: 0000000000000001
    RBP: ffff8805fca2fea8   R8: 0000000000000003   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 000000000000003f
    R13: 0000000000000040  R14: 0000000000000058  R15: ffffc900183f1f80
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #9 [ffff8805fca2feb0] PMI_Interrupt_Handler at ffffffffa05a3b14 [sep3_15]
#10 [ffff8805fca2ff50] SYS_Perfvec_Handler at ffffffffa05b0f85 [sep3_15]
    RIP: 000000000040050a  RSP: 00007fff61c5d0b0  RFLAGS: 00000206
    RAX: 0000000015d95a9f  RBX: 0000000000000000  RCX: 0000000000400520
    RDX: 00007fff61c5d1a8  RSI: 00007fff61c5d198  RDI: 0000000000000001
    RBP: 00007fff61c5d0b0   R8: 00007f15a1e68e80   R9: 0000000000000000
    R10: 00007fff61c5cf40  R11: 00007f15a1acea00  R12: 0000000000400400
    R13: 00007fff61c5d190  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000015d95a9f  CS: 0033  SS: 002b

 

8 Replies
Dennis_F_
Beginner

I meant x86_64, obviously.

Peter_W_Intel
Employee

(I would like to hear whether others have seen similar issues on CentOS 7. :-) )

I see some clues in your output:

> [ 6584.773197] BUG: unable to handle kernel paging request at ffffc900183f2000

>DUMPFILE: /var/crash/127.0.0.1-2015.01.04-12:44:17/vmcore  [PARTIAL DUMP]

1. Are you running Linux* in a virtual machine? If so, please run VTune(TM) Amplifier XE on native Linux*; VTune supports Linux* in a VM only on VMware Fusion* 5. If VTune is installed on Linux in another VM, you can only use the user-mode sampling collectors.

2. Is it possible that your system was configured with huge memory pages? As far as I know, there was a limitation on installing hook functions from the device driver in that case.

3. Are you using a standard CentOS 7 install with standard patches?

If none of the above applies, my suggestion is to submit a ticket to Intel Premier with your data; we may need more of your private data to investigate.

*By the way, power profiling has been removed from the regular VTune; this feature is available in the VTune that ships with Intel(R) System Studio XE.

Dennis_F_
Beginner

Hi Peter, thanks for the quick reply. This was a clean CentOS 7 install on a physical machine, so that rules out 1 and 3. However, 2 was spot on: I did configure huge pages so I could test the difference in TLB walks (although none had been allocated yet). Is there documentation on this limitation, or are there recommendations for using huge pages so that I can work around it? Thanks, Dennis
Peter_W_Intel
Employee

Sorry, it seems that configuring huge pages (hugetlbfs) through GRUB kernel parameters is supported by VTune. I tried:

default_hugepagesz=1g hugepagesz=1g hugepages=4 memmap=1G$4G 

(This was not supported in older versions of the product.)

Please verify that all VTune drivers are loaded (lsmod | grep -E 'sep|pax|vtsspp'), then help collect the following info:

1) export AMPLXE_DEBUG=1
2) export AMPLXE_LOG_LEVEL=TRACE
3) export AMPLXE_LOG_DIR=<dir>
4) In a different root console, run:
> while true; do dmesg -c >> out_dmesg; done
5) From the first console (where the environment variables are set), start the collection and reproduce the problem.

Then provide:
1) the result directory
2) the log files
3) out_dmesg

I will report this to the developers with your data if you don't have time to create a new ticket at Intel Premier.

james_B_8
Beginner

Dear Dennis,

I had similar problems with VTune recently. The cause turned out to be an older version of the SEP3 driver being used with a newer version of VTune. A complete uninstall followed by a reinstall fixed it. This might also be something you want to check.

Cheers,

James

David_A_Intel1
Employee

Also note that build "367959" is the initial release of the 2015 version.  Update 1, build 380310, was released in Oct/Nov.  I recommend you update and use the latest release.

Bernard
Black Belt

>>>#3 [ffff8805fca2fbe0] no_context at ffffffff815daf63>>>

It seems that the third function call (quoted above) caused the kernel panic or called the kernel panic routines. I would start by checking the page fault rate around the time of the crash. Maybe a low-physical-memory scenario indirectly caused the crash?

Peter_W_Intel
Employee


Thanks James.

Go to /opt/intel/vtune_amplifier_xe_2015/sepdk/src/ and run, as root:

a. ./rmmod-sep3

b. ./build-driver

c. ./insmod-sep3

d. ./boot-script --install
