Hello,
My computer restarts every time I try to launch a simple program:
PCM *m = PCM::getInstance();
if (m->program(PCM::DEFAULT_EVENTS, NULL) != PCM::Success) {
    std::cerr << "Failed to start PCM" << std::endl;
    exit(1);
}
SystemCounterState before = getSystemCounterState();
SystemCounterState after = getSystemCounterState();
std::cout << "Instructions per Clock: " << getIPC(before, after)
          << "\nL3 cache hit ratio: " << getL3CacheHitRatio(before, after)
          << "\nL2 cache hit ratio: " << getL2CacheHitRatio(before, after)
          << "\nWasted cycles caused by L3 misses: " << getCyclesLostDueL3CacheMisses(before, after)
          << "\nBytes read from DRAM: " << getBytesReadFromMC(before, after)
          << std::endl;
m->cleanup();
I get a kernel panic. The same thing happened after running pcm.x for a minute. I am on OS X 10.9.5. This is the report:
Thu Jun 11 13:53:17 2015
panic(cpu 0 caller 0xffffff80010dcc1d): Kernel trap at 0xffffff7f81a97bfc, type 13=general protection, registers:
CR0: 0x000000008001003b, CR2: 0x000000076d6e3000, CR3: 0x000000006e2bd01c, CR4: 0x00000000001606e0
RAX: 0x0000000000000000, RBX: 0xffffff802995af84, RCX: 0x0000000000000c8f, RDX: 0x0000013d31b9ca72
RSP: 0xffffff80e406dec0, RBP: 0xffffff80e406ded0, RSI: 0x0000013e4fd3f19f, RDI: 0xffffff802995af84
R8: 0x0000000000000001, R9: 0x00000000cccccccd, R10: 0x00000001048519a8, R11: 0x000000076d6e3b50
R12: 0xffffff8001517415, R13: 0xffffff800165a8e0, R14: 0x0000000000000000, R15: 0xffffff80015173cd
RFL: 0x0000000000010046, RIP: 0xffffff7f81a97bfc, CS: 0x0000000000000008, SS: 0x0000000000000000
Fault CR2: 0x000000076d6e3000, Error code: 0x0000000000000000, Fault CPU: 0x0
Backtrace (CPU 0), Frame : Return Address
0xffffff80e4079c50 : 0xffffff8001023139
0xffffff80e4079cd0 : 0xffffff80010dcc1d
0xffffff80e4079ea0 : 0xffffff80010f4486
0xffffff80e4079ec0 : 0xffffff7f81a97bfc
0xffffff80e406ded0 : 0xffffff80010e402e
0xffffff80e406df10 : 0xffffff80010e394e
0xffffff80e406df50 : 0xffffff80010e2c96
0xffffff80e406df80 : 0xffffff80010dc05f
0xffffff80e406dfd0 : 0xffffff80010f4649
0xffffff811eb33c90 : 0xffffff80010a3bc0
0xffffff811eb33cd0 : 0xffffff800108ed72
0xffffff811eb33d50 : 0xffffff800107977e
0xffffff811eb33f20 : 0xffffff80010dd05c
0xffffff811eb33fb0 : 0xffffff80010f438b
Kernel Extensions in backtrace:
com.intel.driver.PcmMsr(1.0)[8E137983-87E4-37B1-8E6C-A6D8BC38C80B]@0xffffff7f81a97000->0xffffff7f81a9afff
BSD process name corresponding to current thread: clion
Boot args: -v
Mac OS version: 13F1077
Kernel version: Darwin Kernel Version 13.4.0: Wed Mar 18 16:20:14 PDT 2015; root:xnu-2422.115.14~1/RELEASE_X86_64
Kernel UUID: 8B1A8FD1-2344-36C0-A7F5-D9D485A995FA
Kernel slide: 0x0000000000e00000
Kernel text base: 0xffffff8001000000
System model name: MacBookPro11,1 (Mac-189A3D4F975D5FFC)
Hello Danicha,
It looks like the PCM driver is panicking the kernel (the PcmMsr driver appears in the backtrace). In every case I've found where this happens, it is because PCM is trying to read an MSR that is not readable or to write a bit that is reserved. Unfortunately, OS X doesn't provide "safe" MSR read/write routines with exception handlers to catch GP faults, so the kernel crashes. Every other modern OS provides these safe rdmsr/wrmsr routines. So you have to ensure that PCM is not accessing any invalid MSR. Figuring out which MSRs are allowed on every platform is a daunting task.
As a side note, I have a hack that captures the invalid rdmsr/wrmsr, but I do not have permission to make it public. It works, but I don't know enough about it to say how safe or robust it is. I am also working on a script to list which MSRs are readable/writable on any platform, but even with all the information I have access to, figuring out the MSR list is still a daunting task. Perhaps with this list we could finally provide signed Windows and OS X driver binaries for PCM. The Intel security folks do not want to provide drivers that allow reading/writing arbitrary MSRs.
If the above crash dump is indeed for reading/writing an invalid MSR then RCX shows which MSR you are accessing. In this case it is 0xc8f. This is the MSR IA32_PQR_ASSOC.
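To make the RCX lookup concrete, here is a minimal sketch (not part of PCM) of how one might map the RCX value from a panic report to an MSR name. It relies on the fact that rdmsr/wrmsr take the MSR index from ECX, the low 32 bits of RCX. The helper names msrIndexFromRcx and msrName are made up for this example, and the table only lists a few neighbouring MSRs; it is nowhere near a complete list.

```cpp
#include <cstdint>
#include <map>
#include <string>

// rdmsr/wrmsr read the MSR index from ECX, i.e. the low 32 bits of RCX.
// (Hypothetical helper name, not a PCM function.)
std::uint32_t msrIndexFromRcx(std::uint64_t rcx) {
    return static_cast<std::uint32_t>(rcx & 0xFFFFFFFFULL);
}

// Tiny illustrative lookup table; only a handful of MSRs are listed here.
// (Hypothetical helper name, not a PCM function.)
std::string msrName(std::uint32_t index) {
    static const std::map<std::uint32_t, std::string> known = {
        {0xC8D, "IA32_QM_EVTSEL"},
        {0xC8E, "IA32_QM_CTR"},
        {0xC8F, "IA32_PQR_ASSOC"},
    };
    const auto it = known.find(index);
    return it != known.end() ? it->second : "unknown";
}
```

Calling msrName(msrIndexFromRcx(0x0000000000000c8f)) with the RCX value from the report above gives IA32_PQR_ASSOC.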
The MSR is accessed in two places in cpucounter.cpp:
1) PCM::initL3CacheOccupancyMonitoring()
2) PCM::freeRMID()
In 1), it looks like reads/writes to 0xc8f are protected by a check:
if(!L3CacheOccupancyMetricAvailable()) { return; }
This check is not present in 2). Can you try adding the code snippet above to freeRMID() and see if the crash goes away?
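For illustration, the suggested change can be sketched roughly as follows. This is a self-contained mock, not the actual PCM source: MockPCM, its constructor flag, and the msrAccesses counter are stand-ins invented for this example, and the real freeRMID() body in cpucounter.cpp is not reproduced here. Only the early-return guard mirrors the snippet above.

```cpp
// Hypothetical stand-in for the PCM class, used only to show the guard pattern.
class MockPCM {
public:
    explicit MockPCM(bool l3OccupancySupported)
        : supported_(l3OccupancySupported), msrAccesses_(0) {}

    // Stand-in for the real availability check in PCM.
    bool L3CacheOccupancyMetricAvailable() const { return supported_; }

    void freeRMID() {
        // Bail out before touching MSR 0xc8f when the platform lacks the feature.
        if (!L3CacheOccupancyMetricAvailable()) { return; }
        ++msrAccesses_;  // placeholder for the real access to IA32_PQR_ASSOC
    }

    // Number of simulated MSR accesses, so the effect of the guard is visible.
    int msrAccesses() const { return msrAccesses_; }

private:
    bool supported_;
    int msrAccesses_;
};
```

With the guard in place, a MockPCM constructed with false never reaches the simulated MSR access, which is the behaviour we want on a platform without L3 cache occupancy monitoring.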
Pat
Hey Patrick,
I added the code at the beginning of PCM::freeRMID() and it worked!
Thank you!