QPI counters aren't available on Dell r620. Q_P0_PCI_PMON_BOX_CTL=0x0

Hello 

I have a problem that has been discussed here many times before: PCM has no access to the QPI counters.

The difference is that in all the forum posts where this problem is discussed, the registers return 0xffffffff. And according to the documentation, the value should be -1 if the application can't initialize the PMU or if the PMU is unavailable.

So the question is: can a value of 0x0 indicate that the QPI PMU device is disabled in the BIOS, or should I be looking for another cause?
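
The distinction drawn above between the two readback values can be sketched as a small classifier. This is only an illustration, not PCM's actual check: the enum, the function name and the `expected` parameter are hypothetical, and the reading of all ones as "no device answered" relies on the usual PCI convention for config reads that nothing responds to. A value such as 0x0 only shows that the register did not take the programmed value; it does not say why.

```cpp
#include <cstdint>

// Hypothetical names for illustration; none of this is part of PCM.
enum class PmonReadback { NoDeviceResponse, Mismatch, AsExpected };

// Classify the value read back from a box-control register after programming it.
// `expected` is whatever value the caller expects to see after its write.
constexpr PmonReadback classifyReadback(std::uint32_t expected, std::uint32_t readBack)
{
    if (readBack == 0xFFFFFFFFu)
        return PmonReadback::NoDeviceResponse; // all ones: conventionally, nothing answered the read
    if (readBack != expected)
        return PmonReadback::Mismatch;         // e.g. 0x0: write not applied, or the access itself failed
    return PmonReadback::AsExpected;
}
```

For the output in this post, any non-zero `expected` value combined with a readback of 0x0 lands in `Mismatch`, which is a different case from the 0xffffffff reports in the other threads.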

Regarding the Dell R620 BIOS, I couldn't find any option that comes even close to enabling the PMU, performance monitoring devices, or just devices 8 and 9. Can somebody give me any leads? What should I look for?

PCM output:

ERROR: QPI LL counter programming seems not to work. Q_P0_PCI_PMON_BOX_CTL=0x0
Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
ERROR: QPI LL counter programming seems not to work. Q_P1_PCI_PMON_BOX_CTL=0x0
Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
ERROR: QPI LL counter programming seems not to work. Q_P0_PCI_PMON_BOX_CTL=0x0
Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
ERROR: QPI LL counter programming seems not to work. Q_P1_PCI_PMON_BOX_CTL=0x0
Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).

Thanks,

Alexander

0 Kudos
21 Replies
Black Belt

Hi Alexander,

The BIOS probably locks access to the QPI registers (PCI address space) in SMM mode. AFAIK there is no way to access SMM space from ring 0, so the theoretical options are either reversing the BIOS and installing your own SMM handler, or waiting for a new BIOS revision that enables sampling of the QPI perf counters.

0 Kudos

Hello

Could you elaborate on what "SMM mode", "SMM space" and "SMM handler" are?

What procedure are you suggesting with "reversing the BIOS and installing own SMM handler"?

I wouldn't rely on a BIOS update, as I have already installed the latest version and it gives me no access to these options. The next one will probably not be released soon.

Thanks,

Alexander

0 Kudos
Black Belt

Hi,

Sorry for not being informative :)

SMM stands for System Management Mode, which is a special operating mode of the CPU that is used by the BIOS.

>>>What procedure are you suggesting with "reversing the BIOS and installing own SMM handler"?>>>

I'm not suggesting such a procedure. It has been published on a few sites, and it is a complex task that requires broad knowledge of assembly and the IDA disassembler, plus locating the routine(s) that access PCI address space and enable performance measurement on devices 8 and 9. By the way, the Intel documentation does not specify exactly which bus and offset are used to access the QPI control registers for devices 8 and 9.

0 Kudos
Black Belt

>>>The next probably will not be realeased soon.>>>

As I was told by one of the Intel engineers, the BIOS vendor is not obligated to provide such an implementation in its next revision.

0 Kudos

It seems I have found the real cause of the error, but I can't find a reason for this behavior.

Despite a valid PCI configuration request, the data cannot be written.

size = HalSetBusDataByOffset(PCIConfiguration, input_pcicfg_req->bus, slot.u.AsULONG,
&(input_pcicfg_req->write_value), input_pcicfg_req->reg, input_pcicfg_req->bytes);

This call returns zero, and size == 0 should be treated as an error.

Input data for a call:

msr.sys: input_pcicfg_req->bus = 63
msr.sys: slot.u.AsULONG = 72
msr.sys: slot.u.bits.DeviceNumber = 8
msr.sys: slot.u.bits.FunctionNumber = 2
msr.sys: &(input_pcicfg_req->write_value) = FFFFFA8037C5D898
msr.sys: input_pcicfg_req->reg = 244
msr.sys: input_pcicfg_req->bytes = 4
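
The logged values above are internally consistent, which a short sketch can check. It assumes the bit layout of the WDK `PCI_SLOT_NUMBER` union (device number in bits 0-4, function number in bits 5-7); the helper names are hypothetical and not taken from msr.sys.

```cpp
#include <cstdint>

// Pack a device/function pair the way PCI_SLOT_NUMBER.u.AsULONG is assumed to be laid out:
// bits 0-4 = DeviceNumber, bits 5-7 = FunctionNumber, remaining bits reserved.
constexpr std::uint32_t makeSlotNumber(std::uint32_t device, std::uint32_t func)
{
    return (device & 0x1Fu) | ((func & 0x07u) << 5);
}

// Unpack the two fields again.
constexpr std::uint32_t slotDevice(std::uint32_t slot)   { return slot & 0x1Fu; }
constexpr std::uint32_t slotFunction(std::uint32_t slot) { return (slot >> 5) & 0x07u; }

// Device 8, function 2 packs to the 72 seen in the driver log.
static_assert(makeSlotNumber(8, 2) == 72u, "8 + (2 << 5) == 72");
static_assert(slotDevice(72u) == 8u && slotFunction(72u) == 2u, "round trip");

// The other logged values, in the hex form used by the documentation.
static_assert(244 == 0xF4, "register offset");
static_assert(63 == 0x3F, "bus number");
```

So the request targets bus 0x3F, device 8, function 2, offset 0xF4, four bytes wide. Given the PCM error message, 0xF4 is presumably the Q_P0_PCI_PMON_BOX_CTL offset, so the parameters themselves look plausible and the zero return value points elsewhere.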

0 Kudos
Black Belt

I see that I was wrong: a kernel-mode driver can access PCI configuration space and write to devices 8 and 9. Does the code snippet in your post belong to the msr.sys driver?

0 Kudos

iliyapolak wrote:

I see that I was wrong: a kernel-mode driver can access PCI configuration space and write to devices 8 and 9. Does the code snippet in your post belong to the msr.sys driver?

Yes. msrmain.c around line 200

0 Kudos
Black Belt

Thanks Alexander!

0 Kudos

I have some doubts about the bus value used to read and write the QPI LL PMU registers. xeon-e5-2600-uncore-guide.pdf says nothing about a bus, only device and function. It doesn't even mention a procedure for finding the right bus.

So in the current execution the following value is used:

msr.sys: input_pcicfg_req->bus = 63

As far as I can see, this value comes from the procedure below. The procedure is simple, but I couldn't find any documentation about the CPU bus location.

Why does the procedure start from bus zero, and why are device 5, function 0 used? Which spec defines this?

int getBusFromSocket(const uint32 socket)
{
    int cur_bus = 0;
    uint32 cur_socket = 0;
    // std::cout << "socket: "<< socket << std::endl;
    while(cur_socket <= socket)
    {
        // std::cout << "reading from bus 0x"<< std::hex << cur_bus << std::dec << " ";
        PciHandleM h(0, cur_bus, 5, 0);
        uint32 cpubusno = 0;
        h.read32(0x108, &cpubusno); // CPUBUSNO register
        cur_bus = (cpubusno >> 8)& 0x0ff;
        // std::cout << "socket: "<< cur_socket<< std::hex << " cpubusno: 0x"<< std::hex << cpubusno << " "<<cur_bus<< std::dec << std::endl;
        if(socket == cur_socket)
            return cur_bus;
        ++cur_socket;
        ++cur_bus;
        if(cur_bus > 0x0ff)
           return -1;
    }
    return -1;
}
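
To see how the walk above behaves, here is a minimal sketch that replaces the `PciHandleM` read with a lookup in a made-up table. The CPUBUSNO values are invented for a two-socket machine (only the 0x3F result for socket 0 matches the bus 63 logged earlier), and `fakeReadCpubusno` and `busFromSocket` are hypothetical names. The loop mirrors the quoted procedure: read CPUBUSNO at the current bus, take bits 15:8 as the socket's bus, and start the search for the next socket one bus higher.

```cpp
#include <cstdint>

// Invented CPUBUSNO contents, keyed by the bus they would be read from
// (device 5, function 0, offset 0x108 on real hardware).
struct BusEntry { int bus; std::uint32_t cpubusno; };
constexpr BusEntry kFakeCpubusno[] = {
    {0x00, 0x00003F00u},  // socket 0: bits 15:8 = 0x3F
    {0x40, 0x00007F40u},  // socket 1: bits 15:8 = 0x7F
};

// Stand-in for PciHandleM(0, bus, 5, 0).read32(0x108, ...).
constexpr std::uint32_t fakeReadCpubusno(int bus)
{
    for (const BusEntry& e : kFakeCpubusno)
        if (e.bus == bus)
            return e.cpubusno;
    return 0xFFFFFFFFu;  // nothing answers on this bus
}

// Same walk as getBusFromSocket, using the fake read.
constexpr int busFromSocket(std::uint32_t socket)
{
    int curBus = 0;
    for (std::uint32_t curSocket = 0; curSocket <= socket; ++curSocket)
    {
        const std::uint32_t cpubusno = fakeReadCpubusno(curBus);
        curBus = static_cast<int>((cpubusno >> 8) & 0xFFu);  // bits 15:8
        if (curSocket == socket)
            return curBus;
        ++curBus;  // the next socket's buses start right after this one
        if (curBus > 0xFF)
            return -1;
    }
    return -1;
}

static_assert(busFromSocket(0) == 0x3F, "socket 0 -> bus 63");
static_assert(busFromSocket(1) == 0x7F, "socket 1 -> bus 127");
```

With this table, socket 0 resolves to bus 63 and socket 1 to bus 127. Like the quoted procedure, the sketch does not check whether the read actually succeeded.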

Thanks,

Alexander

0 Kudos
Black Belt

Regarding the bus number and offset, I mentioned in one of my posts that those values are not available in the Uncore Guide. I suppose the developers of msr.sys had access to this information.

WinDbg running in kernel mode can be used to scan the PCI buses and address space. The !pci command should show information about PCI configuration space, and commands like eb and ed can then write directly to PCI registers.

0 Kudos
Employee

Alexander,

the CPUBUSNO register location (device 5, function 0) and format are documented in https://www-ssl.intel.com/content/www/us/en/processors/xeon/xeon-e5-1600-2600-vol-2-datasheet.html

best regards,

Roman

0 Kudos
Black Belt

Thanks Roman for the information.

0 Kudos

Problem was related to BIOS settings.

Now it is solved.

0 Kudos
Black Belt

Hi Alexander,

Sorry for the off-topic question, which relates to your other post, but have you checked all your thread IDs with Process Explorer?

0 Kudos
Beginner

Alexander Alexeev wrote:
> Problem was related to BIOS settings.
> Now it is solved.

I'm experiencing the same problem with a Dell machine; would you be kind enough to reveal what setting should be modified to expose the QPI counters ?

Thanks
Tim

0 Kudos

Have you looked at http://software.intel.com/en-us/articles/bios-preventing-access-to-qpi-performance-counters ?

Or go to this forum and search for 'pcm qpi bios'.

Pat

0 Kudos
Black Belt

Thanks for the link. A lot of valuable information can be found in those CPU datasheets.

0 Kudos

Tim Day wrote:

I'm experiencing the same problem with a Dell machine; would you be kind enough to reveal what setting should be modified to expose the QPI counters ?

I didn't find a way to enable the counters. Dell support confirmed that the PCI config space cannot be made accessible with the current BIOS version. They simply recommended waiting for an update.

I switched to another HW to continue development.

0 Kudos
Beginner

Thanks for the response.  I do actually have a support request in with Dell on this now; their latest report was

“We made some experimental BIOS changes that allows us to see the hidden devices in the OS. As Intel does not release drivers for these devices there would be yellow bang in the device manager. Unfortunately even after that Intel’s tool complains about unrecognized CPUs. We are asking for Intel’s help to figure out what might be wrong. Appreciate your patience on this.”

(yes I have pointed out the unknown device is an expected result) so I am hopeful the necessary BIOS fixes might appear at some point.  Meanwhile we have found PCM's QPI counters seem to work as expected on an older T7500 system, but of course I'd rather be getting some numbers on more current HW.

0 Kudos