470 Views

QPI counters aren't available on Dell R620. Q_P0_PCI_PMON_BOX_CTL=0x0

Hello 

I have a problem that has been discussed here many times before: PCM has no access to the QPI counters.

The difference is that in all the forum posts where this problem is discussed, the registers return 0xffffffff. According to the documentation, the value should be -1 if the application can't initialize the PMU or if the PMU is unavailable.

So the question is: can a value of 0x0 indicate that the QPI PMU device is disabled in the BIOS, or should I be looking for another reason?

Regarding the Dell R620 BIOS, I couldn't find any option that comes even close to enabling the PMU, performance monitoring devices, or just devices 8 and 9. Can somebody give me any leads? What should I look for?

PCM output:

ERROR: QPI LL counter programming seems not to work. Q_P0_PCI_PMON_BOX_CTL=0x0
Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
ERROR: QPI LL counter programming seems not to work. Q_P1_PCI_PMON_BOX_CTL=0x0
Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
ERROR: QPI LL counter programming seems not to work. Q_P0_PCI_PMON_BOX_CTL=0x0
Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
ERROR: QPI LL counter programming seems not to work. Q_P1_PCI_PMON_BOX_CTL=0x0
Please see BIOS options to enable the export of performance monitoring devices (devices 8 and 9: function 2).
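
The two cases discussed above (0x0 versus 0xffffffff) can be told apart with a small check. This is only a minimal sketch, assuming the tool writes a control value to the register and then reads it back; the enum, the function name and the value 0x100 used in the usage note are hypothetical and not taken from PCM.

```cpp
#include <cstdint>

// Possible outcomes of a write-then-read-back check on a config register.
enum class Readback { Matches, AllOnes, Mismatch };

// Classify the value read back after writing `written` to a register.
// AllOnes is the pattern reported in the other forum posts (0xffffffff);
// Mismatch covers the 0x0 case seen here, where the read returns a value
// that simply does not reflect the write.
inline Readback classifyReadback(std::uint32_t written, std::uint32_t readBack)
{
    if (readBack == written)
        return Readback::Matches;
    if (readBack == 0xFFFFFFFFu)
        return Readback::AllOnes;
    return Readback::Mismatch;
}
```

With a hypothetical control value of 0x100, classifyReadback(0x100, 0x0) gives Mismatch and classifyReadback(0x100, 0xFFFFFFFF) gives AllOnes, so the two symptoms end up in different branches and can be investigated separately.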

Thanks,

Alexander

21 Replies
Bernard
Black Belt
439 Views

Hi Alexander,

The BIOS probably locks access to the QPI registers (PCI address space) in SMM. AFAIK it is not possible to access SMM space from ring 0, so the theoretical options are reverse-engineering the BIOS and installing your own SMM handler, or waiting for a new BIOS revision that enables sampling of the QPI performance counters.

439 Views

Hello

Could you elaborate on what "SMM mode", "SMM space" and "SMM handler" are?

What procedure are you suggesting with "reversing the BIOS and installing own SMM handler"?

I wouldn't rely on a BIOS update, as I have already installed the latest version and it gives me no access to these options. The next one will probably not be released soon.

Thanks,

Alexander

Bernard
Black Belt
439 Views

Hi,

Sorry for not being informative :)

SMM stands for System Management Mode, which is a special operating mode of the CPU accessed from the BIOS.

>>>What procedure are you suggesting with "reversing the BIOS and installing own SMM handler"?>>>

I'm not suggesting such a procedure. It has been published on a few sites, and it is a complex task that requires broad knowledge of assembly and the IDA disassembler, plus locating the routine(s) that access PCI address space and enable performance measurement on devices 8 and 9. By the way, the Intel documentation does not specify exactly which bus and offset are used to access the QPI control registers for devices 8 and 9.

Bernard
Black Belt
439 Views

>>>The next one will probably not be released soon.>>>

As I was told by one of the Intel engineers, the BIOS vendor is not obligated to provide such an implementation in its next revision.

439 Views

It seems I have found the real cause of the error, but I can't find a reason for that behavior.

Despite a valid PCI configuration request, the data cannot be set properly.

size = HalSetBusDataByOffset(PCIConfiguration, input_pcicfg_req->bus, slot.u.AsULONG,
&(input_pcicfg_req->write_value), input_pcicfg_req->reg, input_pcicfg_req->bytes);

This call returns zero, and size == 0 should be considered an error.

Input data for a call:

msr.sys: input_pcicfg_req->bus = 63
msr.sys: slot.u.AsULONG = 72
msr.sys: slot.u.bits.DeviceNumber = 8
msr.sys: slot.u.bits.FunctionNumber = 2
msr.sys: &(input_pcicfg_req->write_value) = FFFFFA8037C5D898
msr.sys: input_pcicfg_req->reg = 244
msr.sys: input_pcicfg_req->bytes = 4
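
The logged slot value can be cross-checked against the device and function numbers. Below is a minimal sketch, assuming the slot word uses the same layout as the Windows PCI_SLOT_NUMBER bitfields (device number in bits 0-4, function number in bits 5-7); the helper names are hypothetical and are not part of msr.sys.

```cpp
#include <cassert>
#include <cstdint>

// Pack a PCI device/function pair into a slot word.
// Assumed layout: bits 0-4 = device number, bits 5-7 = function number.
inline std::uint32_t packSlot(std::uint32_t device, std::uint32_t function)
{
    assert(device < 32 && function < 8);  // 5-bit device, 3-bit function
    return (device & 0x1Fu) | ((function & 0x7u) << 5);
}

// Extract the device number (low 5 bits) from a slot word.
inline std::uint32_t slotDevice(std::uint32_t slot)
{
    return slot & 0x1Fu;
}

// Extract the function number (bits 5-7) from a slot word.
inline std::uint32_t slotFunction(std::uint32_t slot)
{
    return (slot >> 5) & 0x7u;
}
```

Under that assumption, packSlot(8, 2) is 8 + 2 * 32 = 72, which matches the logged slot.u.AsULONG. For reference, the other logged values in hex are bus 63 = 0x3F and reg 244 = 0xF4.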

Bernard
Black Belt
439 Views

I see that I was wrong: a kernel-mode driver can access PCI configuration space and write to devices 8 and 9. Does the code snippet in your post belong to the msr.sys driver?

439 Views

iliyapolak wrote:

I see that I was wrong: a kernel-mode driver can access PCI configuration space and write to devices 8 and 9. Does the code snippet in your post belong to the msr.sys driver?

Yes, msrmain.c, around line 200.

Bernard
Black Belt
439 Views

Thanks Alexander!

439 Views

I have some doubts about the bus value used to read and write the QPI LL PMU registers. xeon-e5-2600-uncore-guide.pdf says nothing about a bus, only device and function. It doesn't even mention a procedure for finding the right bus.

So in the current execution the following value is used:

msr.sys: input_pcicfg_req->bus = 63

As far as I can see, this value is taken from the procedure below. The procedure is simple, but I couldn't find any documentation about the CPU bus location.

Why does the procedure start from bus zero, and why are device 5, function 0 used? Which spec defines this?

int getBusFromSocket(const uint32 socket)
{
    int cur_bus = 0;
    uint32 cur_socket = 0;
    // std::cout << "socket: "<< socket << std::endl;
    while(cur_socket <= socket)
    {
        // std::cout << "reading from bus 0x"<< std::hex << cur_bus << std::dec << " ";
        PciHandleM h(0, cur_bus, 5, 0);
        uint32 cpubusno = 0;
        h.read32(0x108, &cpubusno); // CPUBUSNO register
        cur_bus = (cpubusno >> 8)& 0x0ff;
        // std::cout << "socket: "<< cur_socket<< std::hex << " cpubusno: 0x"<< std::hex << cpubusno << " "<<cur_bus<< std::dec << std::endl;
        if(socket == cur_socket)
            return cur_bus;
        ++cur_socket;
        ++cur_bus;
        if(cur_bus > 0x0ff)
           return -1;
    }
    return -1;
}
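
To make the walk above easier to follow, here is a minimal sketch that runs the same loop against a fake table instead of real hardware. The FakeConfigSpace type, the function names and the register values are all hypothetical and purely illustrative; returning -1 when no entry exists for a bus is a simplification of mine, not the behavior of PciHandleM.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for PciHandleM: maps a bus number to the value a
// read of register 0x108 on device 5, function 0 of that bus would return.
using FakeConfigSpace = std::map<int, std::uint32_t>;

// Same walk as getBusFromSocket, but reading from the fake table.
inline int busFromSocket(const FakeConfigSpace& cfg, std::uint32_t socket)
{
    int curBus = 0;
    for (std::uint32_t curSocket = 0; curSocket <= socket; ++curSocket) {
        const auto it = cfg.find(curBus);
        if (it == cfg.end())
            return -1;  // simplification: no entry for this bus
        // Bits 15:8 of the register give the bus for this socket.
        curBus = static_cast<int>((it->second >> 8) & 0xFFu);
        if (curSocket == socket)
            return curBus;
        // The next socket is searched starting from the following bus.
        if (++curBus > 0xFF)
            return -1;
    }
    return -1;
}

// Illustrative two-socket table: bits 15:8 hold 0x3F and 0x7F.
inline FakeConfigSpace exampleTwoSocket()
{
    return { {0x00, 0x00003F00u}, {0x40, 0x00007F40u} };
}
```

With this made-up table, socket 0 resolves to bus 0x3F (63, the value seen in the log), the search for socket 1 continues from bus 0x40 and resolves to 0x7F, and a request for socket 2 returns -1 because nothing is listed at bus 0x80.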

Thanks,

Alexander

Bernard
Black Belt
439 Views

Regarding the bus number and offset, I mentioned in one of my posts that those values are not available in the Uncore Guide. I suppose that the developers of msr.sys probably had access to this information.

WinDbg running in kernel mode can be used to scan PCI buses and address space. The !pci command should provide information about PCI configuration space, and commands like eb and ed can then write directly to PCI registers.

Roman_D_Intel
Employee
439 Views

Alexander,

The CPUBUSNO register location (device 5, function 0) and format are documented in https://www-ssl.intel.com/content/www/us/en/processors/xeon/xeon-e5-1600-2600-vol-2-datasheet.html

best regards,

Roman

Bernard
Black Belt
439 Views

Thanks Roman for the information.

439 Views

The problem was related to BIOS settings.

Now it is solved.

Bernard
Black Belt
439 Views

Hi Alexander,

Sorry for the off-topic question, which is related to your other post, but have you checked all your thread IDs with Process Explorer?

Tim_Day
Beginner
439 Views

Alexander Alexeev wrote:
> Problem was related to BIOS settings.
> Now it is solved.

I'm experiencing the same problem with a Dell machine; would you be kind enough to reveal what setting should be modified to expose the QPI counters ?

Thanks
Tim

Patrick_F_Intel1
Employee
439 Views

Have you looked at http://software.intel.com/en-us/articles/bios-preventing-access-to-qpi-performance-counters ?

Or go to this forum and search for 'pcm qpi bios'.

Pat

Bernard
Black Belt
439 Views

Thanks for the link; a lot of valuable information can be found in those CPU datasheets.

439 Views

Tim Day wrote:

I'm experiencing the same problem with a Dell machine; would you be kind enough to reveal what setting should be modified to expose the QPI counters ?

I didn't find a way to enable the counters. Dell support confirmed that the PCI config space cannot be made accessible with the current version of the BIOS. They simply recommended waiting for an update.

I switched to another HW to continue development.

Tim_Day
Beginner
439 Views

Thanks for the response.  I do actually have a support request in with Dell on this now; their latest report was

“We made some experimental BIOS changes that allows us to see the hidden devices in the OS. As Intel does not release drivers for these devices there would be yellow bang in the device manager. Unfortunately even after that Intel’s tool complains about unrecognized CPUs. We are asking for Intel’s help to figure out what might be wrong. Appreciate your patience on this."

(yes I have pointed out the unknown device is an expected result) so I am hopeful the necessary BIOS fixes might appear at some point.  Meanwhile we have found PCM's QPI counters seem to work as expected on an older T7500 system, but of course I'd rather be getting some numbers on more current HW.

Tim_Day
Beginner
147 Views

Just tried out an experimental BIOS supplied by Dell for my T7600 which allows these devices to be unhidden, and now I'm seeing QPI related info from the PCM lib.  Fantastic!  Kudos to Dell's support for taking the trouble to develop this... not sure whether the option will be released generally in a future BIOS update?
