Why can't I get the Cbox0 PMU this way? (with code)

Duan_Z_

First, I defined the HSWEP uncore C-box:

/* HSWEP Uncore C-box */
#define HSWEP_MSR_C_PMON_BOX_CTL        0xE00
#define HSWEP_MSR_C_PMON_EVNTSEL0       0xE01
#define HSWEP_MSR_C_PMON_EVNTSEL1       0xE02
#define HSWEP_MSR_C_PMON_BOX_FILTER0    0xE05
#define HSWEP_MSR_C_PMON_BOX_FILTER1    0xE06
#define HSWEP_MSR_C_PMON_BOX_STATUS     0xE07
#define HSWEP_MSR_C_PMON_CTR0           0xE08
#define HSWEP_MSR_C_PMON_CTR1           0xE09
#define HSWEP_MSR_C_MSR_OFFSET          0x10
#define HSWEP_MSR_C_EVENTSEL_MASK       (HSWEP_MSR_RAW_EVNTSEL_MASK | \
                                         HSWEP_MSR_EVNTSEL_TID_EN)

struct uncore_box_type HSWEP_UNCORE_CBOX = {
    .name          = "C-BOX",
    .num_counters  = 4,
    .num_boxes     = 18,
    .perf_ctr_bits = 48,
    .perf_ctr      = HSWEP_MSR_C_PMON_CTR0,
    .perf_ctl      = HSWEP_MSR_C_PMON_EVNTSEL0,
    .perf_ctr_1    = HSWEP_MSR_C_PMON_CTR1,
    .perf_ctl_1    = HSWEP_MSR_C_PMON_EVNTSEL1,
    .event_mask    = HSWEP_MSR_C_EVENTSEL_MASK,
    .box_ctl       = HSWEP_MSR_C_PMON_BOX_CTL,
    .box_status    = HSWEP_MSR_C_PMON_BOX_STATUS,
    .box_filter0   = HSWEP_MSR_C_PMON_BOX_FILTER0,
    .box_filter1   = HSWEP_MSR_C_PMON_BOX_FILTER1,
    .msr_offset    = HSWEP_MSR_C_MSR_OFFSET,
    .ops           = &HSWEP_UNCORE_CBOX_OPS
};

enum {
    HSWEP_UNCORE_UBOX_ID   = UNCORE_MSR_UBOX_ID,
    HSWEP_UNCORE_PCUBOX_ID = UNCORE_MSR_PCUBOX_ID,
    HSWEP_UNCORE_SBOX_ID   = UNCORE_MSR_SBOX_ID,
    HSWEP_UNCORE_CBOX_ID   = UNCORE_MSR_CBOX_ID
};

struct uncore_box_type *HSWEP_UNCORE_MSR_TYPE[] = {
    [HSWEP_UNCORE_UBOX_ID]   = &HSWEP_UNCORE_UBOX,
    [HSWEP_UNCORE_PCUBOX_ID] = &HSWEP_UNCORE_PCUBOX,
    [HSWEP_UNCORE_SBOX_ID]   = &HSWEP_UNCORE_SBOX,
    [HSWEP_UNCORE_CBOX_ID]   = &HSWEP_UNCORE_CBOX,
    NULL
};

My project is based on a NUMA framework, and I call the box lookup function as follows:

/*
 * Test to set cache box: (Box0, Node0), (Box0, Node1)
 */
CBox_0 = uncore_get_first_box(uncore_msr_type[UNCORE_MSR_CBOX_ID], 0);

/**
 * uncore_get_first_box
 * @type: pointer to box_type
 * @nodeid: which NUMA node to get this box
 * Return: %NULL on failure
 *
 * Get the first box in the box_type list of @nodeid node. We have this
 * function because some box types only have one available box within a node.
 * It is more convenient to get box without an idx. (I know...)
 */
struct uncore_box *uncore_get_first_box(struct uncore_box_type *type,
                                        unsigned int nodeid)
{
    struct uncore_box *box;

    if (!type || nodeid > UNCORE_MAX_SOCKET)
        return NULL;

    /* Walk the per-type box list and return the first box on this node. */
    list_for_each_entry(box, &type->box_list, next) {
        if (box->nodeid == nodeid)
            return box;
    }

    return NULL;
}

Then I disabled the other cores on the local socket, leaving only core 0 as local node 0, and called this function.

But when I checked dmesg, I found that the CBox lookup did not return what I expected.

So what should I do?

McCalpinJohn

I can't tell if you are having trouble getting the hardware to do what you want or you are having trouble getting the software to do what you want....

The "rdmsr.c" program from msr-tools-1.3 compiles to a very useful command-line tool that you can use to check whether your program has put the expected bits in the expected registers.
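
For readers without msr-tools at hand, the core of what such a tool does can be sketched in a few lines of user-space C. This is only a minimal sketch, assuming a Linux system with the msr kernel module loaded (it exposes /dev/cpu/N/msr) and sufficient privileges; the helper name read_msr is made up for illustration and is not part of msr-tools.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one 64-bit MSR on a given logical CPU through
 * the Linux msr driver. The MSR address is used as the file offset.
 * Returns 0 on success, -1 if the device cannot be opened or read.
 */
int read_msr(int cpu, uint32_t reg, uint64_t *val)
{
    char path[64];
    int fd;
    ssize_t n;

    snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* A full 8-byte read at offset == MSR address yields the register. */
    n = pread(fd, val, sizeof(*val), (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof(*val) ? 0 : -1;
}
```

Since uncore MSRs are per socket, the cpu argument should name a logical CPU that sits on the socket you want to inspect.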

Duan_Z_

Thanks for the tool, that will be very helpful.

I know that with MSR-based boxes you must pay attention to which core you access them from. But I don't know how to get an MSR box such as the CBox.

I already use the approach shown in the code below to get the HA box, but I cannot get the CBox the same way.

static struct uncore_box *HA_Box_0, *HA_Box_1; 

HA_Box_0 = uncore_get_first_box(uncore_pci_type[UNCORE_PCI_HA_ID], 0);
HA_Box_1 = uncore_get_first_box(uncore_pci_type[UNCORE_PCI_HA_ID], 1);

But when I use

static struct uncore_box *CBox_0;

CBox_0 = uncore_get_first_box(uncore_msr_type[UNCORE_MSR_CBOX_ID], 0); 

I can't get the CBox I want, even though I have already set the MSR addresses as shown above.

Thomas_G_4

There is a register called MSR_UNC_CBO_CONFIG (0x396) which tells you how many CBoxes are available.
Like John, I don't see where your problem is (hardware or software side). Your code snippets do nothing with the registers.

Duan_Z_

I cannot find MSR_UNC_CBO_CONFIG in the PMU manual (Intel® Xeon® Processor E5 v2 and E7 v2 Product Families Uncore Performance Monitoring Reference Manual).

What I can find in that document is C0_MSR_PMON_BOX_CTL under CBo Performance Monitors, with MSR address 0x0E00.

So I use the Linux kernel's list_for_each_entry(box, &type->box_list, next) macro to get the CBox on my NUMA node.

The detailed code is listed at the beginning of the thread.
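
One point worth making explicit: the lookup only walks whatever boxes are already on the type's list, so defining MSR addresses does not by itself make a CBox findable. The following is a simplified user-space model of that lookup, not the kernel code from the question; struct box, struct box_type, get_first_box and dump_boxes are hypothetical stand-ins that replace the kernel list with a plain singly linked list so the behaviour can be tried outside the kernel.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel structures in the question. */
struct box {
    unsigned int idx;
    unsigned int nodeid;
    struct box *next;
};

struct box_type {
    const char *name;
    struct box *head;   /* NULL means no box was ever registered */
};

/* Same logic as uncore_get_first_box(): first box on the given node. */
struct box *get_first_box(const struct box_type *type, unsigned int nodeid)
{
    struct box *b;

    if (!type)
        return NULL;
    for (b = type->head; b; b = b->next)
        if (b->nodeid == nodeid)
            return b;
    return NULL;
}

/* Debug aid: print every box on the list and return how many there are. */
unsigned int dump_boxes(const struct box_type *type)
{
    unsigned int n = 0;
    const struct box *b;

    if (!type)
        return 0;
    for (b = type->head; b; b = b->next, n++)
        printf("%s box idx=%u nodeid=%u\n", type->name, b->idx, b->nodeid);
    if (n == 0)
        printf("%s: box list is empty\n", type->name);
    return n;
}
```

In this model the lookup returns NULL both when the list is empty and when no box carries the requested nodeid, and a dump of the list tells the two cases apart. In the kernel module the same dump would be done with printk inside list_for_each_entry.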

McCalpinJohn

I still don't understand what you are asking....

You say that your processor is a Haswell, but then you reference the Xeon E5 v2 (Ivy Bridge) Uncore Performance Monitoring Reference Manual.  Perhaps a typo?  Perhaps the wrong document?

The MSR_UNC_CBO_CONFIG register is described in several sections of Chapters 18 and 35 of Volume 3 of the Intel Architectures Software Developer's Manual.  In every case, these sections are describing features of the "Core" or "Xeon E3" processors --- NOT the Xeon E5 and Xeon E7 processors that are the subject of the Uncore Performance Monitoring Guides.

The number of CBoxes in a Xeon E5 v3 is in bits 4:0 of an MSR called U_MSR_PMON_GLOBAL_CONFIG, described in Tables 2-1 and 2-4 of the Xeon E5 v3 Uncore Performance Monitoring Guide (document 331051). According to Table 2-1, the MSR number is 0x702 on the Xeon E5 v3. This MSR has changed locations: on Xeon E5 v2 the address was 0xC06.
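
As a small illustration of the bit field described above, the count can be pulled out of a raw register value with a mask. This is a sketch only: the addresses are the ones quoted in this post, while the macro names with _V2/_V3 suffixes and the function name are invented here for readability.

```c
#include <stdint.h>

/* MSR addresses as quoted in the post above (names are illustrative). */
#define U_MSR_PMON_GLOBAL_CONFIG_V3 0x702  /* Xeon E5 v3 */
#define U_MSR_PMON_GLOBAL_CONFIG_V2 0xC06  /* Xeon E5 v2 */

/* Extract the CBox count, i.e. bits 4:0 of the raw register value. */
unsigned int cbox_count_from_config(uint64_t raw)
{
    return (unsigned int)(raw & 0x1F);
}
```

The raw value would come from reading the register on a core of the socket of interest, for example with the rdmsr tool mentioned earlier in the thread. A result of 0 would explain why no CBoxes end up in the list.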

Duan_Z_

Thanks, John and Thomas,

I am sorry that I did not describe the problem clearly. I think the problem may be on the software side (though I'm not sure).

My processor is an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, and I use the Intel® Xeon® Processor E5 and E7 v3 Family Uncore Performance Monitoring Reference Manual to set up the PMU.

So here are my questions:

1. Did I use the wrong document to set up the PMU? (As for the Haswell mismatch, that was a typo. The values I defined follow the Xeon® Processor E5 and E7 v3 PMU manual. I have already set up the HA box, and it works well.)

2. Did I use the wrong document to set up the CBo Performance Monitors? (I ask because you mentioned the Intel Architectures Software Developer's Manual.)

3. I did find U_MSR_PMON_GLOBAL_CONFIG in the Xeon E5/E7 PMU manual, and I know it is the UBox PMON Global Configuration register.

But should I not use C0_MSR_PMON_BOX_CTL (Table 2-13, CBo Performance Monitoring Registers (MSR), in the PMU manual) to control my CBo Performance Monitors?

If not, what is this CTL register for? And what is MSR_UNC_CBO_CONFIG? Is it the same as the C0_MSR_PMON_BOX_CTL I mentioned?

4. If I did nothing wrong (please ignore my variable names; I believe the values are correct), why can't I get the right CBox?

Thomas_G_4 (accepted solution)

1.) If you have used the v3 Uncore Guide, it's fine. It was just confusing that you mentioned the v2 Uncore Guide.

2.) There are two different types of Uncore configurations. The CBoxes for Desktop chips are described in the Intel SDM (that's also where I got MSR_UNC_CBO_CONFIG from). The CBoxes for Server chips are described in the corresponding Uncore Guide.

3.) Of course you should use the CBox config registers. The hint about U_MSR_PMON_GLOBAL_CONFIG and the wrong MSR_UNC_CBO_CONFIG was just a guess at why you cannot find any CBoxes. If the appropriate register reports 0 CBoxes for your system, it is clear why they weren't in the list. So, configure your events in the per-counter control (event select) registers that follow Cx_MSR_PMON_BOX_CTL (0xE01 and 0xE02 in your defines). Don't forget to set bit 22 (enable). If you are sure you programmed them properly but they don't start counting, you should set the unfreeze bit (29) in U_MSR_PMON_GLOBAL_CTL. This register controls all uncore PMUs.

4.) This question can only be answered by somebody who has worked with your code or the kernel functions you use. I would try good old printk debugging to see which boxes are in the list. If the CBoxes are not in the list, I would check the list creation code to see why it excluded them.
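
The two bit positions from point 3 can be written down as constants. The sketch below is not taken from the poster's module: the macro and function names are invented, and it assumes the enable bit sits in the per-counter control register (0xE01 for counter 0 of CBox 0 in the question's defines) with the event code in bits 7:0 and the unit mask in bits 15:8. Please check that layout against the uncore guide for your CPU before relying on it.

```c
#include <stdint.h>

/* Bit positions as described in the reply above (names are illustrative). */
#define PMON_CTL_EN        (1ULL << 22)  /* per-counter enable */
#define U_GLOBAL_CTL_UNFRZ (1ULL << 29)  /* unfreeze all uncore counters */

/*
 * Build a counter control value.
 * ASSUMPTION: event code in bits 7:0, unit mask in bits 15:8.
 */
uint64_t cbox_ctl_value(uint8_t event, uint8_t umask)
{
    return (uint64_t)event | ((uint64_t)umask << 8) | PMON_CTL_EN;
}
```

With placeholder inputs event 0x34 and umask 0x03, the function yields 0x400334. Writing that value to the counter's control register and then writing U_GLOBAL_CTL_UNFRZ to U_MSR_PMON_GLOBAL_CTL corresponds to the sequence described in point 3.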
