Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Question about the Core Specificity Encoding option for reading CPU performance counters

Hamid_Reza_K_

Hi list,

I want to measure the core cycles during which the data bus is busy for a multi-threaded application running on a Core 2 Duo. I found that the performance event "Dbus_Busy" meets my purpose. But, as you know, to use the event you are supposed to specify a core-specificity encoding. There are two options for Core Specificity Encoding: All cores and This core. Could you tell me what the "This core" option means for a multi-threaded application?

Best regards,

H. R. Khaleghzadeh

McCalpinJohn

The discussion in section 18.3 of Volume 3 of the Intel Software Developer's Manual (document 325384, revision 053, January 2015) seems pretty clear. For the Dbus_Busy event (Event 22H in Table 19-22):

  • If the high-order bits of the Umask are set to 11b, then the counter will increment when the bus is busy no matter which core initiated the bus transaction.
  • If the high-order bits of the Umask are set to 01b, then the counter will increment when the bus is busy only if the transaction on the bus was initiated by the core doing the measurement.

I suspect that the words "Requires core-specificity" in Table 19-22 of Volume 3 of the SDM are there to remind the user that the high-order bits of the Umask must be set to one of the two allowed patterns. Forgetting to set them is equivalent to selecting 00b, which is a reserved encoding and would result in either no counting or incorrect counting.

It probably would have been easier to understand if the description of the Umask field in Table 19-22 said to use a value of C0H to count transactions initiated by all cores or 40H (01b in the two high-order bits) to count transactions initiated only by the current core.
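To make the bit arithmetic concrete, here is a minimal sketch of how the two core-specificity patterns land in the Umask byte and in a full IA32_PERFEVTSELx value. It assumes the standard register layout (event select in bits 7:0, Umask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22) and that Dbus_Busy needs no other Umask bits; the function names are made up for illustration. Note that 11b in the high-order bits gives C0H and 01b gives 40H.

```python
# Sketch only: helper names are hypothetical, not part of any tool or library.
EVENT_DBUS_BUSY = 0x22   # event number quoted above (Event 22H)
CORE_ALL = 0b11          # count no matter which core initiated the transaction
CORE_THIS = 0b01         # count only transactions initiated by this core

USR = 1 << 16            # count in user mode
OS = 1 << 17             # count in kernel mode
EN = 1 << 22             # enable the counter


def core_umask(core_bits):
    """Place the 2-bit core-specificity field in Umask bits 7:6."""
    if core_bits not in (CORE_ALL, CORE_THIS):
        # 00b is what you get by forgetting the field; this sketch rejects
        # everything except the two patterns described in the post.
        raise ValueError("core-specificity must be 11b or 01b")
    return core_bits << 6


def perfevtsel(event, umask):
    """Assemble an IA32_PERFEVTSELx value counting in user and kernel mode."""
    return (event & 0xFF) | ((umask & 0xFF) << 8) | USR | OS | EN


print(hex(core_umask(CORE_ALL)))    # 0xc0
print(hex(core_umask(CORE_THIS)))   # 0x40
print(hex(perfevtsel(EVENT_DBUS_BUSY, core_umask(CORE_THIS))))  # 0x434022
```

Tools that take the event and Umask separately only need the C0H or 40H value; the full register value matters only if you program the MSR directly.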

For multi-threaded applications, you need to be sure to pin each thread to a single core if you want to try to map the results with the "this core only" mask to per-thread values.
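As a rough illustration of the pinning step, the sketch below binds each worker thread to one CPU on Linux before it does any work. It assumes a Linux system (os.sched_setaffinity is not available elsewhere); the worker and helper names are hypothetical, and the counter programming itself is left out. A C program would typically do the same thing with pthread_setaffinity_np or sched_setaffinity.

```python
import os
import threading


def pinned_worker(cpu, results):
    # pid 0 refers to the calling thread, so only this thread is pinned.
    os.sched_setaffinity(0, {cpu})
    # ... the measured work for this thread would run here ...
    results[cpu] = os.sched_getaffinity(0)  # record the mask actually in effect


def run_pinned(cpus):
    """Start one thread per CPU in `cpus`, each pinned to its own CPU."""
    results = {}
    threads = [threading.Thread(target=pinned_worker, args=(cpu, results))
               for cpu in cpus]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Use at most two of the CPUs this process is allowed to run on.
allowed = sorted(os.sched_getaffinity(0))
results = run_pinned(allowed[:2])
for cpu, mask in sorted(results.items()):
    print(f"thread for CPU {cpu} ran with affinity mask {mask}")
```

With one thread per core and no migration, a "this core" count read on a given core can be attributed to the thread pinned there, apart from whatever else the OS schedules on that core.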
