Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

How can we know whether two PMU events conflict with each other?

Hao_Shen

Hi,

I read somewhere that some events conflict with each other, so they cannot be monitored at the same time (they can probably still be collected by time-multiplexing). So my question is: how can we know that?

I am using an Ivy Bridge processor with the perf tool on Ubuntu. I tried several combinations of 4 events from the event table, and it looks like they can all be monitored at the same time... Is there any other tool that can tell?

Thanks;>

Patrick_F_Intel1

Hello Hao Shen,

It is the responsibility of the tool to know the restrictions on events and to program the events accordingly. I don't know whether the Linux perf tool looks for and applies these restrictions. The Intel VTune Amplifier tool does apply them when it programs the events.

It can be quite complicated to figure out which events can be collected at the same time. You can see some of the restrictions by looking at the ivybridge_db.txt file in the VTune bin32 directory (it lists all the public events for Ivy Bridge). For instance, the COUNTER column lists which general counters each event can be programmed into. You'll see that some events can go in any of the 4 general counters and some in only 1 of them. Sometimes this restriction truly reflects which counter the event can be programmed into; sometimes it is just a way to mark a set of mutually exclusive events (events that can only be collected 1 at a time). The SDM should have the details of the restrictions for each event.
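To make the COUNTER-column idea concrete: deciding whether a set of events fits in one run is a matching problem, where each event needs its own counter from its allowed set. The following is a minimal Python sketch, assuming the allowed-counter sets have already been read out of the event file or the SDM. The function name `assign_counters`, the event names, and the masks are all hypothetical, and the sketch handles only per-counter restrictions.

```python
from typing import Dict, Optional, Set


def assign_counters(events: Dict[str, Set[int]]) -> Optional[Dict[str, int]]:
    """Give every event its own allowed general counter, if possible.

    `events` maps a (hypothetical) event name to the counter indices it
    may use, i.e. what a COUNTER column would list. Returns
    event -> counter, or None if the events cannot all be counted in one
    run (a tool would then have to multiplex). Uses augmenting-path
    bipartite matching.
    """
    owner: Dict[int, str] = {}  # counter index -> event currently placed there

    def place(event: str, seen: Set[int]) -> bool:
        for counter in sorted(events[event]):
            if counter in seen:
                continue
            seen.add(counter)
            # Use a free counter, or move the current owner to another counter.
            if counter not in owner or place(owner[counter], seen):
                owner[counter] = event
                return True
        return False

    for event in events:
        if not place(event, set()):
            return None
    return {event: counter for counter, event in owner.items()}


# Hypothetical events and counter masks, for illustration only.
print(assign_counters({"EVENT_A": {0, 1, 2, 3}, "EVENT_B": {2}, "EVENT_C": {2}}))
# None: EVENT_B and EVENT_C both need counter 2
print(assign_counters({"EVENT_A": {2, 3}, "EVENT_B": {2}}))
# {'EVENT_B': 2, 'EVENT_A': 3}: EVENT_A is moved to counter 3 to make room
```

A naive greedy placement would fail on the second example if it put EVENT_A on counter 2 first; the augmenting-path step is what lets it move EVENT_A aside.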

Pat

Hao_Shen

Hi Pat,

Thanks for the info. I know my processor has 4 general-purpose performance counters and 3 fixed performance counters, and the events the fixed counters collect are fixed. So you mean that even for the general-purpose counters, not every counter can be programmed to count every event? Do you know why?

Patrick_F_Intel1

There are probably a variety of reasons why some events can't be programmed into just any counter. It would take a good bit of research to find the list of events with restrictions, find the reason for each restriction, and see whether the reason is already public or can be made public... and at the end of the day I'm not sure it would change anything. Do you have a particular issue you are trying to resolve, or are you just curious?

Pat

Hao_Shen

I see. Yes, I have a particular issue.

I want to run some workloads and monitor as many events as possible to figure out the relationship between different events and the performance of the workloads. I need to find out which events affect the workload's performance the most. The ideal way would be to run the workload while monitoring just one event and record it, then run the workload again to monitor another event, and so on. That ensures every monitored event is precise. However, there are obviously a great many events, so it's better to monitor multiple events in a single run. To do that, I need to figure out which events can be monitored in the same run. If the events conflict, I am afraid the perf tool will time-multiplex them and scale the results at the end. Maybe the results are still reasonable, but it's obviously not optimal. So at least I need some clue about which kinds of events conflict with each other and which don't :)
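For context on the scaling concern: with multiplexing, the kernel can report for each event how long it was enabled and how long it was actually on a hardware counter (the perf_event_open read_format flags PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING), and perf stat extrapolates the raw count from those two times. Below is a minimal sketch of that arithmetic, assuming the event occurs at a roughly uniform rate over the run, which is exactly why a scaled count can be less precise than a dedicated run. The function names and numbers are hypothetical.

```python
def was_multiplexed(time_enabled_ns: int, time_running_ns: int) -> bool:
    """True if the event spent part of the run off the hardware counters."""
    return time_running_ns < time_enabled_ns


def scale_count(raw_count: int, time_enabled_ns: int, time_running_ns: int) -> float:
    """Extrapolate a multiplexed count to the whole measurement window.

    Assumes the event rate is roughly uniform over the run; bursty
    events can be badly mis-estimated.
    """
    if time_running_ns == 0:
        raise ValueError("event never ran on a counter; no estimate possible")
    return raw_count * time_enabled_ns / time_running_ns


# Hypothetical numbers: the event was on a counter for only 25% of a 10 ms run.
print(was_multiplexed(10_000_000, 2_500_000))         # True
print(scale_count(1_000_000, 10_000_000, 2_500_000))  # 4000000.0
```

When time_running equals time_enabled, the event was never multiplexed and the count needs no scaling, which is the "precise" case described above.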

Patrick_F_Intel1

Note that if you only want "per core" performance and you disable HT, you get 8 counters per core.

Not all events can be collected on all 8 counters, however. Look at the VTune\bin32\ivybridge_db.txt file to see which events can be collected on which counters; see the COUNTER and COUNTER_HT_OFF columns.

Some of the counter restrictions in the file are artificial; they are really just a way to make sure you don't try to program 2 mutually exclusive events at the same time.

For instance, the file shows the OFFCORE_RESPONSE events as only being allowed on counter 0. I think you can really put an offcore response event in any general counter, but you can only collect 1 OFFCORE_RESPONSE event at a time, since you have to program MSR 0x1a6 with a value that selects which type of offcore response you want. So saying "only use general counter 0" is a way to tell your scheduling algorithm that if there is already an OFFCORE_RESPONSE event in counter 0 and the user wants another one, VTune has to start a new multiplex group. You'd have to check the SDM to see what the restrictions are on each event.

There is also a TAKEN_ALONE column in the file which I believe means that the event must be collected by itself (no other events at the same time).

I would model my data collection after the rules in the event file (if perf doesn't already take all of this into account automatically). Or you can just use VTune, and it will do all this figuring out for you.
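As a rough illustration of modeling a collection plan on those rules, here is a Python sketch under loud assumptions. The `Event` class, the event names, and the restrictions are hypothetical. The `shared_msr` field stands in for the OFFCORE_RESPONSE / MSR 0x1a6 case; the event file itself encodes that case as a counter-0 restriction, which the counter matching would handle the same way. First-fit packing is a simple heuristic, not necessarily what VTune or perf actually do.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set

# Assumes HT on: 4 general counters per logical CPU. With HT off, the
# COUNTER_HT_OFF column would allow up to 8.
ALL_GP = frozenset(range(4))


@dataclass(frozen=True)
class Event:
    name: str
    counters: FrozenSet[int]          # allowed general counters (COUNTER column)
    shared_msr: Optional[int] = None  # extra config MSR, e.g. 0x1A6 for offcore response
    taken_alone: bool = False         # TAKEN_ALONE column: must run by itself


def fits(events: List[Event]) -> bool:
    """True if each event can get its own allowed counter (bipartite matching)."""
    owner: Dict[int, int] = {}  # counter index -> index into `events`

    def place(i: int, seen: Set[int]) -> bool:
        for c in events[i].counters:
            if c not in seen:
                seen.add(c)
                if c not in owner or place(owner[c], seen):
                    owner[c] = i
                    return True
        return False

    return all(place(i, set()) for i in range(len(events)))


def can_join(group: List[Event], ev: Event) -> bool:
    if ev.taken_alone or any(g.taken_alone for g in group):
        return False
    # Only one configuration can be written to a shared MSR per run.
    if ev.shared_msr is not None and any(g.shared_msr == ev.shared_msr for g in group):
        return False
    return fits(group + [ev])


def plan_groups(events: List[Event]) -> List[List[Event]]:
    """First-fit packing of events into multiplex groups (not guaranteed minimal)."""
    groups: List[List[Event]] = []
    for ev in events:
        for group in groups:
            if can_join(group, ev):
                group.append(ev)
                break
        else:  # no existing group could take it
            groups.append([ev])
    return groups


# Hypothetical event names and restrictions, for illustration only.
events = [
    Event("EVENT_A", ALL_GP),
    Event("OFFCORE_RESPONSE_X", ALL_GP, shared_msr=0x1A6),
    Event("OFFCORE_RESPONSE_Y", ALL_GP, shared_msr=0x1A6),
    Event("EVENT_B", frozenset({2})),
    Event("EVENT_C", frozenset({2})),
    Event("EVENT_D", ALL_GP, taken_alone=True),
]
print([[e.name for e in g] for g in plan_groups(events)])
# [['EVENT_A', 'OFFCORE_RESPONSE_X', 'EVENT_B'], ['OFFCORE_RESPONSE_Y', 'EVENT_C'], ['EVENT_D']]
```

Each inner list would correspond to one run (or one multiplex group); the fewer groups, the less scaling is involved.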

Pat

Hao_Shen

Thanks. I have already downloaded VTune, and I will follow the rules :>
