Solved: The uncore event about GQ

木子_李_ · ‎02-05-2015

Hi all,

I found that the GQ (uncore) has four main events: UNC_GQ_CYCLES_FULL.X, UNC_GQ_CYCLES_NOT_EMPTY.X, UNC_GQ_OCCUPANCY.X, and UNC_GQ_ALLOC.X. I want to collect values for these events, but I don't know what the ALLOC and OCCUPANCY events mean.

Does anyone know?

Thanks.

Muzi Li

Patrick_F_Intel1 · ‎02-05-2015

Hello Muzi,

I haven't tested the events myself but I assume that the ALLOC event counts when a particular GQ item is added to the queue. The OCCUPANCY.X event counts, on each clocktick (probably uncore clockticks), how many items of type X are in the GQ queue.

So you can compute latencies for a given event X in the queue and/or the average number of events of type X in the GQ and other quantities like this. I wrote a pdf on the web about the things you can compute but I can't find it at the moment.

I see that there is an issue with UNC_GQ_ALLOC on some chips. When I google 'UNC_GQ_ALLOC' I see a specification update listing an issue for UNC_GQ_ALLOC.RT_LLC_MISS in the Intel® Core™ i5-600, i3-500 Desktop Processor Series and Intel® Pentium Desktop Processor 6000 Series. I don't know if this is the chip you are using or the subevent you are interested in. See http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/core-i5-600-i3-500-pentium-6000-spec-update.pdf

Pat

木子_李_ · ‎02-05-2015

Hi Pat

Thanks for your time. But I'm still confused about how to compute latencies using these two events.

The 2014 Intel developer manual says that latency can be measured by the average duration of queue occupancy, and that the ratio UNC_GQ_TRACKER_OCCUP.X/UNC_GQ_ALLOC.X measures an average duration of queue occupancy. If both events count the number of GQ items, how can their ratio denote latency? Also, I didn't find the UNC_GQ_TRACKER_OCCUP.X event in the event tables of the Intel manual, volume 3B. It only has UNC_GQ_OCCUPANCY.READ_TRACKER, which is described as incrementing by the number of queue entries (code read, data read, and RFOs) in the read tracker.

And thank you for the information about the specification update. I am going to use RT_LLC_MISS in future estimation.

Thanks again for your help.

Muzi

McCalpinJohn · ‎02-06-2015

On each cycle, the "occupancy" counters are incremented by the number of entries currently in the queue. E.g. if there are 8 items in the queue, the counter will be incremented by 8. It will do this every cycle that there are 8 entries in the queue, so it is implicitly a count of cycles as well as of entries.

The "alloc" counter is incremented every time a new item is added to the queue. This is clearly not a cycle counter -- it only increments when new items are added.

So if you put one entry in the queue and it stays there for 100 cycles, the "occupancy" counter will increase by 1 in each of those 100 cycles, while the "alloc" counter will only be incremented once when the entry is first added. Dividing the "occupancy" counter by the "alloc" counter gives 100, which is the duration of queue occupancy for that one transaction.

When you mix lots of transactions and have multiple entries in the queue, then the result must be interpreted as the average duration spent in the queue (with the "average" being across all of the transactions).
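To make the arithmetic above concrete, here is a minimal sketch that models the two counters in software. It is only an illustration of the counting rules described in this post, not a model of the real GQ hardware; the function name `simulate_queue` and the example transactions are made up for the example.

```python
def simulate_queue(transactions, total_cycles):
    """Model the "occupancy" and "alloc" counters for a toy queue.

    transactions: list of (arrival_cycle, duration_in_cycles) pairs
                  (hypothetical data, chosen only for illustration).
    """
    occupancy = 0  # incremented every cycle by the number of entries present
    alloc = 0      # incremented once per entry, in the cycle it is added
    for cycle in range(total_cycles):
        entries_now = sum(
            1 for start, dur in transactions if start <= cycle < start + dur
        )
        occupancy += entries_now
        alloc += sum(1 for start, _ in transactions if start == cycle)
    return occupancy, alloc


# One entry that stays for 100 cycles: the ratio is exactly 100.
single_occupancy, single_alloc = simulate_queue([(0, 100)], total_cycles=100)

# Three overlapping entries lasting 100, 50, and 150 cycles.
transactions = [(0, 100), (10, 50), (20, 150)]
occupancy, alloc = simulate_queue(transactions, total_cycles=200)
average_duration = occupancy / alloc  # mean of 100, 50, and 150 cycles

print(single_occupancy / single_alloc, average_duration)
```

Note that the ratio only recovers the average: the individual durations (100, 50, 150) cannot be separated from the two counter values alone.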

I have not used these particular GQ occupancy counters, but some occupancy counters allow filtering based on a "count mask", and may allow "edge detect" and "invert" functions. This allows you to do other interesting computations, such as determining the average number of entries in the queue while the queue is actually in use. (Divide the queue "occupancy" counter by a counter that increments by one in each cycle for which there is at least one entry in the queue.) With multiple experiments you can build histograms -- counting cycles in which there are (for example) 8 or more entries in the queue, 7 or more entries in the queue, 6 or more entries in the queue, etc. Fun stuff!
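As a rough sketch of the histogram idea, the snippet below treats a count mask as a threshold: a cycle is counted only when the per-cycle occupancy is at least the mask value. The per-cycle `trace` is hypothetical; on real hardware you never see it, and each threshold would be a separate run of the same workload with a different count mask, so the runs are assumed to be repeatable.

```python
def count_cycles_at_least(occupancy_per_cycle, threshold):
    """Cycles in which the queue holds `threshold` or more entries
    (what a count mask of `threshold` is assumed to report)."""
    return sum(1 for n in occupancy_per_cycle if n >= threshold)


def exact_histogram(occupancy_per_cycle, max_entries):
    """Turn "k or more entries" counts into "exactly k entries" counts
    by differencing adjacent thresholds."""
    at_least = [
        count_cycles_at_least(occupancy_per_cycle, k)
        for k in range(max_entries + 2)
    ]
    return {k: at_least[k] - at_least[k + 1] for k in range(max_entries + 1)}


# Hypothetical per-cycle occupancy of a small queue.
trace = [0, 0, 1, 2, 2, 3, 1, 0]

occupancy_total = sum(trace)                   # the plain "occupancy" counter
busy_cycles = count_cycles_at_least(trace, 1)  # cycles with at least one entry
average_while_busy = occupancy_total / busy_cycles
histogram = exact_histogram(trace, max_entries=3)

print(average_while_busy, histogram)
```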

木子_李_ · ‎02-06-2015

Hi John

Thanks for your explanation. It's very detailed and clear. I have two more questions: how do the counters get the number of entries currently in the queue? And can I get that number in experiments?

Regards,

Muzi

McCalpinJohn · ‎02-09-2015

This particular event is designed to increment the performance counter by the number of entries in the queue. The hardware queue manager clearly has to track how many entries are currently in use (otherwise it would not know when the queue was full!), so when this event is programmed the hardware queue manager sends that value to the performance counter unit every cycle. This is typically done on a private internal "performance counter bus", which may or may not be separate from other private internal bus structures (such as those used to access MSRs or PCI configuration space addresses in various units).

I don't see any way to instantaneously sample the number of queue entries, but I think that what you can get is more useful.

One way to get more detail is to read the performance counters more frequently -- this will still be an average, but it will be an average over a shorter time period. I typically set up experiments to read the counters immediately before and after a loop that I am interested in studying, so the counts can be attributed fairly directly to the code. (I say "fairly directly" because the uncore is a shared structure and you could see counts influenced by other cores or by transactions that began before the code that I am interested in, but which are still occupying queue entries.)
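The before/after pattern can be sketched as follows. Everything about the counter access is an assumption here: `read_counters` stands in for whatever mechanism you use to read the counters (it is not a real API), and the 48-bit counter width is assumed for the wraparound handling, so check the width for your processor.

```python
COUNTER_BITS = 48  # assumed counter width; verify for your hardware


def counter_delta(before, after, bits=COUNTER_BITS):
    """Difference of two raw counter reads, allowing for one wraparound."""
    return (after - before) % (1 << bits)


def measure(read_counters, region):
    """Read all counters immediately before and after `region` runs.

    read_counters: hypothetical callable returning {event_name: raw_value}.
    region: callable containing the code under study.
    """
    before = read_counters()
    region()
    after = read_counters()
    return {name: counter_delta(before[name], after[name]) for name in before}


# Fake reader for illustration: two canned snapshots, the first taken just
# before the occupancy counter wraps around.
snapshots = iter([
    {"occupancy": (1 << COUNTER_BITS) - 50, "alloc": 1000},
    {"occupancy": 250, "alloc": 1003},
])
deltas = measure(lambda: next(snapshots), lambda: None)
average_duration = deltas["occupancy"] / deltas["alloc"]

print(deltas, average_duration)
```

Keeping the measured region short makes the average more local, subject to the caveat above that other cores and earlier transactions can still contribute to the counts.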

The event name UNC_GQ_OCCUPANCY only shows up in the documentation for Westmere 06_25h, 06_2Ch, and 06_1Fh parts (Table 19-16 of volume 3 of the Intel Architectures SW Developer's Manual), where it is listed as a core performance counter event. As a core event you should be able to use the counter mask and invert flags to count cycles in which the queue has more than a target number of entries or less than a target number of entries. This still does not give an instantaneous sample, but it can give much more useful information about the queue occupancy and its temporal variability.

木子_李_ · ‎02-12-2015

This helps a lot. Thank you, John.