09-14-2011 02:55 PM
I am trying to measure the uops delivered from the Loop Stream Detector (LSD) in my Sandy Bridge processor. I don't see any documentation of a PMC event for doing this. Is there a method I can use to determine the # of uops delivered to the UopQ from the LSD? Also, is the LSD part of the UopQ? If so, then it's not really delivering uops to the UopQ, right?
PMC 79 allows me to measure the uops dispatched from the uop cache with umask=0x08, from the legacy decode unit (ILD) with umask=0x04, and from microcode (MS) with umask=0x30. But if you cannot determine the uops coming from the LSD, you cannot account for all uops delivered to the UopQ.
I ask because, in simple copy/read/write tests, I'm observing that a large number of the uops that are retired are missing from these counts. I want to account for the sources and identify the percentage of uops delivered to the UopQ from each one.
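The breakdown described above can be sketched in a few lines of Python. This is only an illustration: the event number and umasks are the ones quoted in this post, while the source labels, the helper names, and the sample counts are hypothetical. The packing of the umask into bits 8-15 above the event select follows the usual Intel event-select layout, which is also what raw event strings such as `r0879` in Linux perf encode.

```python
# Event 0x79 and its umasks are taken from the post; the labels are illustrative.
IDQ_EVENT = 0x79
IDQ_SOURCES = {
    "uop_cache": 0x08,      # uop cache
    "legacy_decode": 0x04,  # legacy decode path
    "microcode": 0x30,      # microcode (MS)
}


def raw_event(event, umask):
    """Pack the event select (bits 0-7) and the umask (bits 8-15) into one code."""
    return (umask << 8) | event


def source_shares(counts):
    """Return each source's percentage of the summed counts."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Raw event strings, one per source.
codes = {name: f"r{raw_event(IDQ_EVENT, umask):04x}"
         for name, umask in IDQ_SOURCES.items()}

if __name__ == "__main__":
    print(codes)
    # Hypothetical counts, only to show the calculation.
    print(source_shares({"uop_cache": 600, "legacy_decode": 300, "microcode": 100}))
```

Any uops that come from the LSD are not in these three counts, so the shares only describe the sources that event 0x79 can see.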
09-14-2011 03:16 PM
Further, if I measure the uops delivered from the IDQ using B.3.7.2 here:
I get a different number from the one measured with PMC 9C or with PMC 0E.
PMC 9C measures the # of uops delivered from the UopQ to the Renamer/Resource allocation table (RAT).
PMC 0E measures the # of uops issued from the RAT to the scheduler, correct?
I'm trying to determine the # of uops provided by the LSD because in some cases the # issued/retired varies significantly from that provided by the IDQ.
09-14-2011 03:34 PM
You may be able to get the number indirectly by taking the uops issued to the scheduler (PMC 0x0E) and subtracting the number provided to the UopQ from the IDQ (PMC 0x9C). Does this make sense? It tells you nothing about the per-cycle distribution of uops provided by the LSD. I am also measuring the LSD uops, but those numbers sometimes do not make a great deal of sense, and there is definitely overcounting taking place. In a separate thread I also find, as does Pat @ Intel, that the number of uops provided from the UopQ to the RAT is greatly overstated, especially in cases where the miss rate to the L1D is high. Any ideas why? Perfwise
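As a rough sketch of the subtraction suggested above, and taking this thread's reading of the two counters at face value, the estimate might be computed as follows. The function name and the sample counts are hypothetical. A negative difference is reported with a flag rather than hidden, since the overcounting mentioned above could produce one.

```python
def estimate_lsd_uops(uops_issued, uops_from_idq):
    """Estimate LSD-supplied uops as issued (PMC 0x0E) minus IDQ-supplied (PMC 0x9C).

    Returns (estimate, suspect). The estimate is clamped at zero, and suspect is
    True when the raw difference was negative, which suggests a counting problem.
    """
    diff = uops_issued - uops_from_idq
    return max(diff, 0), diff < 0


if __name__ == "__main__":
    # Hypothetical counts, only to show the calculation.
    print(estimate_lsd_uops(1_000_000, 700_000))
    print(estimate_lsd_uops(700_000, 1_000_000))
```

This gives a total over the whole measurement interval only; as noted above, it says nothing about how many uops the LSD provides per cycle.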