Measuring the uops delivered by the LSD (Loop Stream Detector)
I am trying to measure the uops delivered from the Loop Stream Detector (LSD) on my Sandy Bridge processor, but I don't see any PMC event documented for this. Is there a method I can use to determine the number of uops delivered to the UopQ from the LSD? Also, is the LSD part of the UopQ? If so, then it isn't really delivering uops to the UopQ, right?
PMC 0x79 lets me measure the uops delivered from the uop cache (umask=0x08), from the legacy decode unit (ILD, umask=0x04) and from microcode (MS, umask=0x30). But without a count of the uops coming from the LSD, I cannot account for all uops delivered to the UopQ.
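As a minimal sketch of how those three event/umask pairs might be programmed, the helper below encodes each pair in the raw-event form used by Linux `perf` (for example `perf stat -e r0879`). It assumes perf's convention that the raw config is `(umask << 8) | event` with no cmask, inv or edge bits; `perf_raw_event` and `IDQ_SOURCES` are hypothetical names for illustration only.

```python
def perf_raw_event(event: int, umask: int) -> str:
    """Encode an Intel event/umask pair as a Linux perf raw event string.

    Assumption: perf's raw format is config = (umask << 8) | event,
    with cmask/inv/edge bits left at zero.
    """
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event and umask must each fit in one byte")
    return f"r{(umask << 8) | event:04x}"


# Event 0x79 umasks quoted above (hypothetical table name).
IDQ_SOURCES = {
    "uop cache": (0x79, 0x08),
    "legacy decode (ILD)": (0x79, 0x04),
    "microcode (MS)": (0x79, 0x30),
}

if __name__ == "__main__":
    for name, (ev, um) in IDQ_SOURCES.items():
        print(f"{name:22s} {perf_raw_event(ev, um)}")
```

None of these encodings counts LSD-delivered uops, which is exactly the gap described above.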
I ask because, in simple copy/read/write tests, I'm observing a large number of retired uops that these source counts don't account for. I want to account for all the sources and identify the percentage of uops delivered to the UopQ from each one.
You may be able to get the number indirectly by taking the uops issued to the scheduler, PMC 0x0E, and subtracting the number provided to the UopQ (IDQ), PMC 0x9C. Does this make sense?
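The residual arithmetic can be sketched as follows. This is an assumption-laden illustration, not a validated method: it takes the non-LSD total to be the sum of the three event 0x79 counts (uop cache, ILD, MS) quoted earlier, and treats whatever is left of the event 0x0E count as LSD-delivered. Swap in whichever subtrahend you trust. `breakdown` and `UopSourceBreakdown` are hypothetical names, and the counts passed in are whatever your counters report.

```python
from dataclasses import dataclass


@dataclass
class UopSourceBreakdown:
    lsd_estimate: int         # residual attributed to the LSD (may be negative)
    shares: dict[str, float]  # fraction of issued uops attributed to each source


def breakdown(issued: int, dsb: int, ild: int, ms: int) -> UopSourceBreakdown:
    """Attribute issued uops to their sources, with the LSD as the residual.

    issued: uops issued (event 0x0E)
    dsb, ild, ms: event 0x79 with umask 0x08, 0x04 and 0x30
    Assumption: these three sources plus the LSD cover every issued uop.
    """
    if issued <= 0:
        raise ValueError("issued uop count must be positive")
    # Whatever the uop cache, legacy decode and microcode counts don't cover.
    lsd = issued - (dsb + ild + ms)
    counts = {
        "uop cache": dsb,
        "legacy decode (ILD)": ild,
        "microcode (MS)": ms,
        "LSD (residual)": lsd,
    }
    shares = {name: n / issued for name, n in counts.items()}
    return UopSourceBreakdown(lsd, shares)
```

A negative `lsd_estimate` only means the counters disagree (for instance through overcounting); it cannot be read as the LSD delivering a negative number of uops.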
That tells you nothing about the per-cycle distribution of uops provided by the LSD, though. I am also measuring the LSD uops, but those counts sometimes don't make much sense, and there's definitely overcounting taking place.
In a separate thread I also found, as did Pat @ Intel, that the number of uops provided from the UopQ to the RAT is greatly overstated, especially when the L1D miss rate is high. Any ideas why?