I am using publicly available software as well as my own software to measure PMC events on various architectures. I've found it very useful for getting a perspective on how my code (dense linear algebra and other high-performance codes) runs on Intel platforms.
I am also using SDE, which I've found very valuable for understanding the architecture-independent "workload" signature of my codes. It was from this that I observed the large number of Store To Load Forwarding (STLF) instances in my code and other codes.
I am currently on both Sandy Bridge (SB) and Ivy Bridge (IB).
For this reason I want to understand more clearly the MOB and its behavior, as elucidated by the PMC events at the link below:
0) Looking at the micro-arch diagrams in the Opt Guide, where is the "reservation station" on SB/IB? Does "dispatch" imply dispatch from the micro-op Q?
0a) When a LD op is dispatched from the micro-op Q, does it have to look for tokens in the LdQ?
1) Does the MOB keep track of store data? If so, how many stores can it track (equal to or fewer than the ST Q entries)?
2) Does it do this so as to forward ST data to loads which hit upon previous store data? If so, how much latency is saved by doing so?
3) IV doesn't have a reservation station, I believe. Right? It's a scheduler. So:
3a) LOAD DISPATCH:RS --- PMC 0x13: unit mask 0x01: does this count the number of LDs executed which bypassed the MOB and loaded data not tracked in the MOB?
3b) LOAD DISPATCH:RS_DELAY --- PMC 0x13: unit mask 0x02: what is stage 305, and what does this measure? Is it a false STLF caused by a partial match to a previous store (I don't think so)? What is this measuring? It is very confusing.
3c) LOAD DISPATCH:MOB --- I presume this counts the number of LDs which obtained their data from the MOB because a previous store met the conditions outlined in the Opt Guide and was valid to be forwarded, right? This load still takes a LD buffer token, right?
3d) This must count all the types of load scenarios above.
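For reference, this is roughly how I would expect the event number and unit masks quoted above (0x13 with 0x01 or 0x02) to be packed into a counter's event-select register. It is only a minimal sketch, assuming the architectural IA32_PERFEVTSELx layout from the SDM (event select in bits 7:0, unit mask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22). The helper names are hypothetical and do not come from any tool.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits of IA32_PERFEVTSELx (architectural layout, Intel SDM Vol. 3B). */
#define EVTSEL_USR (1ULL << 16)  /* count while in user mode (CPL > 0) */
#define EVTSEL_OS  (1ULL << 17)  /* count while in kernel mode (CPL 0) */
#define EVTSEL_EN  (1ULL << 22)  /* enable the counter */

/* Hypothetical helper: event code in bits 7:0, unit mask in bits 15:8. */
static uint64_t raw_event_config(unsigned event, unsigned umask)
{
    assert(event <= 0xFFu && umask <= 0xFFu);  /* both fields are 8 bits wide */
    return (uint64_t)event | ((uint64_t)umask << 8);
}

/* Hypothetical helper: full event-select value with the enable bit set. */
static uint64_t evtsel_value(unsigned event, unsigned umask, int usr, int os)
{
    uint64_t v = raw_event_config(event, umask) | EVTSEL_EN;
    if (usr) v |= EVTSEL_USR;
    if (os)  v |= EVTSEL_OS;
    return v;
}
```

With these assumptions, event 0x13 with unit mask 0x01 gives a raw config of 0x0113, which I believe corresponds to what Linux perf would accept as the raw event r0113. Whether that event code is actually valid on a given part is exactly what I am unsure about.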
Is anybody going to respond to the questions above, or is it not known what the PMCs are measuring? Any help is appreciated.
I read more in the Opt Guide. It seems that at rename you make sure you have enough tokens for the LDq, STq, ROB, etc. I've implemented those PMCs and they appear to work. I also implemented PMC 0x10, but that doesn't appear to work on my DGEMM; I get rubbish in the results on IV. Likewise for PMC 0x12 (which looks quite useful, though).
I tried to measure some of the MOB stats, but none of the ones you document appear to work. Any help in understanding what a load goes through in its interactions with the MOB would be enlightening. It appears to me from the Opt Guide that all loads go through the MOB (it contains the LDq, STq, etc.), so how does a load bypass the MOB? The description of the PMC seems to need some clarification to be truly useful.
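To check what these counters actually respond to, one option is a pair of small kernels with a known load/store pattern, wrapped in whatever PMC harness is already in place. The following is a minimal sketch with hypothetical function names. It assumes that a load reading exactly the bytes written by the immediately preceding store, at the same address and size, is a case that can be forwarded, and it uses volatile only to keep the compiler from removing the memory accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each iteration stores 8 bytes to *slot and immediately reloads the same
 * 8 bytes, so every load depends on the store just before it. */
static uint64_t stlf_chain(volatile uint64_t *slot, size_t iters)
{
    assert(slot != NULL);
    uint64_t v = 1;
    for (size_t i = 0; i < iters; i++) {
        *slot = v + 1;  /* store */
        v = *slot;      /* load of the same address and size */
    }
    return v;
}

/* Control: the store and the load use different locations, so no load in
 * this loop reads data written by a store in this loop. */
static uint64_t no_forward_chain(volatile uint64_t *st, volatile uint64_t *ld,
                                 size_t iters)
{
    assert(st != NULL && ld != NULL && st != ld);
    uint64_t v = 1;
    for (size_t i = 0; i < iters; i++) {
        *st = v + 1;    /* store to one location */
        v = *ld + i;    /* load from a different location */
    }
    return v;
}
```

If a counter really tracks loads satisfied by forwarding, I would expect it to scale with the iteration count in the first kernel and stay near zero in the second. That expectation is my assumption, not something the documentation states.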