I'm reading xeon-e5-v3-uncore-performance-monitoring.pdf. As I understand it, to monitor an event I need to program a *CTL register and then read the count from the matching *CTR register. The guide lists both "MSR Address" and "PCICFG Address" entries for these registers.
If I want to monitor PCI events, do I still need to set an MSR *CTL register?
I'm confused about how the UBox, CBo, SBo, HA, iMC, IRP, PCU, QPI, R2PCIe, and R3QPI units relate to one another.
You are pretty much on your own here. The documentation available is sparse, and mostly limited to Volume 2 of the processor datasheet (document 330784) and the Uncore Performance Monitoring Guide that you are currently reading. Figures 1-1, 1-2, and 1-3 of the Uncore Performance Monitoring Guide show some of the possible product configurations, and these are essential when trying to figure out how the various pieces of the chip interact.
If you want to monitor PCIe transactions, the R2PCIe unit is the primary resource, but there are also events in the CBo and IRP units that can match on PCIe transactions.
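On the "MSR Address" versus "PCICFG Address" part of the question: each uncore box has its own *CTL and *CTR registers, and the per-box tables in the guide say whether those registers live in MSR space or in PCI configuration space. In my reading, a box such as the CBo is programmed through MSRs, while boxes such as R2PCIe and the iMC are programmed through PCI configuration space, so monitoring in a PCICFG box would not involve that box having an MSR *CTL. Please confirm this against the tables. The following is only a rough sketch of the two access paths on Linux, assuming the msr driver (/dev/cpu/N/msr) and the sysfs PCI config files are available and that you are running as root. The function names are hypothetical, no real register addresses are filled in, and the split of a PCICFG counter into two adjacent 32-bit halves is an assumption to check for the box you are using.

```python
import os
import struct


def read_msr(msr_path: str, addr: int) -> int:
    """Read one 64-bit MSR; msr_path is e.g. /dev/cpu/0/msr (assumed Linux msr driver)."""
    fd = os.open(msr_path, os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the MSR address.
        return struct.unpack("<Q", os.pread(fd, 8, addr))[0]
    finally:
        os.close(fd)


def write_msr(msr_path: str, addr: int, value: int) -> None:
    """Write one 64-bit MSR, e.g. a *CTL register of an MSR-space box."""
    fd = os.open(msr_path, os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), addr)
    finally:
        os.close(fd)


def read_pci32(config_path: str, offset: int) -> int:
    """Read a 32-bit register from a PCI config file (a sysfs .../config file)."""
    fd = os.open(config_path, os.O_RDONLY)
    try:
        return struct.unpack("<I", os.pread(fd, 4, offset))[0]
    finally:
        os.close(fd)


def write_pci32(config_path: str, offset: int, value: int) -> None:
    """Write a 32-bit register, e.g. a *CTL register of a PCICFG-space box."""
    fd = os.open(config_path, os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<I", value), offset)
    finally:
        os.close(fd)


def read_pci_counter(config_path: str, low_offset: int) -> int:
    """Combine two adjacent 32-bit halves into one counter value.

    Assumption: the *CTR register is exposed as a low half at low_offset
    and a high half at low_offset + 4.
    """
    low = read_pci32(config_path, low_offset)
    high = read_pci32(config_path, low_offset + 4)
    return (high << 32) | low
```

In use, you would write the event selection to the box's *CTL register through whichever path the guide lists for that box, let the workload run, and then read the matching *CTR register through the same path. The device path, offsets, and control values all have to come from the guide and from the actual bus numbering on your system.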
There is not really an "overview" -- understanding this material involves continually cross-referencing between the sections of the uncore performance monitoring guide and creating carefully controlled microbenchmarks to test various hypotheses about what it means. This has to be combined with significant experience working with the ugly details of real cache coherence protocols -- well beyond what any textbook covers (though "A Primer on Memory Consistency and Cache Coherence" by Sorin, Hill, and Wood does start to get into a useful level of detail -- especially when discussing the "transient states" in chapters 6, 7, and 8).
Intel's decisions about what to document and what not to document are the result of a variety of conflicting interests. Some of these are easy enough to understand, while others are dependent on legal or contractual issues with confidentiality restrictions.
The good news is that some of this stuff not only makes sense, but is easily verified as being accurate enough to use for detailed performance characterization and tuning. I have had good luck with the reported data transfers from the iMC and QPI performance counters, for example. But there is certainly plenty of this material that is not in the "easy to understand" or "easy to use" categories.
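As an illustration of the kind of check that is easy to do with the iMC counters, here is a minimal sketch that turns two counter snapshots into a DRAM bandwidth figure. It assumes that each counted read or write CAS event corresponds to one 64-byte cache line transfer and that the counters are 48 bits wide; both are assumptions to verify against the guide and with a microbenchmark that moves a known amount of data. The function names are hypothetical.

```python
CACHE_LINE_BYTES = 64   # assumed bytes moved per counted CAS event
COUNTER_BITS = 48       # assumed width of the uncore counters


def counter_delta(before: int, after: int, bits: int = COUNTER_BITS) -> int:
    """Difference between two raw counter reads, allowing for one wraparound."""
    return (after - before) % (1 << bits)


def dram_bandwidth_gbs(cas_reads: int, cas_writes: int, seconds: float) -> float:
    """Approximate DRAM traffic in GB/s (10**9 bytes/s) from CAS count deltas."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    total_bytes = (cas_reads + cas_writes) * CACHE_LINE_BYTES
    return total_bytes / seconds / 1e9
```

Summing the result over all memory channels and comparing it with the traffic a simple streaming kernel should generate is a quick way to decide whether the counts can be trusted on a given system.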