I want to measure address bus utilization on the BSB (Back Side Bus), not the FSB.
The paper "Performance Scalability of a Multi-Core Web Server" reports that
the web server doesn't scale because the address bus is saturated (about 75% usage).
So I am trying to check our server's utilization.
Model: Xeon 2.5 GHz (E5420), quad core
Exported performance counter:
"VTune_DirHelp" - It looks like a Windows directory, but my system is Linux based.
So I can't read that document.
Can you tell me which types of events are relevant to the BSB?
Did you read the documents which come up when you search "memory bandwidth utilization" on this forum?
Why did you mention "memory bandwidth utilization"?
Actually, what I want to know is the cache-to-CPU bus utilization (Back Side Bus).
Can "memory bandwidth utilization" indirectly measure BSB utilization? If so, how?
On-chip bus performance isn't dealt with by VTune, at least not for the architectures prior to Core i7, where there are some uncore events which might resemble what you seem to be talking about. Still, there are issues which can't be observed, as far as I know, with any practical developer tools. Anyway, for large practical applications, the memory bandwidth question does assume more importance.
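For what it's worth, memory bandwidth utilization is usually estimated from a count of cache line transfers over a sampling interval, compared against the peak the bus can deliver. This says nothing direct about an on-chip bus. A minimal sketch, assuming you already have a transfer count from some counter (which counter is not specified here; the function name and all input values below are hypothetical), a 64-byte cache line, and a peak of 1333 MT/s x 8 bytes that is itself an assumption for this platform:

```python
CACHE_LINE_BYTES = 64  # assumed cache line size


def bandwidth_utilization(line_transfers, elapsed_seconds, peak_bytes_per_second):
    """Return (achieved bytes/second, fraction of peak) for one sampling interval.

    line_transfers is a hypothetical count of cache line transfers read
    from a performance counter during the interval.
    """
    if elapsed_seconds <= 0 or peak_bytes_per_second <= 0:
        raise ValueError("elapsed time and peak bandwidth must be positive")
    achieved = line_transfers * CACHE_LINE_BYTES / elapsed_seconds
    return achieved, achieved / peak_bytes_per_second


# Hypothetical sample: 50 million line transfers in one second.
# Assumed peak: 1333 million transfers/s on an 8-byte-wide bus.
peak = 1333e6 * 8
achieved, fraction = bandwidth_utilization(50_000_000, 1.0, peak)
print(f"{achieved / 1e9:.2f} GB/s, {fraction:.1%} of assumed peak")
```

A fraction that stays close to 1.0 under load would suggest the memory path is the limit; a low fraction points elsewhere.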
I clearly said BSB utilization. http://en.wikipedia.org/wiki/Back_side_bus
Anyway, currently that is not important to me.
According to your reply, "VTune can't measure BSB utilization", right?
So can you tell me a little bit about "uncore events" or other tools that can measure BSB utilization?
About Offcore Performance Tuning Events
These events are devoted to offcore cacheline access activity. Of particular importance is the offcore_response_0 event, which is a matrix decomposition of request type by response source. It has the potential of ~65,000 non-trivial programmings. There are approximately 275 predefined programmings in the Intel Performance Tuning Tools. The events monitoring the super queue activity are also listed here.
Architectural event counting last level cache activity
Offcore L1 data cache writebacks
Offcore requests blocked due to Super Queue full
Offcore response matrix event. See extended documentation
Super Queue full stall cycles
Super Queue lock splits across a cache line
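The "matrix" description of offcore_response_0 can be read as a mask in which one group of bits selects request types and another selects response sources. A minimal sketch, assuming a 16-bit programming with 8 request-type bits in the low byte and 8 response-source bits in the high byte (this layout and the helper names are assumptions for illustration; the extended documentation has the real bit assignments):

```python
REQUEST_BITS = 8   # assumed number of request-type selector bits
RESPONSE_BITS = 8  # assumed number of response-source selector bits


def offcore_mask(request_types, response_sources):
    """Build one programming from selected bit indexes (hypothetical layout)."""
    req = 0
    for bit in request_types:
        if not 0 <= bit < REQUEST_BITS:
            raise ValueError(f"request bit {bit} out of range")
        req |= 1 << bit
    rsp = 0
    for bit in response_sources:
        if not 0 <= bit < RESPONSE_BITS:
            raise ValueError(f"response bit {bit} out of range")
        rsp |= 1 << bit
    # Response-source bits sit above the request-type bits.
    return req | (rsp << REQUEST_BITS)


def count_nontrivial():
    """Count masks with at least one request bit and one response bit set."""
    return (2 ** REQUEST_BITS - 1) * (2 ** RESPONSE_BITS - 1)


# Request types 0 and 1 answered by response source 6 (indexes are arbitrary).
print(hex(offcore_mask([0, 1], [6])))
print(count_nontrivial())
```

Under that assumed layout the count comes to 255 x 255 = 65,025, which is consistent with the "~65,000" figure quoted above.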
I haven't seen any presentations about effective use of these events for software performance tuning. More popular, under precise events, among others, there are
so again you get quickly into memory access issues.
Questions like why dirty fill buffers get backed up by limited port access to L1 still don't have VTune or PTU events associated with them.
If you choose to apply the term back side bus, I guess there are several potential levels for it.
I looked up the Wikipedia article realbright cited and several of the references it draws from, and the curious thing I noticed is that the newest reference I found was from 2001, eight years ago. Eight years ago there was such a thing as the Backside Bus. It was an off-chip connection to the last level cache (generally an L2 cache): either static RAM stuck on the baseboard, or a daughter card attached to the CPU, or a second chip sharing the same die. The back side bus as a distinct entity went away as multiple cache levels came on chip (part of what's called the uncore on Nehalem).
I also found several web references with a totally confused concept of the Dual Independent Bus (DIB), describing it as a structure representing the distinction between front side and back side bus. These explanations are wrong. The DIB is all "front side bus"; the split occurs at a chipset that divides the bus in half and plays traffic cop between the two halves, with a "snoop filter" to try to limit bus activity by filtering out traffic local to one side or the other.
There are others on this forum with greater expertise on performance events than I. There may be a way to measure bus saturation. I presume your interest in this is because you have an application that isn't scaling as you think it should, and you're trying to verify whether it might be a bus bandwidth issue?