I want to measure address bus utilization on the BSB (back side bus), NOT the FSB.
The paper "Performance Scalability of a Multi-Core Web Server" reports that
the web server doesn't scale because the address bus is saturated (about 75% utilization).
So I am trying to check our server's utilization.
Some information:
Model: Xeon E5420, 2.5 GHz, quad-core
Exported performance counters:
BUS_BNR_DRV.ALL_AGENTS
BUS_BNR_DRV.THIS_AGENT
BUS_DATA_RCV.BOTH_CORES
BUS_DATA_RCV.SELF
BUS_DRDY_CLOCKS.ALL_AGENTS
BUS_DRDY_CLOCKS.THIS_AGENT
BUS_HITM_DRV.ALL_AGENTS
BUS_HITM_DRV.THIS_AGENT
BUS_HIT_DRV.ALL_AGENTS
BUS_HIT_DRV.THIS_AGENT
BUS_IO_WAIT.BOTH_CORES
BUS_IO_WAIT.SELF
BUS_LOCK_CLOCKS.ALL_AGENTS
BUS_LOCK_CLOCKS.BOTH_CORES.THIS_AGENT
BUS_LOCK_CLOCKS.SELF
BUS_REQUEST_OUTSTANDING.ALL_AGENTS
BUS_REQUEST_OUTSTANDING.BOTH_CORES.THIS_AGENT
BUS_REQUEST_OUTSTANDING.SELF
BUS_TRANS_ANY.ALL_AGENTS
BUS_TRANS_ANY.BOTH_CORES.THIS_AGENT
BUS_TRANS_ANY.SELF
BUS_TRANS_BRD.ALL_AGENTS
BUS_TRANS_BRD.BOTH_CORES.THIS_AGENT
BUS_TRANS_BRD.SELF
BUS_TRANS_BURST.ALL_AGENTS
BUS_TRANS_BURST.BOTH_CORES.THIS_AGENT
BUS_TRANS_BURST.SELF
BUS_TRANS_DEF.ALL_AGENTS
BUS_TRANS_DEF.BOTH_CORES.THIS_AGENT
BUS_TRANS_DEF.SELF
BUS_TRANS_IFETCH.ALL_AGENTS
BUS_TRANS_IFETCH.BOTH_CORES.THIS_AGENT
BUS_TRANS_IFETCH.SELF
BUS_TRANS_INVAL.ALL_AGENTS
BUS_TRANS_INVAL.BOTH_CORES.THIS_AGENT
BUS_TRANS_INVAL.SELF
BUS_TRANS_IO.ALL_AGENTS
BUS_TRANS_IO.BOTH_CORES.THIS_AGENT
BUS_TRANS_IO.SELF
BUS_TRANS_MEM.ALL_AGENTS
BUS_TRANS_MEM.BOTH_CORES.THIS_AGENT
BUS_TRANS_MEM.SELF
BUS_TRANS_P.ALL_AGENTS
BUS_TRANS_P.BOTH_CORES.THIS_AGENT
BUS_TRANS_P.SELF
BUS_TRANS_PWR.ALL_AGENTS
BUS_TRANS_PWR.BOTH_CORES.THIS_AGENT
BUS_TRANS_PWR.SELF
BUS_TRANS_RFO.ALL_AGENTS
BUS_TRANS_RFO.BOTH_CORES.THIS_AGENT
BUS_TRANS_RFO.SELF
BUS_TRANS_WB.ALL_AGENTS
BUS_TRANS_WB.BOTH_CORES.THIS_AGENT
BUS_TRANS_WB.SELF
Thanks!
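The counters listed above are front-side-bus events on this Core-microarchitecture Xeon. As a minimal sketch, assuming the utilization ratios usually quoted for this microarchitecture in Intel's optimization manual (each transaction occupies the address bus for two bus clocks, and data bus activity is counted as DRDY-asserted clocks), address and data bus utilization could be computed as below. The bus-clock count (CPU_CLK_UNHALTED.BUS) is an assumption: it is not in the exported list and would have to be collected over the same interval. The function names and sample counts are hypothetical.

```python
def address_bus_utilization(bus_trans_any_all_agents, bus_clocks):
    """Percent of bus clocks in which the address bus is busy.

    Assumption: each transaction holds the address bus for 2 bus clocks,
    so utilization = 2 * BUS_TRANS_ANY.ALL_AGENTS / bus clocks * 100.
    """
    if bus_clocks <= 0:
        raise ValueError("bus_clocks must be positive")
    return 100.0 * 2 * bus_trans_any_all_agents / bus_clocks


def data_bus_utilization(bus_drdy_clocks_all_agents, bus_clocks):
    """Percent of bus clocks in which data is being driven on the bus.

    Assumption: utilization = BUS_DRDY_CLOCKS.ALL_AGENTS / bus clocks * 100.
    """
    if bus_clocks <= 0:
        raise ValueError("bus_clocks must be positive")
    return 100.0 * bus_drdy_clocks_all_agents / bus_clocks


if __name__ == "__main__":
    # Hypothetical counts, not measurements from a real server.
    bus_clocks = 1_000_000  # stands in for CPU_CLK_UNHALTED.BUS (assumed)
    print(f"address bus: {address_bus_utilization(375_000, bus_clocks):.1f}%")
    print(f"data bus:    {data_bus_utilization(400_000, bus_clocks):.1f}%")
```

The hypothetical counts are chosen so that the address bus comes out at 75.0%, roughly the level the paper reports; real counts would come from a profiler run on the server over a fixed interval.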
"VTune_DirHelp" looks like a Windows directory, but my system is Linux-based,
so I can't read that document.
Can you tell me which events are relevant to the BSB?
Thanks.
Did you read the documents which come up when you search "memory bandwidth utilization" on this forum?
Why did you mention "memory bandwidth utilization"?
Actually, what I want to know is the cache-to-CPU bus utilization (back side bus).
Can "memory bandwidth utilization" indirectly measure BSB utilization, and if so, how?
Thanks.
On-chip bus performance isn't covered by VTune, at least not for architectures prior to Core i7; Core i7 has some uncore events that might resemble what you seem to be talking about. Still, as far as I know, there are issues that can't be observed with any practical developer tools. In any case, for large practical applications, the memory bandwidth question matters more.
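One hedged way to put numbers on the memory bandwidth question for this Core-microarchitecture system is to count full-cache-line bus transfers. The sketch below assumes 64-byte cache lines and that BUS_TRANS_BURST.ALL_AGENTS (from the counter list in the first post) approximates the number of full-line transfers during an interval of known length; partial transfers are ignored, so the result is only an estimate. The peak figure assumes a 64-bit FSB at 1333 MT/s for the E5420 platform. Function and constant names are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumption: each burst transaction moves one 64-byte line

# Assumption: 64-bit (8-byte) front side bus at 1333 MT/s, per bus.
PEAK_FSB_GBS = 1333e6 * 8 / 1e9  # about 10.66 GB/s


def estimated_bandwidth_gbs(burst_transactions, seconds):
    """Estimated bus bandwidth in GB/s (1 GB = 1e9 bytes) from a count of full-line bursts."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return burst_transactions * CACHE_LINE_BYTES / seconds / 1e9


def fraction_of_peak(bandwidth_gbs, peak_gbs=PEAK_FSB_GBS):
    """Measured bandwidth as a fraction of an assumed theoretical peak."""
    if peak_gbs <= 0:
        raise ValueError("peak_gbs must be positive")
    return bandwidth_gbs / peak_gbs


if __name__ == "__main__":
    # Hypothetical: 100 million burst transactions counted over 1 second.
    bw = estimated_bandwidth_gbs(100_000_000, 1.0)
    print(f"{bw:.2f} GB/s, {100 * fraction_of_peak(bw):.0f}% of assumed peak")
```

Comparing against a theoretical peak only gives an upper bound on headroom; sustained bus throughput is typically well below the nominal figure.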
I clearly said BSB utilization. http://en.wikipedia.org/wiki/Back_side_bus
Anyway, that is not important to me right now.
According to your reply, "VTune can't measure BSB utilization", right?
So can you tell me a little bit about "uncore events" or other tools that can measure BSB utilization?
Thanks.
About Offcore Performance Tuning Events
These events are devoted to offcore cacheline access activity. Of particular importance is the offcore_response_0 event, which is a matrix decomposition of request type by response source. It has the potential for ~65,000 non-trivial programmings. There are approximately 275 predefined programmings in the Intel Performance Tuning Tools. The events monitoring the super queue activity are also listed here.
| Event Code | Description |
|---|---|
| 0x6C | I/O transactions |
| 0x2E | Architectural event counting last level cache activity |
| 0xB0 | Offcore L1 data cache writebacks |
| 0xB2 | Offcore requests blocked due to Super Queue full |
| 0xB7 | Offcore response matrix event; see extended documentation |
| 0xF6 | Super Queue full stall cycles |
| 0xF4 | Super Queue lock splits across a cache line |
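The "~65,000 non-trivial programmings" of the offcore_response_0 matrix follow from simple counting, under an assumption not stated in the quoted text: that the event's auxiliary register holds 8 request-type bits and 8 response-source bits, and that a meaningful programming sets at least one bit in each field. The bit layout and helper names below are illustrative assumptions, not the documented programming interface.

```python
REQUEST_BITS = 8   # assumption: low 8 bits select request types
RESPONSE_BITS = 8  # assumption: next 8 bits select response sources


def offcore_mask(request_mask, response_mask):
    """Pack a request-type mask and a response-source mask into one 16-bit value."""
    if not 0 < request_mask < (1 << REQUEST_BITS):
        raise ValueError("need at least one request-type bit, within 8 bits")
    if not 0 < response_mask < (1 << RESPONSE_BITS):
        raise ValueError("need at least one response-source bit, within 8 bits")
    return (response_mask << REQUEST_BITS) | request_mask


def non_trivial_programmings():
    """Count masks with at least one bit set in each field: (2^8 - 1) * (2^8 - 1)."""
    return ((1 << REQUEST_BITS) - 1) * ((1 << RESPONSE_BITS) - 1)


if __name__ == "__main__":
    print(non_trivial_programmings())
    print(hex(offcore_mask(0x01, 0x40)))  # arbitrary example bits
```

Under these assumptions the count is 255 × 255 = 65,025, which matches the "~65,000" figure; the ~275 predefined programmings are a small curated subset of that space.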
I haven't seen any presentations about effective use of these events for software performance tuning. More popular are precise events such as:
OTHER_CORE_L2_HITM
REMOTE_CACHE_LOCAL_HOME_HIT
REMOTE_DRAM
LOCAL_DRAM
So again, you quickly get into memory access issues.
Questions like why dirty fill buffers get backed up by limited port access to L1 still don't have VTune or PTU events associated with them.
If you choose to apply the term back side bus, I guess there are several potential levels it could refer to.
I looked up the Wikipedia article realbright cited and several of the references it draws from, and the curious thing I noticed is that the newest reference I found was from 2001, eight years ago. Eight years ago there was such a thing as the back side bus. It was an off-chip connection to the last-level cache (generally an L2 cache): either static RAM mounted on the baseboard, a daughter card attached to the CPU, or a second chip sharing the same die. The back side bus as a distinct entity went away as multiple cache levels came on chip (part of what's called the uncore on Nehalem).
I also found several web references with a totally confused concept of the Dual Independent Bus (DIB), describing it as a structure representing the distinction between the front side and back side buses. These explanations are wrong. The DIB is all "front side bus", with the split occurring at a chipset that divides the bus in half and plays traffic cop between the two halves, using a "snoop filter" to try to limit bus activity by filtering out traffic local to one side or the other.
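To make the snoop-filter idea concrete, here is a deliberately simplified toy model, not Intel's actual chipset design: the chipset remembers which of the two bus halves may hold each cache line and forwards a snoop to the other half only when that half may have a copy. The class and method names are hypothetical, and coherence states (for example, invalidating the other half's copy on a write) are ignored.

```python
from collections import defaultdict


class ToySnoopFilter:
    """Toy presence-tracking snoop filter for a chipset joining two bus halves, 0 and 1.

    Hypothetical model for illustration only: it ignores coherence states,
    so a write does not invalidate the other half's copy here.
    """

    def __init__(self):
        # cache line address -> set of bus halves that may hold a copy
        self._present = defaultdict(set)
        self.forwarded = 0  # snoops passed across to the other half
        self.filtered = 0   # snoops suppressed because the other half has no copy

    def access(self, bus, line):
        """Record an access from `bus`; return True if a snoop must cross to the other half."""
        if bus not in (0, 1):
            raise ValueError("bus must be 0 or 1")
        other = 1 - bus
        needs_snoop = other in self._present[line]
        if needs_snoop:
            self.forwarded += 1
        else:
            self.filtered += 1
        self._present[line].add(bus)
        return needs_snoop

    def evict(self, bus, line):
        """Record that caches on `bus` no longer hold `line`."""
        self._present[line].discard(bus)


if __name__ == "__main__":
    f = ToySnoopFilter()
    f.access(0, 0x1000)  # first touch: nothing on bus 1, snoop filtered
    f.access(1, 0x1000)  # bus 0 may hold the line, so the snoop is forwarded
    f.access(1, 0x2000)  # line private to bus 1, snoop filtered
    print(f"forwarded={f.forwarded} filtered={f.filtered}")
```

Every filtered snoop is traffic that never appears on the other half of the bus, which is the point of the filter: traffic local to one side stays on that side.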
There are others on this forum with greater expertise on performance events than I. There may be a way to measure bus saturation. I presume your interest in this is because you have an application that isn't scaling as you think it should, and you're trying to verify whether it might be a bus bandwidth issue?