Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

How to measure utilization of BSB?

realbright
Beginner
Hi all,

I want to measure address bus utilization on the BSB (Back Side Bus), NOT the FSB.

The paper "Performance Scalability of a Multi-Core Web Server" reported that
a web server doesn't scale because the address bus is saturated (about 75% usage).
So I am trying to check our server's utilization.

Some information:

Model: Xeon E5420, 2.5 GHz, quad-core

Exported performance counters:

BUS_BNR_DRV.ALL_AGENTS
BUS_BNR_DRV.THIS_AGENT
BUS_DATA_RCV.BOTH_CORES
BUS_DATA_RCV.SELF
BUS_DRDY_CLOCKS.ALL_AGENTS
BUS_DRDY_CLOCKS.THIS_AGENT
BUS_HITM_DRV.ALL_AGENTS
BUS_HITM_DRV.THIS_AGENT
BUS_HIT_DRV.ALL_AGENTS
BUS_HIT_DRV.THIS_AGENT
BUS_IO_WAIT.BOTH_CORES
BUS_IO_WAIT.SELF
BUS_LOCK_CLOCKS.ALL_AGENTS
BUS_LOCK_CLOCKS.BOTH_CORES.THIS_AGENT
BUS_LOCK_CLOCKS.SELF
BUS_REQUEST_OUTSTANDING.ALL_AGENTS
BUS_REQUEST_OUTSTANDING.BOTH_CORES.THIS_AGENT
BUS_REQUEST_OUTSTANDING.SELF
BUS_TRANS_ANY.ALL_AGENTS
BUS_TRANS_ANY.BOTH_CORES.THIS_AGENT
BUS_TRANS_ANY.SELF
BUS_TRANS_BRD.ALL_AGENTS
BUS_TRANS_BRD.BOTH_CORES.THIS_AGENT
BUS_TRANS_BRD.SELF
BUS_TRANS_BURST.ALL_AGENTS
BUS_TRANS_BURST.BOTH_CORES.THIS_AGENT
BUS_TRANS_BURST.SELF
BUS_TRANS_DEF.ALL_AGENTS
BUS_TRANS_DEF.BOTH_CORES.THIS_AGENT
BUS_TRANS_DEF.SELF
BUS_TRANS_IFETCH.ALL_AGENTS
BUS_TRANS_IFETCH.BOTH_CORES.THIS_AGENT
BUS_TRANS_IFETCH.SELF
BUS_TRANS_INVAL.ALL_AGENTS
BUS_TRANS_INVAL.BOTH_CORES.THIS_AGENT
BUS_TRANS_INVAL.SELF
BUS_TRANS_IO.ALL_AGENTS
BUS_TRANS_IO.BOTH_CORES.THIS_AGENT
BUS_TRANS_IO.SELF
BUS_TRANS_MEM.ALL_AGENTS
BUS_TRANS_MEM.BOTH_CORES.THIS_AGENT
BUS_TRANS_MEM.SELF
BUS_TRANS_P.ALL_AGENTS
BUS_TRANS_P.BOTH_CORES.THIS_AGENT
BUS_TRANS_P.SELF
BUS_TRANS_PWR.ALL_AGENTS
BUS_TRANS_PWR.BOTH_CORES.THIS_AGENT
BUS_TRANS_PWR.SELF
BUS_TRANS_RFO.ALL_AGENTS
BUS_TRANS_RFO.BOTH_CORES.THIS_AGENT
BUS_TRANS_RFO.SELF
BUS_TRANS_WB.ALL_AGENTS
BUS_TRANS_WB.BOTH_CORES.THIS_AGENT
BUS_TRANS_WB.SELF

Thanks!
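The BUS_* counters listed above observe the front side bus rather than an on-chip path. As a reference point, the FSB address-bus utilization the paper discusses is commonly estimated on Core-microarchitecture processors such as the E5420 as 2 × BUS_TRANS_ANY.ALL_AGENTS / CPU_CLK_UNHALTED.BUS, assuming each transaction occupies the address bus for two bus clocks. The sketch below is a minimal illustration under that assumption. CPU_CLK_UNHALTED.BUS is not in the exported list, and the function names, the 0.75 threshold default and the sample counts are hypothetical.

```python
# Hypothetical helpers: FSB address-bus utilization from two counter totals.
# Assumption: every bus transaction holds the address bus for two bus clocks,
# so utilization = 2 * BUS_TRANS_ANY.ALL_AGENTS / CPU_CLK_UNHALTED.BUS.


def address_bus_utilization(bus_trans_any_all_agents: int,
                            cpu_clk_unhalted_bus: int) -> float:
    """Return FSB address-bus utilization as a fraction (1.0 == fully busy)."""
    if cpu_clk_unhalted_bus <= 0:
        raise ValueError("CPU_CLK_UNHALTED.BUS must be positive")
    return 2 * bus_trans_any_all_agents / cpu_clk_unhalted_bus


def is_saturated(utilization: float, threshold: float = 0.75) -> bool:
    """Flag utilization at or above the ~75% level quoted from the paper."""
    return utilization >= threshold


if __name__ == "__main__":
    # Made-up counter totals, for illustration only.
    util = address_bus_utilization(bus_trans_any_all_agents=400_000,
                                   cpu_clk_unhalted_bus=1_000_000)
    print(f"address bus utilization: {util:.0%}, saturated: {is_saturated(util)}")
```

With the made-up counts in the `__main__` block, it reports 80% utilization and flags it as saturated.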
Peter_W_Intel
Employee
Explanations of the events can be found in the help under VTune_Dir\Help; for example, pmm.chm covers Intel Core 2 Duo processors.

I think all BUS_XXX events are for the FSB, not the BSB.

Regards, Peter
TimP
Honored Contributor III
The events for Core 2 Duo and Core 2 Quad are the same. It's not clear why the OP thinks a measurement that distinguishes the address bus from the data bus would be relevant to memory bandwidth analysis, where the FSB events are the important ones.
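To make the address/data distinction concrete: on Core 2 the two halves of the FSB are usually estimated from different counters. Assuming the commonly cited ratios BUS_DRDY_CLOCKS.ALL_AGENTS / CPU_CLK_UNHALTED.BUS for the data bus and 2 × BUS_TRANS_ANY.ALL_AGENTS / CPU_CLK_UNHALTED.BUS for the address bus, a hypothetical helper could report which half is closer to saturation. The class name and sample values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class FsbSample:
    """Counter totals for one FSB over one measurement interval (hypothetical)."""
    bus_drdy_clocks_all_agents: int  # bus clocks in which data was being driven
    bus_trans_any_all_agents: int    # transactions issued by all bus agents
    cpu_clk_unhalted_bus: int        # bus clocks while the core was not halted

    def data_bus_utilization(self) -> float:
        return self.bus_drdy_clocks_all_agents / self.cpu_clk_unhalted_bus

    def address_bus_utilization(self) -> float:
        # Assumes two bus clocks of address-bus occupancy per transaction.
        return 2 * self.bus_trans_any_all_agents / self.cpu_clk_unhalted_bus

    def busier_half(self) -> str:
        """Return "data" or "address", whichever half is closer to saturation."""
        if self.data_bus_utilization() >= self.address_bus_utilization():
            return "data"
        return "address"


sample = FsbSample(bus_drdy_clocks_all_agents=300_000,
                   bus_trans_any_all_agents=200_000,
                   cpu_clk_unhalted_bus=1_000_000)
print(sample.busier_half())  # address
```

In this made-up sample the address bus (40%) is busier than the data bus (30%), which is the kind of imbalance the web-server paper describes.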
realbright
Beginner
"VTune_Dir\Help" looks like a Windows directory, but my system is Linux-based,
so I can't read that document.

Can you tell me what types of events are relevant to the BSB?

Thanks.
TimP
Honored Contributor III
Quoting - realbright
"VTune_Dir\Help" looks like a Windows directory, but my system is Linux-based.
True, the .chm file doesn't come with the Linux version, but /vtune/doc/*.pdf should keep you occupied.
Did you read the documents which come up when you search "memory bandwidth utilization" on this forum?
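One way to act on that suggestion is to estimate FSB data traffic from burst transactions. This is a minimal sketch, assuming each BUS_TRANS_BURST transaction carries one 64-byte cache line and taking the E5420's 1333 MT/s, 8-byte-wide FSB as the theoretical peak (about 10.7 GB/s per bus). The helper names and the sample count are hypothetical, and sustained bandwidth in practice stays below that peak.

```python
# Assumptions (not from the thread): one BUS_TRANS_BURST transaction moves a
# 64-byte cache line, and the FSB runs at 1333 MT/s with an 8-byte data path.
CACHE_LINE_BYTES = 64
PEAK_FSB_BYTES_PER_S = 1333e6 * 8  # about 10.7 GB/s per front side bus


def fsb_bandwidth(bus_trans_burst_all_agents: int, elapsed_s: float) -> float:
    """Approximate bytes per second carried by burst transactions."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return bus_trans_burst_all_agents * CACHE_LINE_BYTES / elapsed_s


def fsb_bandwidth_utilization(bus_trans_burst_all_agents: int,
                              elapsed_s: float) -> float:
    """Measured burst bandwidth as a fraction of the theoretical peak."""
    return fsb_bandwidth(bus_trans_burst_all_agents, elapsed_s) / PEAK_FSB_BYTES_PER_S


# Made-up example: 100 million bursts observed in one second.
print(f"{fsb_bandwidth(100_000_000, 1.0) / 1e9:.1f} GB/s, "
      f"{fsb_bandwidth_utilization(100_000_000, 1.0):.0%} of peak")
```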
realbright
Beginner
Quoting - tim18
Did you read the documents which come up when you search "memory bandwidth utilization" on this forum?

Why did you mention "memory bandwidth utilization"?

Actually, what I want to know is the cache-to-CPU bus (Back Side Bus) utilization.

Can memory bandwidth utilization be used to measure BSB utilization indirectly? And if so, how?

Thanks.
TimP
Honored Contributor III
To me, the only reasonable interpretation of some of your allusions was that you were interested in bandwidth utilization. If you didn't mean that, you could have picked your terminology more carefully.
On-chip bus performance isn't dealt with by VTune, at least not on architectures prior to Core i7; on Core i7 there are some uncore events which might resemble what you seem to be talking about. Still, there are issues which, as far as I know, can't be observed with any practical developer tools. Anyway, for large practical applications, the memory bandwidth question is the more important one.
realbright
Beginner

I clearly said BSB utilization: http://en.wikipedia.org/wiki/Back_side_bus
Anyway, that is not important to me right now.

According to your reply, VTune can't measure BSB utilization, right?
So can you tell me a little bit about "uncore events", or other tools that can measure BSB utilization?

Thanks.
TimP
Honored Contributor III
Quoting - realbright
can you tell me a little bit about "uncore events"
In the VTune help for Core i7, the offcore events explanation starts out:

About Offcore Performance Tuning Events

These events are devoted to offcore cacheline access activity. Of particular importance is the offcore_response_0 event which is a matrix decomposition of request type by response source. It has the potential of ~65,000 non trivial programmings. There are approximately 275 predefined programmings in the Intel Performance Tuning Tools. The events monitoring the super queue activity are also listed here.
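The "~65,000" figure is consistent with a simple count. Assuming, hypothetically, that the matrix event takes an 8-bit request-type mask and an 8-bit response-source mask, and that a programming is non-trivial when each mask has at least one bit set, there are 255 × 255 = 65,025 combinations. The packing helper, which puts the request bits in the low byte, is likewise an illustrative assumption rather than the documented register layout.

```python
# Hypothetical layout: 8 request-type bits and 8 response-source bits.
REQUEST_BITS = 8
RESPONSE_BITS = 8


def nontrivial_programmings(request_bits: int = REQUEST_BITS,
                            response_bits: int = RESPONSE_BITS) -> int:
    """Count settings with at least one bit set in each mask."""
    nonzero_requests = 2 ** request_bits - 1    # every non-empty request mask
    nonzero_responses = 2 ** response_bits - 1  # every non-empty response mask
    return nonzero_requests * nonzero_responses


def encode(request_mask: int, response_mask: int) -> int:
    """Pack both masks into one value: requests low byte, responses high byte."""
    if not (0 < request_mask < 1 << REQUEST_BITS
            and 0 < response_mask < 1 << RESPONSE_BITS):
        raise ValueError("both masks must be non-empty and fit in 8 bits")
    return (response_mask << REQUEST_BITS) | request_mask


print(nontrivial_programmings())  # 65025
```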


Symbol Name                     Event Code  Description
IO_TRANSACTIONS                 0x6C        I/O transactions
LONGEST_LAT_CACHE               0x2E        Architectural event counting Last Level Cache activity
OFFCORE_REQUESTS.L1D_WRITEBACK  0xB0        Offcore L1 data cache writebacks
OFFCORE_REQUESTS_SQ_FULL        0xB2        Offcore requests blocked due to Super Queue full
OFFCORE_RESPONSE_0              0xB7        Offcore response matrix event; see extended documentation
SQ_FULL_STALL_CYCLES            0xF6        Super Queue full stall cycles
SQ_MISC.SPLIT_LOCK              0xF4        Super Queue lock splits across a cache line
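On Linux, where the .chm help is not available, the `perf` tool can count a raw event from its event code and unit mask. The table lists only event codes, so this sketch uses LONGEST_LAT_CACHE (0x2E) with umask 0x41, the architectural last-level-cache miss encoding. It assumes perf's raw x86 config puts the event select in bits 7:0 and the umask in bits 15:8 with all other fields zero; the helper name is hypothetical, and umasks for the other rows would have to come from Intel's documentation.

```python
def raw_perf_event(event_code: int, umask: int) -> str:
    """Build a perf raw-event string such as "r412e" (hypothetical helper)."""
    if not (0 <= event_code <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event code and umask must each fit in one byte")
    # umask in bits 15:8, event select in bits 7:0, printed as lowercase hex.
    return f"r{(umask << 8) | event_code:x}"


# LONGEST_LAT_CACHE (0x2E) with the architectural "miss" umask 0x41:
print(raw_perf_event(0x2E, 0x41))  # r412e
```

The resulting string can then be passed to perf, e.g. `perf stat -a -e r412e sleep 10` for a ten-second system-wide count.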


I haven't seen any presentations about effective use of these events for software performance tuning. More popular are the precise events, which include, among others:
OTHER_CORE_L2_HITM
REMOTE_CACHE_LOCAL_HOME_HIT
REMOTE_DRAM
LOCAL_DRAM
So again, you quickly get into memory access issues.
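As a small illustration of where those precise events lead, sample counts for them can be turned into a breakdown of where loads were satisfied. The counts below are invented, the shares are relative to these four events only (not to all loads), and the helper names are hypothetical.

```python
def load_source_shares(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of the listed samples attributed to each load source."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples collected")
    return {name: n / total for name, n in counts.items()}


def remote_dram_fraction(counts: dict[str, int]) -> float:
    """Share of DRAM-served loads that had to go to a remote node."""
    dram = counts["REMOTE_DRAM"] + counts["LOCAL_DRAM"]
    return counts["REMOTE_DRAM"] / dram if dram else 0.0


# Made-up sample counts for the four precise events named above.
samples = {
    "OTHER_CORE_L2_HITM": 100,
    "REMOTE_CACHE_LOCAL_HOME_HIT": 50,
    "REMOTE_DRAM": 250,
    "LOCAL_DRAM": 600,
}
print(load_source_shares(samples)["LOCAL_DRAM"])  # 0.6
print(round(remote_dram_fraction(samples), 3))    # 0.294
```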

Questions like why dirty fill buffers get backed up by limited port access to L1 still don't have VTune or PTU events associated with them.

If you choose to apply the term back side bus, I guess there are several potential levels it could refer to.
robert-reed
Valued Contributor II

I looked up the Wikipedia article realbright cited, and several of the references it draws from, and the curious thing I noticed is that the newest reference I found was from 2001, eight years ago. Eight years ago there was such a thing as the back side bus. It was an off-chip connection to the last-level cache (generally an L2 cache), which was either static RAM stuck on the baseboard, a daughter card attached to the CPU, or a second chip sharing the same package. The back side bus as a distinct entity went away as multiple cache levels came on chip (part of what's called the uncore on Nehalem).

I also found several web references with a totally confused concept of the Dual Independent Bus (DIB), describing it as a structure representing the distinction between the front side and back side buses. These explanations are wrong. The DIB is all "front side bus", with the split occurring at a chipset that divides the bus in half and plays traffic cop between the two halves, using a "snoop filter" to try to limit bus activity by filtering out traffic local to one side or the other.
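The snoop-filter idea can be shown with a toy model. The filter remembers which bus may hold each cache line and forwards a snoop across to the other bus only when that bus might have a copy, so traffic local to one side never crosses. This is a hypothetical simplification for illustration, not the algorithm of any particular chipset.

```python
from collections import defaultdict


class SnoopFilter:
    """Toy snoop filter between numbered buses (hypothetical simplification)."""

    def __init__(self) -> None:
        # cache line address -> set of buses that may hold a copy
        self.holders: dict[int, set[int]] = defaultdict(set)
        self.forwarded_snoops = 0

    def _snoop_others(self, bus: int, line: int) -> bool:
        """Forward a snoop only if some other bus may hold the line."""
        others = self.holders[line] - {bus}
        if others:
            self.forwarded_snoops += 1
        return bool(others)

    def read(self, bus: int, line: int) -> bool:
        """Shared read: the requesting bus becomes one more possible holder."""
        crossed = self._snoop_others(bus, line)
        self.holders[line].add(bus)
        return crossed

    def write(self, bus: int, line: int) -> bool:
        """Write: other copies are invalidated, so only this bus remains."""
        crossed = self._snoop_others(bus, line)
        self.holders[line] = {bus}
        return crossed
```

Repeated reads from one side generate no cross-bus snoops; only when the other side touches the same line does traffic cross the chipset.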

There are others on this forum with greater expertise on performance events than I. There may be a way to measure bus saturation. I presume your interest in this is because you have an application that isn't scaling as you think it should, and you're trying to verify whether it might be a bus bandwidth issue?
