Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

System bus saturation?


I have a 4-CPU Xeon machine (3.0 GHz, IA-32 family 15, model 2, stepping 6) running a Linux 2.4 kernel. In a performance test, this machine scaled poorly compared with a 4-CPU Itanium machine.

I want to monitor system bus utilization to see whether the bus is already saturated, since this machine's FSB is 3.2 GB/s, while the IPF machine's is 6.4 GB/s.

How can I do that with VTune (or other tools, if applicable)? It seems I can use event-based sampling. The VTune 7.2 trial treats the Xeon CPU as a 'Pentium 4 with SSE3', so which system bus events would be useful to me? I captured 'Bus accesses from the processor'. If its count is 'n', and since this x86 bus is 64 bits (8 bytes) wide, is the bus transfer rate = n / (sampling seconds) * 8 [bytes/s]? That gives only ~100 MB/s.

By the way, does the 'Clockticks' event mean the clocks, summed over all 4 CPUs, during which a CPU is not in the halt state? Calling that count 'm', can I calculate the busy CPU cycles per second with this equation:

CPU_BUSY_CYCLEs = m / (sampling seconds)

Where can I find more information about what the events I get from VTune mean? That is, what can I calculate from these events, and how?

Thx a lot.
1 Reply

Here's some info we got the last time someone asked for measuring bus bandwidth. It is offered "as-is". That is, don't ask me to explain it, cuz I can't!

Here it is digested for the read case, which is more relevant to performance tuning. For the load and store case you would replace the following events:

Bus Reads Underway From The Processor -> Bus Accesses Underway From The Processor

Reads From The Processor -> Bus Accesses From The Processor

The peak bandwidth for an Intel Pentium 4 processor with an 800 MHz FSB is:

64 bits/transfer * 1 byte/8 bits * 800M transfers/sec = 6.4 GB/second
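
As a quick sanity check, the peak-bandwidth arithmetic above can be sketched in Python. The function name and parameters here are made up for illustration, and the 400M transfers/sec figure for a 3.2 GB/s FSB is an assumption derived from the 64-bit bus width mentioned in the question, not a measured value.

```python
def peak_fsb_bandwidth(bus_width_bits, transfers_per_sec):
    """Theoretical peak bus bandwidth in bytes per second.

    bus_width_bits:    data width of the bus (64 for the Pentium 4 / Xeon FSB)
    transfers_per_sec: effective transfer rate (800e6 for an "800 MHz" FSB)
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec


# 800 MHz FSB from the reply: 8 bytes * 800M transfers/sec
print(peak_fsb_bandwidth(64, 800e6) / 1e9, "GB/s")

# Assumed 400M transfers/sec, which matches the 3.2 GB/s FSB in the question
print(peak_fsb_bandwidth(64, 400e6) / 1e9, "GB/s")
```

This is only the theoretical ceiling; what a workload actually consumes has to come from the bus events.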

To measure consumed memory read bandwidth for Write Back cacheable memory on a 3 GHz processor, we would measure the following with the VTune analyzer:

(Reads From The Processor * 64 bytes * 3G cycles/sec) / (Bus Reads Underway From The Processor with compare bit set)

Breaking this down gives:

Number of bytes the workload read from memory:

Reads From The Processor (read transactions) * 64 bytes/transaction = bytes read across all transactions

Average number of seconds that read transactions are underway on the bus:

Bus Reads Underway From The Processor with compare bit set (transaction cycles) / 3G cycles/sec = transaction seconds

HINT: Use VTune analyzer to edit the Bus Reads Underway From The Processor event and set the compare bit in the edit event dialog.
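
For anyone who wants to plug their own numbers in, here is a minimal Python sketch of the read-bandwidth breakdown above. The function name and the sample counts are hypothetical, not real VTune output, and it assumes the "underway" event counts core clock cycles, which is what dividing by 3G cycles/sec in the formula implies.

```python
BYTES_PER_TRANSACTION = 64  # per the formula above: 64 bytes per read transaction


def consumed_read_bandwidth(reads, reads_underway_cycles, core_hz):
    """Consumed read bandwidth in bytes per second.

    reads:                 'Reads From The Processor' event count
    reads_underway_cycles: 'Bus Reads Underway From The Processor' count
                           (compare bit set)
    core_hz:               processor clock in cycles/sec (3e9 for 3 GHz)
    """
    if reads_underway_cycles <= 0:
        raise ValueError("reads_underway_cycles must be positive")
    bytes_read = reads * BYTES_PER_TRANSACTION
    seconds_underway = reads_underway_cycles / core_hz
    return bytes_read / seconds_underway


# Hypothetical event counts, for illustration only
bandwidth = consumed_read_bandwidth(
    reads=10_000_000,
    reads_underway_cycles=1_500_000_000,
    core_hz=3e9,
)
print(f"{bandwidth / 1e9:.2f} GB/s while reads were underway")
```

Comparing the result with the FSB's peak bandwidth gives a rough idea of how close the bus is to saturation. For the load and store case, feed in the two "Bus Accesses" event counts instead.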
