I have a 4-CPU Xeon machine (3.0 GHz, IA32 family 15, model 2, stepping 6) running a Linux 2.4 kernel. In a performance test, this machine showed poor scalability compared with a 4-CPU Itanium machine.
I want to monitor system bus utilization to see whether the bus is already saturated, since this machine's FSB is 3.2 GB/s, while the IPF machine's is 6.4 GB/s.
How can I do that with VTune (or other tools, if applicable)? It seems I can use event-based sampling. The VTune 7.2 trial appears to treat the Xeon CPU as 'Pentium 4 with SSE3', so which system bus events would be useful to me? I captured 'Bus accesses from the processor'. If its count is n, then since the x86 bus data width is 64 bits (8 bytes), is the bus transfer rate = n / (sampling seconds) * 8 [bytes/s]? That gives only ~100 MB/s.
By the way, does the 'Clockticks' event mean the clocks summed across all 4 CPUs while they are not in the Halt state? Calling that count m, can I calculate the busy CPU Hz with the equation:
CPU_BUSY_CYCLEs = m / (sampling seconds)
Where can I find more information about what the events I get from VTune mean? That is, what I can calculate from these events, and how.
Thx a lot.
Here's some info we got the last time someone asked for measuring bus bandwidth. It is offered "as-is". That is, don't ask me to explain it, cuz I can't!
Here it is digested for the read case, which is more relevant to performance tuning. For the load and store case you would replace the following events:
Bus Reads Underway From The Processor -> Bus Accesses Underway From The Processor
Reads From The Processor -> Bus Accesses From The Processor
The peak bandwidth for an Intel Pentium 4 processor with an 800 MHz FSB is:
64 bits/cycle * 1 byte/8 bits * 800M cycles/sec = 6.4G bytes/second
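The peak-bandwidth arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the 400M transfers/sec figure in the second call is an assumption chosen because it reproduces the 3.2 GB/s FSB quoted in the question, not a value read from the hardware.

```python
def peak_fsb_bandwidth(bus_width_bits, transfers_per_sec):
    """Peak bus bandwidth in bytes/sec: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec


# 64-bit bus at an effective 800 MHz, as in the example above.
print(peak_fsb_bandwidth(64, 800e6) / 1e9, "GB/s")

# Assumption: a 64-bit bus at 400M transfers/sec, which matches the
# 3.2 GB/s figure mentioned for the Xeon machine in the question.
print(peak_fsb_bandwidth(64, 400e6) / 1e9, "GB/s")
```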
To measure consumed memory read bandwidth for Write Back cacheable memory on a 3 GHz processor (the 3G cycles/sec below is the processor clock rate), we would use the VTune analyzer to measure:
(Reads From The Processor * 64 bytes * 3G cycles/sec) / (Bus Reads Underway From The Processor with compare bit set)
Breaking this down gives:
Number of bytes the workload read from memory:
Reads From The Processor (read transactions) * 64 bytes/transaction = read bytes all transactions
Average number of seconds that read transactions are underway on the bus:
Bus Reads Underway From The Processor with compare bit set (transactions cycles) / 3G cycles/sec = transactions sec
HINT: Use VTune analyzer to edit the Bus Reads Underway From The Processor event and set the compare bit in the edit event dialog.
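As a rough illustration of the read-bandwidth formula above, here is a small Python sketch. The function name and the event counts in the example are hypothetical, made up only to show the arithmetic; real counts would come from the two VTune events, and the clock rate should be that of the processor being measured.

```python
def consumed_read_bandwidth(reads, reads_underway_cycles, cpu_hz, bytes_per_read=64):
    """Bytes/sec while reads are underway, following the formula above.

    reads                  -- count of 'Reads From The Processor'
    reads_underway_cycles  -- count of 'Bus Reads Underway From The Processor'
                              (with the compare bit set)
    cpu_hz                 -- processor clock rate in cycles/sec
    bytes_per_read         -- 64 bytes per read transaction, as stated above
    """
    if reads_underway_cycles <= 0:
        raise ValueError("reads_underway_cycles must be positive")
    bytes_read = reads * bytes_per_read               # bytes read by all transactions
    seconds_underway = reads_underway_cycles / cpu_hz  # time reads were on the bus
    return bytes_read / seconds_underway


# Hypothetical counts, for illustration only (not measured data).
bw = consumed_read_bandwidth(reads=1_000_000,
                             reads_underway_cycles=60_000_000,
                             cpu_hz=3e9)
print(f"{bw / 1e9:.2f} GB/s")
```

Comparing the result against the peak figure for the bus gives a rough idea of how close the workload is to saturating it.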