Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

pcm and memory bandwidth

paul_z_
Beginner
783 Views

When looking at the Intel PCM (rel 2.4) source, I see that the PCM memory bandwidth counters are not available for Windows and the Sandy Bridge/Ivy Bridge architectures. Can someone recommend a non-invasive approach to measure those?

4 Replies
Bernard
Valued Contributor I
783 Views

>>>Can someone recommend a non-invasive approach to measure those>>>

What do you mean by a non-invasive approach?

Patrick_F_Intel1
Employee
783 Views

Below is a reply to a similar question on this forum. You can use VTune Amplifier to measure the bandwidth on Sandy Bridge and Ivy Bridge.

VTune also has the MMIO bandwidth counters for Linux and Windows (PCM added them in v2.4, but only for Linux).

On Sandy Bridge, you can use the uncore events:
UNC_ARB_TRK_REQUESTS.WRITES # covers RFOs (reads for ownership) and non-temporal stores; evt num 0x81, umask 0x20, uncore unit = ARB
UNC_ARB_TRK_REQUESTS.EVICTIONS # covers writebacks; evt num 0x81, umask 0x80, uncore unit = ARB
UNC_CBO_CACHE_LOOKUP.ANY_I # covers reads, RFOs and non-temporal stores; evt num 0x34, umask 0x88, uncore unit = CBOX
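As a rough illustration, the event number and umask pairs listed above can be packed into a single raw event value. This is only a sketch: it assumes the usual Intel event-select layout (event number in bits 0-7, unit mask in bits 8-15), and `encode_event` and `UNCORE_EVENTS` are hypothetical names, not part of PCM or VTune. Enable and other control bits are deliberately left out, so check the uncore documentation for your processor before programming anything.

```python
# Hypothetical helper: pack an event number and unit mask into one raw value.
# Assumes event number in bits 0-7 and umask in bits 8-15; control bits
# (enable, edge detect, etc.) are not set here.
def encode_event(event_num: int, umask: int) -> int:
    if not (0 <= event_num <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event number and umask must each fit in 8 bits")
    return (umask << 8) | event_num


# (uncore unit, event number, umask) exactly as listed in the post above.
UNCORE_EVENTS = {
    "UNC_ARB_TRK_REQUESTS.WRITES": ("ARB", 0x81, 0x20),
    "UNC_ARB_TRK_REQUESTS.EVICTIONS": ("ARB", 0x81, 0x80),
    "UNC_CBO_CACHE_LOOKUP.ANY_I": ("CBOX", 0x34, 0x88),
}

if __name__ == "__main__":
    for name, (unit, event_num, umask) in UNCORE_EVENTS.items():
        print(f"{name}: unit={unit} raw={encode_event(event_num, umask):#06x}")
```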

These count full cache line transfers (so the number of bytes moved is 64 * event count).

There is one CBOX unit per core, so you can get the memory reads per core.
There is only one ARB unit per processor, so you don't get the writebacks per core, just a total for the processor.
The total memory bandwidth due to the cores is then:
64 * (UNC_ARB_TRK_REQUESTS.EVICTIONS + UNC_CBO_CACHE_LOOKUP.ANY_I) / elapsed_time
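The formula above can be sketched in a few lines of Python. This assumes you have already read the counter deltas for the measurement interval with some tool; the function name `core_memory_bandwidth` and its parameters are hypothetical, and the per-core CBOX counts are simply summed, since there is one CBOX per core but a single ARB per processor.

```python
from typing import Iterable

CACHE_LINE_BYTES = 64  # each counted event is one full cache line transfer


def core_memory_bandwidth(
    arb_evictions: int,
    cbo_lookups_any_i: Iterable[int],
    elapsed_seconds: float,
) -> float:
    """Return memory bandwidth due to the cores, in bytes per second.

    arb_evictions     -- UNC_ARB_TRK_REQUESTS.EVICTIONS delta (processor total)
    cbo_lookups_any_i -- UNC_CBO_CACHE_LOOKUP.ANY_I deltas, one per CBOX/core
    elapsed_seconds   -- length of the measurement interval
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_lines = arb_evictions + sum(cbo_lookups_any_i)
    return CACHE_LINE_BYTES * total_lines / elapsed_seconds


if __name__ == "__main__":
    # Made-up counter deltas for a 4-core part over a 2-second interval.
    bw = core_memory_bandwidth(1_000_000, [500_000] * 4, 2.0)
    print(f"{bw / 1e6:.1f} MB/s")
```

Dividing a single core's CBOX count by the interval in the same way gives that core's read bandwidth, but the writeback term can only be reported for the processor as a whole.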

Patrick_F_Intel1
Employee
783 Views

I have verified that the uncore events below are in Ivy Bridge as well.

UNC_ARB_TRK_REQUESTS.WRITES # covers RFOs (reads for ownership) and non-temporal stores; evt num 0x81, umask 0x20, uncore unit = ARB
UNC_ARB_TRK_REQUESTS.EVICTIONS # covers writebacks; evt num 0x81, umask 0x80, uncore unit = ARB
UNC_CBO_CACHE_LOOKUP.ANY_I # covers reads, RFOs and non-temporal stores; evt num 0x34, umask 0x88, uncore unit = CBOX

These count full cache line transfers (so the number of bytes moved is 64 * event count).

There is one CBOX unit per core, so you can get the memory reads per core.
There is only one ARB unit per processor, so you don't get the writebacks per core, just a total for the processor.
The total memory bandwidth due to the cores is then:
64 * (UNC_ARB_TRK_REQUESTS.EVICTIONS + UNC_CBO_CACHE_LOOKUP.ANY_I) / elapsed_time

Roman_D_Intel
Employee
782 Views

Hi,

Intel PCM V2.8 now also supports memory bandwidth metrics on your processor in Windows (via the winpmem driver).

Best regards,

Roman
