Hi, all
I'm new to computer architecture, and I have a question that may look silly.
I'm using lmbench's bw_mem tool to test memory bandwidth. My testbed is a 4-socket (E7-4870) system populated with 64 8GB DIMMs.
I use numactl to bind the benchmark to node 0 while it reads node 1's memory, and the test result is 11GB/s. I'm also using PCM to monitor CPU activity: it reports L3MISS of 88M and MC READ of 12.53GB.
IMHO LLC misses should equal memory accesses, so 64B (cache line size) * L3MISS should equal the memory bandwidth. But the result is 64B * 88M = 5.632GB << 12.53GB. What am I missing here?
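The arithmetic in the question can be written out as a minimal sketch. It assumes the L3MISS and MC READ figures cover the same sampling interval and that GB means 10^9 bytes, as in the post; the function and variable names are illustrative only and are not part of PCM or lmbench.

```python
CACHE_LINE_BYTES = 64  # cache line size quoted in the post


def miss_traffic_gb(l3_misses, line_bytes=CACHE_LINE_BYTES):
    """Bytes implied by L3 misses, in decimal GB (1e9 bytes)."""
    return l3_misses * line_bytes / 1e9


l3_misses = 88e6     # L3MISS reported by PCM
mc_read_gb = 12.53   # MC READ reported by PCM

implied = miss_traffic_gb(l3_misses)
unexplained = mc_read_gb - implied  # traffic the miss count does not account for

print(f"implied by L3 misses: {implied:.3f} GB")
print(f"not explained by misses: {unexplained:.3f} GB "
      f"({unexplained / mc_read_gb:.0%} of MC READ)")
```

Under these assumptions, a bit more than half of the memory-controller reads are not accounted for by the miss count.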
Do you mean total memory bandwidth? It could also include prefetched accesses and cache hits.
Hello Li,
I'm guessing that the hardware prefetcher is pulling the data into the L3 before the cache line is accessed, so the L3 doesn't see the 'miss'. This is the purpose of the prefetchers.
The lmbench rd benchmark reads every 4th integer. If it read just 1 integer per cache line (and if the generated assembly code is efficient), then you might get more L3 misses.
Pat
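To picture the access pattern described above, here is a small sketch that counts loads and distinct cache lines touched for a given stride. It assumes 4-byte integers, 64-byte cache lines, and a buffer that starts on a cache-line boundary; `access_summary` is a hypothetical helper, not lmbench code, and it models only addresses, not the cache or the prefetcher.

```python
def access_summary(n_ints, stride_ints, int_bytes=4, line_bytes=64):
    """Return (loads, distinct cache lines touched) for a strided read.

    Assumes the buffer starts on a cache-line boundary.
    """
    offsets = range(0, n_ints * int_bytes, stride_ints * int_bytes)
    loads = len(offsets)
    lines = len({off // line_bytes for off in offsets})
    return loads, lines


n_ints = 1024  # a 4 KiB buffer under the 4-byte-int assumption
for stride in (4, 16):
    loads, lines = access_summary(n_ints, stride)
    print(f"stride {stride:2d} ints: {loads} loads over {lines} cache lines "
          f"({loads // lines} per line)")
```

With every 4th integer, four loads fall in each cache line; with a stride of 16 integers, each load touches a new line, which is the "1 integer per cache line" case.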
Thank you all, I understand LLC misses and memory accesses better now.
With the prefetchers turned off, the result seems more reasonable. On another 2-socket system, LLC miss is 145M and MC READ is 8.66GB. 145M * 64B = 9280MB (about 9.28GB).
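As a rough check, the two sets of numbers from this thread can be compared as a ratio of miss-implied traffic to MC READ. This sketch assumes decimal GB and that each pair of counters covers the same interval; the two runs are on different machines, so the comparison is only indicative, and `miss_to_mc_ratio` is an illustrative name.

```python
def miss_to_mc_ratio(l3_misses, mc_read_gb, line_bytes=64):
    """Ratio of traffic implied by L3 misses to measured MC READ."""
    return (l3_misses * line_bytes / 1e9) / mc_read_gb


runs = {
    "4-socket, first run": (88e6, 12.53),
    "2-socket, prefetchers off": (145e6, 8.66),
}

for label, (misses, mc_read_gb) in runs.items():
    print(f"{label}: {miss_to_mc_ratio(misses, mc_read_gb):.2f}")
```

A ratio near 1 means the miss count roughly accounts for the memory-controller reads; the first run comes out well below 1, the prefetchers-off run slightly above it.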