Hello,
I am running a simple experiment: I access a constant number of addresses (4096) by chasing a pointer, while increasing the stride between the addresses.
Initialization for given stride:
[cpp]int** array_seq_f = NULL;
size_t stride; // varied from 1 to 256k
size_t size = stride*4097;
posix_memalign((void**)&array_seq_f, 4096, sizeof(int*) * size);
for(size_t k=0; k<4096;k++)
array_seq_f[k*stride] = (int*)&(array_seq_f[(k+1)*stride]);
array_seq_f[4096*stride] = NULL;[/cpp]
Measured Execution:
[cpp]int* p = array_seq_f[0];
for (size_t i=0; i …[/cpp]
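The measured loop is cut off in the post. The following is only a minimal sketch of what such a pointer chase could look like, not the original code: the helper names `build_chain` and `chase` are hypothetical, and `void*` slots are used instead of the post's `int*` to avoid casts. It builds the same chain as the initialization above and follows it until the terminating NULL.

```cpp
#include <cstddef>
#include <stdlib.h>   // posix_memalign, free (POSIX)

// Hypothetical helper: builds `count` chained slots spaced `stride` elements
// apart. Each slot stores the address of the next one; the last stores NULL.
void** build_chain(std::size_t stride, std::size_t count) {
    void* mem = nullptr;
    const std::size_t slots = stride * (count + 1);
    if (posix_memalign(&mem, 4096, sizeof(void*) * slots) != 0)
        return nullptr;                       // allocation failed
    void** a = static_cast<void**>(mem);
    for (std::size_t k = 0; k < count; ++k)
        a[k * stride] = &a[(k + 1) * stride]; // link slot k to slot k+1
    a[count * stride] = nullptr;              // end-of-chain marker
    return a;
}

// Hypothetical helper: follows the chain and returns the number of hops.
// Every load depends on the previous one, so the accesses cannot overlap.
std::size_t chase(void** head) {
    std::size_t hops = 0;
    void** p = head;
    while (*p != nullptr) {
        p = static_cast<void**>(*p);
        ++hops;
    }
    return hops;
}
```

In a real measurement the counters would be started immediately before `chase` and stopped right after it, so that the page faults and cache fills caused by the initialization are not counted.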
I measure the L1 (data), L2 (data), L3 (combined) and TLB misses with PAPI on an Intel Xeon X5650. As expected, the L1 misses reach 1 per element at a stride of 8 (which equals 64 bytes, the cache line size). However, as the stride increases further, the misses go up to 2 per element at a stride of 32KB. The L2 and L3 misses reach 2 at 128KB.
I am not sure why the misses go up to 2. My assumption is that it has to do with the TLB misses, and that the additional data cache misses are caused by accesses to the paging structures. Is there a way to confirm this assumption? And why do the L2/L3 misses reach 2 misses per element only at 128KB, while the L1 misses do so at 32KB already?
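To make the stride arithmetic concrete, here is a small sketch that computes how many distinct cache lines and pages the 4096 accesses touch for a given stride. It assumes 8-byte pointers, 64-byte cache lines, 4 KiB pages and a page-aligned buffer; the `Footprint` type and the `footprint` function are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>

struct Footprint {
    std::size_t stride_bytes;  // distance between consecutive accesses
    std::size_t cache_lines;   // distinct cache lines touched
    std::size_t pages;         // distinct pages touched
};

// Assumes accesses >= 1 and a buffer that starts on a page boundary.
Footprint footprint(std::size_t stride_elems, std::size_t accesses,
                    std::size_t elem_bytes = 8,
                    std::size_t line_bytes = 64,
                    std::size_t page_bytes = 4096) {
    const std::size_t stride_bytes = stride_elems * elem_bytes;
    const std::size_t last_offset = (accesses - 1) * stride_bytes;
    auto distinct = [&](std::size_t unit) {
        // Once the stride reaches the unit size, every access lands in
        // its own unit; below that, every unit up to the last is touched.
        if (stride_bytes >= unit) return accesses;
        return last_offset / unit + 1;
    };
    return Footprint{stride_bytes, distinct(line_bytes), distinct(page_bytes)};
}
```

Under these assumptions a stride of 8 elements gives one cache line per access (4096 lines on 64 pages), and a stride of 512 elements gives one page per access, which is the point from which every access needs its own address translation.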
Any help is very much appreciated.
4 Replies
Hello schwald,
Which L1, L2, L3 and TLB miss events did you use?
I might need the event code and umask for each event since I'm not so familiar with PAPI.
Thanks
Pat
Hello Patrick,
thanks for your quick reply. I am using the PAPI preset events PAPI_L1_DCM, PAPI_L2_DCM, PAPI_L3_TCM and PAPI_TLB_DM. It looks like they map to the native events as follows:
[plain]PAPI_L1_DCM -> L1D:REPL
PAPI_L2_DCM -> L2_RQSTS:LD_MISS + L2_RQSTS:RFO_MISS
PAPI_L3_TCM -> LAST_LEVEL_CACHE_MISSES
PAPI_TLB_DM -> DTLB_MISSES:ANY[/plain]
Is that what you needed?
Thanks,
David
Hello schwald,
Yes, the TLB misses show up in the L1 misses.
The L1 misses show up as L2 and/or L3 misses.
I'm still working through the L2 & L3 side of the story... I think that my tester might be getting some hits in L2 or L3 so I have a little more work to do.
Pat
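One way to see why page walks could add an L1 miss per element is to look at how much address space the page-table entries in a single cache line cover. The sketch below is an illustration under stated assumptions, none of which are confirmed in the thread: x86-64 paging with 4 KiB pages, 8-byte page-table entries and 64-byte cache lines. The constant and function names are hypothetical.

```cpp
#include <cstddef>

// Assumptions, not taken from the thread: 4 KiB pages, 8-byte page-table
// entries (PTEs), 64-byte cache lines.
constexpr std::size_t kPageBytes = 4096;
constexpr std::size_t kPteBytes  = 8;
constexpr std::size_t kLineBytes = 64;

// Virtual address range whose last-level PTEs share one cache line.
constexpr std::size_t bytes_covered_per_pte_line() {
    return (kLineBytes / kPteBytes) * kPageBytes;
}

// Distinct PTE cache lines needed for `accesses` loads spaced `stride_bytes`
// apart. Assumes accesses >= 1 and a base address aligned to the covered
// range; with a merely page-aligned base the count can be off by one.
std::size_t pte_lines_touched(std::size_t stride_bytes, std::size_t accesses) {
    const std::size_t span = bytes_covered_per_pte_line();
    if (stride_bytes >= span) return accesses;
    return ((accesses - 1) * stride_bytes) / span + 1;
}
```

With these numbers one cache line of PTEs covers 32 KiB, so at a byte stride of 32 KiB or more each page walk has to fetch a PTE from a different cache line. If the "32KB" in the question is a stride in bytes, that would be consistent with the L1 misses reaching 2 per element there, but this remains a hypothesis rather than a confirmed explanation.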