Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Increasing Stride

schwald
Beginner
Hello,
I perform a simple experiment, accessing a constant number of addresses (4096) by chasing a pointer and increasing the stride between the addresses.
Initialization for given stride:
[cpp]
int** array_seq_f = NULL;
size_t stride; // varied from 1 to 256k
size_t size = stride*4097;
posix_memalign((void**)&array_seq_f, 4096, sizeof(int*) * size);
for (size_t k = 0; k < 4096; k++)
    array_seq_f[k*stride] = (int*)&(array_seq_f[(k+1)*stride]);
array_seq_f[4096*stride] = NULL;
[/cpp]
Measured Execution:
[bash]int* p = array_seq_f[0]; for (size_t i=0; i
I measure the L1 (data), L2 (data), L3 (combined) and TLB misses with PAPI on an Intel Xeon X5650. As expected, the L1 misses reach 1 per element at a stride of 8 (which equals 64 bytes, the cache line size). However, as the stride increases further, the misses go up to 2 per element at a stride of 32KB. The L2 and L3 misses reach 2 per element at 128KB.
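For anyone who wants to reproduce the setup, here is a minimal self-contained sketch of the chain construction and the pointer chase. It is not the original harness: the helper names `build_chain` and `chase` are hypothetical, it uses `void**` instead of `int**` to avoid the casts, and it leaves out the PAPI calls. For a real measurement the result of the chase has to be consumed somehow (for example, printed) so the compiler cannot optimize the loop away.

```cpp
#include <cstddef>
#include <stdlib.h>   // posix_memalign, free (POSIX)

// Build a chain of `count` links, each `stride` pointer slots apart,
// terminated by a null pointer. Returns nullptr if allocation fails.
// (Hypothetical helper, mirroring the initialization shown in the post.)
void** build_chain(std::size_t stride, std::size_t count) {
    void* raw = nullptr;
    std::size_t slots = stride * (count + 1);
    if (posix_memalign(&raw, 4096, sizeof(void*) * slots) != 0)
        return nullptr;
    void** a = static_cast<void**>(raw);
    for (std::size_t k = 0; k < count; ++k)
        a[k * stride] = &a[(k + 1) * stride];   // link k points to link k+1
    a[count * stride] = nullptr;                // terminator
    return a;
}

// Follow the chain until the null terminator and return the number of hops.
// Each hop depends on the previous load, so the loads cannot overlap.
std::size_t chase(void** head) {
    std::size_t hops = 0;
    void** p = head;
    while (*p != nullptr) {
        p = static_cast<void**>(*p);
        ++hops;
    }
    return hops;
}
```

Calling `chase(build_chain(stride, 4096))` walks exactly 4096 links for any stride, so the counter values can be divided by 4096 to get misses per element.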
I am not sure why the misses go up to 2. My assumption is that it has to do with the TLB misses and that the additional data cache misses are induced by accesses to the paging structures. Is there a way to confirm this assumption? And why do the L2/L3 misses reach 2 misses/element at 128KB when the L1 misses already do so at 32KB?
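One way to reason about the TLB hypothesis is to count how many distinct cache lines and pages the 4096 accesses touch for a given stride. The sketch below does that arithmetic. It assumes 8-byte pointers, 64-byte cache lines, 4 KiB pages and a page-aligned base address; the function name `blocks_touched` is hypothetical, and the calculation only counts blocks, it does not model the cache or the page walker.

```cpp
#include <cstddef>

// Number of distinct `unit_bytes`-sized blocks touched by `count` accesses
// that are `stride` pointer slots apart, starting at a block-aligned base.
// Assumption: slot_bytes = 8 (64-bit pointers).
std::size_t blocks_touched(std::size_t stride, std::size_t unit_bytes,
                           std::size_t count = 4096,
                           std::size_t slot_bytes = 8) {
    std::size_t distinct = 0;
    std::size_t last = static_cast<std::size_t>(-1);  // sentinel: no block yet
    for (std::size_t k = 0; k < count; ++k) {
        std::size_t block = (k * stride * slot_bytes) / unit_bytes;
        if (block != last) {   // offsets only grow, so a new index is a new block
            ++distinct;
            last = block;
        }
    }
    return distinct;
}
```

Under these assumptions, `blocks_touched(8, 64)` is 4096 (one cache line per element, matching the 1 miss per element seen at stride 8), while `blocks_touched(512, 4096)` is also 4096, meaning that from a stride of 512 slots onward every element sits on its own 4 KiB page.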
Any help is very much appreciated.
4 Replies
Patrick_F_Intel1
Employee
Hello schwald,
Which L1, L2, L3 and TLB miss events did you use?
I might need the event code and umask for each event since I'm not so familiar with PAPI.
Thanks
Pat
schwald
Beginner
Hello Patrick,
Thanks for your quick reply. I am using the PAPI preset events PAPI_L1_DCM, PAPI_L2_DCM, PAPI_L3_TCM and PAPI_TLB_DM. It looks like they map to the native events as follows:
[plain]
PAPI_L1_DCM -> L1D:REPL
PAPI_L2_DCM -> L2_RQSTS:LD_MISS + L2_RQSTS:RFO_MISS
PAPI_L3_TCM -> LAST_LEVEL_CACHE_MISSES
PAPI_TLB_DM -> DTLB_MISSES:ANY
[/plain]
Is that what you needed?
Thanks,
David
schwald
Beginner
Any help is very much appreciated.
Patrick_F_Intel1
Employee
Hello schwald,
Yes, the TLB misses show up in the L1 misses.
The L1 misses show up as L2 and/or L3 misses.
I'm still working through the L2 & L3 side of the story... I think that my tester might be getting some hits in L2 or L3 so I have a little more work to do.
Pat