<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Measuring Memory Access Time in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Measuring-Memory-Access-Time/m-p/771518#M213</link>
    <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Have you used any performance counters to profile the cache events?&lt;BR /&gt;&lt;BR /&gt;You can use this tool: &lt;A href="http://software.intel.com/en-us/articles/intel-performance-counter-monitor/"&gt;http://software.intel.com/en-us/articles/intel-performance-counter-monitor/&lt;/A&gt;&lt;BR /&gt;to measure your cache hits/misses at multiple levels. &lt;BR /&gt;&lt;BR /&gt;This should be a useful first step in understanding your code's memory access latencies.&lt;BR /&gt;&lt;BR /&gt;-Hussam</description>
    <pubDate>Tue, 01 May 2012 16:17:44 GMT</pubDate>
    <dc:creator>Hussam_Mousa__Intel_</dc:creator>
    <dc:date>2012-05-01T16:17:44Z</dc:date>
    <item>
      <title>Measuring Memory Access Time</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Measuring-Memory-Access-Time/m-p/771517#M212</link>
      <description>I have an Intel Xeon server running 64-bit Linux, on which I am trying to measure the cache miss overhead in the L1D cache. The code to do this is shown below.&lt;BR /&gt;&lt;BR /&gt;The code uses a &lt;B&gt;buffer[twice the L1D cache size]&lt;/B&gt; and &lt;B&gt;bufindex[2 x associativity of L1D cache]&lt;/B&gt;.&lt;BR /&gt;&lt;BR /&gt;There is a function &lt;B&gt;findsets&lt;/B&gt;, which, given an index into buffer, will find all the indices in buffer which map into the same set.&lt;BR /&gt;These indices (call them m0, m1, m2, ..., m15) are stored in bufindex.&lt;BR /&gt;&lt;BR /&gt;There is also a function &lt;B&gt;measureflush&lt;/B&gt;, which measures the time to access elements from buffer.&lt;BR /&gt;&lt;BR /&gt;I make 4 calls to measureflush as follows.&lt;BR /&gt;r = measureflush(0, 8); /* First call measures the time to access m0 to m7 */&lt;BR /&gt;r ^= measureflush(0, 8); /* Second call measures the time to access m0 to m7 */&lt;BR /&gt;r ^= measureflush(9, 16); /* Third call measures the time to access m9 to m15 */&lt;BR /&gt;r ^= measureflush(0, 8); /* Fourth call measures the time to access m0 to m7 */&lt;BR /&gt;&lt;BR /&gt;I would expect that, due to the associativity, the second call to measureflush would take less time than the fourth. However, this does not seem to be the case, as both calls take roughly the same time (which I think means that I am getting cache hits in both cases). A sample output is shown below.&lt;BR /&gt;&lt;BR /&gt;Where am I going wrong? Is there something else I need to do in order to see the cache misses?
Any help in this regard will be really useful.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance,&lt;BR /&gt;Chetreb&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;[bash]&lt;BR /&gt;&lt;BR /&gt;Sample Output&lt;BR /&gt;-- first call-- (clearly compulsory misses)&lt;BR /&gt;0 270&lt;BR /&gt;1 4992&lt;BR /&gt;2 2396&lt;BR /&gt;3 2268&lt;BR /&gt;4 2282&lt;BR /&gt;5 2674&lt;BR /&gt;6 2368&lt;BR /&gt;7 2311&lt;BR /&gt;-- second call--&lt;BR /&gt;0 178&lt;BR /&gt;1 171&lt;BR /&gt;2 178&lt;BR /&gt;3 170&lt;BR /&gt;4 206&lt;BR /&gt;5 206&lt;BR /&gt;6 199&lt;BR /&gt;7 569&lt;BR /&gt;-- third call-- (compulsory misses again)&lt;BR /&gt;9 2979&lt;BR /&gt;10 2261&lt;BR /&gt;11 2382&lt;BR /&gt;12 2631&lt;BR /&gt;13 2268&lt;BR /&gt;14 2262&lt;BR /&gt;15 2290&lt;BR /&gt;-- fourth call--&lt;BR /&gt;0 184&lt;BR /&gt;1 213&lt;BR /&gt;2 477&lt;BR /&gt;3 178&lt;BR /&gt;4 178&lt;BR /&gt;5 178&lt;BR /&gt;6 185&lt;BR /&gt;7 221&lt;BR /&gt;&lt;BR /&gt;[/bash][bash]/*
 * Cache Assumption: 
 *  L1 D : 32 KByte, 8 way set associative, 64 byte cache line
 *  This means that there are 64 sets
 *
 *  Tag       : 20 bits (a31 - a12)
 *  Set index :  6 bits (a11 - a6)
 *  Offset   :  6 bits (a5  - a0)
 */
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;string.h&amp;gt;

#define MAXSIZE (8192*2)


unsigned int buffer[MAXSIZE];
unsigned int bufindex[16];
unsigned int r;

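unsigned measureflush(int from, int to)
{
	unsigned int t2, t1;
	int i;
	for (i=from; i &amp;lt; to; ++i){
		/* Time a single load of buffer[bufindex[i]] from memory.
		 * cpuid serializes execution around each rdtsc. The pointer
		 * is dereferenced through %rsi because this is a 64-bit build. */
		asm volatile (" xor %%eax, %%eax	\n"
				" cpuid			\n"
				" rdtsc			\n"
				" mov %%eax,%%edi	\n"
				" mov (%%rsi),%%ebx \n"
				" xor %%eax,%%eax	\n"
				" cpuid			\n"
				" rdtsc			\n"
				" sub %%edi,%%eax  \n"
				" mov %%eax,(%%rsi)\n" : "=a"(t2), "=D"(t1) :	"S"(buffer + bufindex[i]) :	"ebx", "ecx", "edx", "cc", "memory");
		/* t2 contains the memory access time */
		buffer[bufindex[i]] = t2;
	}


	/* Print the index and the measured time for each access */
	for (i=from; i &amp;lt; to; ++i){
		printf("%d %u\n", i, buffer[bufindex[i]]);
	}
	printf("\n");
	return 0;
}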


#define GETSET(x)         (((x) &amp;gt;&amp;gt; 6) &amp;amp; 0x3f)
#define GETOFFSET(x)      ((x) &amp;amp; 0x3f)
#define GETTAG(x)         ((x) &amp;gt;&amp;gt; 12)


void findsets(int index)
{
	unsigned long bufoffset[16];
	int i;
	


	for (i=0; i &amp;lt; 16; ++i){
		/* Elements 1024 ints (4096 bytes) apart map to the same set */
		bufoffset[i] = (unsigned long) (buffer + index +  (1024 * i)); 
		bufindex[i] = (1024 * i) + index;
		
		printf("%u\t", bufindex[i]);
		printf("%lx\t", bufoffset[i]);
		printf("%lx\t", GETSET(bufoffset[i]));
		printf("%lx\t", GETOFFSET(bufoffset[i]));
		printf("%lx\n", GETTAG(bufoffset[i]));

	}
}


int main()
{

	int i;
	
	findsets(0);

	r = measureflush(0, 8);      /* First Call */
	r ^= measureflush(0, 8);     /* Second Call */
	
	r ^= measureflush(9, 16);    /* Third Call */
	r ^= measureflush(0, 8);     /* Fourth Call */
	return 0;
}[/bash]</description>
      <pubDate>Tue, 17 Apr 2012 07:17:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Measuring-Memory-Access-Time/m-p/771517#M212</guid>
      <dc:creator>chetreb</dc:creator>
      <dc:date>2012-04-17T07:17:59Z</dc:date>
    </item>
    <item>
      <title>Measuring Memory Access Time</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Measuring-Memory-Access-Time/m-p/771518#M213</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Have you used any performance counters to profile the cache events?&lt;BR /&gt;&lt;BR /&gt;You can use this tool: &lt;A href="http://software.intel.com/en-us/articles/intel-performance-counter-monitor/"&gt;http://software.intel.com/en-us/articles/intel-performance-counter-monitor/&lt;/A&gt;&lt;BR /&gt;to measure your cache hits/misses at multiple levels. &lt;BR /&gt;&lt;BR /&gt;This should be a useful first step in understanding your code's memory access latencies.&lt;BR /&gt;&lt;BR /&gt;-Hussam</description>
      <pubDate>Tue, 01 May 2012 16:17:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Measuring-Memory-Access-Time/m-p/771518#M213</guid>
      <dc:creator>Hussam_Mousa__Intel_</dc:creator>
      <dc:date>2012-05-01T16:17:44Z</dc:date>
    </item>
  </channel>
</rss>

