<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Odd cache results in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798521#M643</link>
    <description></description>
    <pubDate>Thu, 21 Jun 2012 11:52:57 GMT</pubDate>
    <dc:creator>korso</dc:creator>
    <dc:date>2012-06-21T11:52:57Z</dc:date>
    <item>
      <title>Odd cache results</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798517#M639</link>
      <description>Hi all,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I'm trying to maximise use of cache in matrices, so I'm testing some of my codes with both IPC and PAPI. The problem is that the results obtained are very different. I'm measuring L2 and L3 hit ratio with 4 programs:&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;TABLE border="0" cellpadding="0" cellspacing="0" width="400"&gt;
 &lt;COLGROUP&gt;&lt;COL width="80" span="5" /&gt;
 &lt;/COLGROUP&gt;&lt;TBODY&gt;&lt;TR height="20"&gt;
  &lt;TD height="20" width="80"&gt;&lt;/TD&gt;
  &lt;TD colspan="2" class="xl63" width="160"&gt;PAPI&lt;/TD&gt;
  &lt;TD colspan="2" class="xl63" width="160"&gt;IPC&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR height="20"&gt;
  &lt;TD height="20"&gt;&lt;/TD&gt;
  &lt;TD class="xl63"&gt;L2&lt;/TD&gt;
  &lt;TD class="xl63"&gt;L3&lt;/TD&gt;
  &lt;TD class="xl63"&gt;L2&lt;/TD&gt;
  &lt;TD class="xl63"&gt;L3&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR height="20"&gt;
  &lt;TD height="20"&gt;matrix0&lt;/TD&gt;
  &lt;TD align="right"&gt;0,000257&lt;/TD&gt;
  &lt;TD align="right"&gt;0,000394&lt;/TD&gt;
  &lt;TD align="right"&gt;0,0108473&lt;/TD&gt;
  &lt;TD align="right"&gt;0,0274595&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR height="20"&gt;
  &lt;TD height="20"&gt;matrix1&lt;/TD&gt;
  &lt;TD align="right"&gt;0,001641&lt;/TD&gt;
  &lt;TD align="right"&gt;0,590435&lt;/TD&gt;
  &lt;TD align="right"&gt;0,00420045&lt;/TD&gt;
  &lt;TD align="right"&gt;0,0081431&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR height="20"&gt;
  &lt;TD height="20"&gt;matrix2&lt;/TD&gt;
  &lt;TD align="right"&gt;0,001943&lt;/TD&gt;
  &lt;TD align="right"&gt;0,641179&lt;/TD&gt;
  &lt;TD align="right"&gt;0,00416087&lt;/TD&gt;
  &lt;TD align="right"&gt;0,00807843&lt;/TD&gt;
 &lt;/TR&gt;
 &lt;TR height="20"&gt;
  &lt;TD height="20"&gt;matrix3&lt;/TD&gt;
  &lt;TD align="right"&gt;0,001849&lt;/TD&gt;
  &lt;TD align="right"&gt;0,942466&lt;/TD&gt;
  &lt;TD align="right"&gt;0,00388092&lt;/TD&gt;
  &lt;TD align="right"&gt;0,0484803&lt;/TD&gt;
 &lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;/DIV&gt;&lt;DIV&gt;The L3 results are especially significant. With PAPI I obtained a hit ratio of 60~90%, but when measured with IPC I obtained 0~4%. The routines measured are the same, so I don't understand the results. Is IPC measuring wrong?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;For example, the code for matrix1 (accessing a matrix by columns):&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;With PAPI:&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;[bash]/**
 * Multithreaded run over the plain matrix with OpenMP, using an
 * OpenMP parallel for loop. Events measured with PAPI.
 */

#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;malloc.h&amp;gt;
#include &amp;lt;signal.h&amp;gt;
#include &amp;lt;math.h&amp;gt;
#include &amp;lt;omp.h&amp;gt;

#include "papi.h"
#include "timer.h"

#define NUM_EVENTS 5

#define nth 4
#define F 17000
#define C 17000
#define VAR double

	int Events[NUM_EVENTS] = {PAPI_TOT_CYC, PAPI_L2_TCA, PAPI_L2_TCM, PAPI_L3_TCA, PAPI_L3_TCM};
	long long values[NUM_EVENTS];
	long long start_usec, end_usec, start_v_usec, end_v_usec, start_cycles, end_cycles;
	int EventSet = PAPI_NULL;
	int num_counters;

   const PAPI_hw_info_t *hwinfo = NULL;

int main(int argc, char* argv[])
{
	int n;
	int i,j;

	if ((n=PAPI_library_init(PAPI_VER_CURRENT)) != PAPI_VER_CURRENT) {
		printf("\n PAPI version mismatch: got %d, expected %d \n", n,PAPI_VER_CURRENT);
	}

	/* Query the hardware information */
	if ((hwinfo = PAPI_get_hardware_info()) == NULL) {
		printf("\n Papi: Error PAPI_get_hardware_info null\n");
	}
	else {
		printf("\n%d CPU at %f MHz.\n",hwinfo-&amp;gt;totalcpus,hwinfo-&amp;gt;mhz);
	}
	
	// CODE TO MEASURE

	// Select the number of execution threads
	omp_set_num_threads(nth);

	// Allocate the matrix: F row pointers, each row holding C values
	VAR** m;
	m = (VAR**)malloc(F*sizeof(VAR*));
	for(i = 0; i&amp;lt;F; i++){
		m[i] = (VAR*)malloc(C*sizeof(VAR));
	}

	// Initialize the matrix
	for(i=0;i&amp;lt;F;i++){
		for(j=0;j&amp;lt;C;j++){
			m[i][j] = 100+i+j;
		}
	}

	start_timer(0);
	/* Start counting events */
	if ((n=PAPI_start_counters(Events, NUM_EVENTS)) != PAPI_OK)
		printf("\n Error %d: PAPI_start_counters\n",n);

	// Column-wise traversal: the inner loop walks down one column
	#pragma omp parallel for shared(m) private(i,j)
	for(i=0;i&amp;lt;C;i++){
		for(j=0;j&amp;lt;F;j++){
			m[j][i] = (VAR)sqrt(m[j][i]);
		}
	}

	stop_timer(0);
	printf("No padding. Execution time: ");
	print_timer(0);

	// END OF CODE TO MEASURE

	if ((n=PAPI_stop_counters(values, NUM_EVENTS)) != PAPI_OK)
		printf("\n Error %d : PAPI_stop_counters\n", n);

	printf("Total Cycles: \t%lld\n", values[0]);
	printf("\nPAPI:\n");
	printf("L2 Data Accesses:\t%lld\nL2 Data Misses:\t\t%lld\n", values[1], values[2]);
	printf("\nL3 Accesses:\t\t%lld\nL3 Data Misses:\t\t%lld\n", values[3], values[4]);
	printf("\nL2 Success Rate:\t%lf\n", 1-((double)values[2]/(double)values[1]));
	printf("L3 Success Rate:\t%lf\n", 1-((double)values[4]/(double)values[3]));
	
	return 0;
}
[/bash] &lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;With IPC:&lt;/DIV&gt;&lt;DIV&gt;[bash]#include "cpucounters.h"
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;omp.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;math.h&amp;gt;

#define nth 4
#define F 17000
#define C 17000

using namespace std;

int
main(){
	cout&amp;lt;&amp;lt;"Testing Intel PCM\n"&amp;lt;&amp;lt;endl;

	// Assumption: the lines obtaining the PCM instance were lost from the post;
	// PCM::getInstance() is the usual way to get it
	PCM * ipc = PCM::getInstance();
	if (ipc-&amp;gt;program() != PCM::Success){
		printf("Error Code: %d\n",ipc-&amp;gt;program());
		return -1;  
	} 
	
	// Begin of custom code
	
	int i,j;
	
	// Allocate the matrix: F row pointers, each row holding C values
	double** m;
	m = (double**)malloc(F*sizeof(double*));
	for(i = 0; i&amp;lt;F; i++){
		m[i] = (double*)malloc(C*sizeof(double));
	}
	
	// Initialize the matrix
	for(i=0;i&amp;lt;F;i++){
		for(j=0;j&amp;lt;C;j++){
			m[i][j] = 100+i+j;
		}
	}
	
	//Begin of measures
	SystemCounterState before_sstate = getSystemCounterState();
	
	/**
	 * Run of the bad (column-wise) matrix with parallel for.
	 */
	
	#pragma omp parallel for shared(m) private(i,j)
	for(i=0;i&amp;lt;C;i++){
		for(j=0;j&amp;lt;F;j++){
			m[j][i] = sqrt(m[j][i]);
		}
	}
	
	// End of measures
	// End of custom code
	
	SystemCounterState after_sstate = getSystemCounterState();  
	
	// Stop and detach PMU (IMPORTANT!!)
	ipc-&amp;gt;cleanup();
  
	cout &amp;lt;&amp;lt; endl &amp;lt;&amp;lt; "RESULTS:" &amp;lt;&amp;lt; endl ...[/bash]&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Am I measuring wrong? The results I obtain with PAPI seem more coherent to me (at least for L3).&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks in advance.&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 20 Jun 2012 12:52:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798517#M639</guid>
      <dc:creator>korso</dc:creator>
      <dc:date>2012-06-20T12:52:33Z</dc:date>
    </item>
    <item>
      <title>Odd cache results</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798518#M640</link>
      <description>korso,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;What are the sizes of your matrices (0, 1, 2, 3), and what hardware configuration are you running (number of sockets, processor type, etc.)?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks,&lt;/DIV&gt;&lt;DIV&gt;Roman&lt;/DIV&gt;</description>
      <pubDate>Thu, 21 Jun 2012 09:02:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798518#M640</guid>
      <dc:creator>Roman_D_Intel</dc:creator>
      <dc:date>2012-06-21T09:02:06Z</dc:date>
    </item>
    <item>
      <title>Odd cache results</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798519#M641</link>
      <description>Hi Roman,&lt;DIV&gt;The matrices are 17000x17000 in all programs, with elements of C type double. The size was chosen so as not to reach the RAM limit (and to avoid using virtual memory).&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;My processor is an i7 CPU 860 @ 2.80GHz.&lt;/DIV&gt;&lt;DIV&gt;It has a 3-level cache:&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;L1 -&amp;gt; C=64;   L=8;  W=64 -&amp;gt; 32K instructions, 32K data (per core)&lt;/DIV&gt;&lt;DIV&gt;L2 -&amp;gt; C=512;  L=8;  W=64 -&amp;gt; 256K (per core)&lt;/DIV&gt;&lt;DIV&gt;L3 -&amp;gt; C=8192; L=16; W=64 -&amp;gt; 8192K (unified)&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Sockets -&amp;gt; 1&lt;/DIV&gt;&lt;DIV&gt;Cores -&amp;gt; 4&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;RAM -&amp;gt; 4GB&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;If you need any other information, just ask for it.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks.&lt;/DIV&gt;</description>
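      <![CDATA[
The sizes quoted in this post can be checked with a little arithmetic. The following is only a minimal sketch, assuming that C, L and W in the listing stand for sets, ways and line size in bytes, and that a double takes 8 bytes; the function names are hypothetical and do not come from PAPI or Intel PCM.

```c
#include <assert.h>
#include <stddef.h>

/* Capacity in bytes of a set-associative cache:
 * sets (C) * ways (L) * line size in bytes (W). */
size_t cache_bytes(size_t sets, size_t ways, size_t line_bytes)
{
    return sets * ways * line_bytes;
}

/* Bytes occupied by the elements of a rows x cols matrix of double.
 * The array of row pointers is not counted. */
size_t matrix_bytes(size_t rows, size_t cols)
{
    return rows * cols * sizeof(double);
}

/* How many times the matrix exceeds the cache (integer division). */
size_t footprint_ratio(size_t matrix, size_t cache)
{
    assert(cache > 0);
    return matrix / cache;
}
```

With the figures above, cache_bytes(8192, 16, 64) gives 8388608 bytes (8 MB) for L3 and matrix_bytes(17000, 17000) gives 2312000000 bytes, so the matrix is roughly 275 times larger than the L3 cache.
      ]]>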
      <pubDate>Thu, 21 Jun 2012 10:41:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798519#M641</guid>
      <dc:creator>korso</dc:creator>
      <dc:date>2012-06-21T10:41:23Z</dc:date>
    </item>
    <item>
      <title>Odd cache results</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798520#M642</link>
      <description>Korso,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Do you know how PAPI maps its "virtual" PAPI_L3_TCA and PAPI_L3_TCM events to real hardware events, and which events those are?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;A 17K x 17K x 8 matrix implies a data size &amp;gt;= 2 GByte, and the L3 cache size is only 8 MByte. Your access pattern (by column, increasing the j index) is not sequential. Why do you expect an L3 hit rate &amp;gt; 60%?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;OL class="dp-sh"&gt;&lt;LI&gt;    for(i=0;i&amp;lt;C;i++){
&lt;/LI&gt;&lt;LI class="alt"&gt;        for(j=0;j&amp;lt;F;j++){
&lt;/LI&gt;&lt;LI&gt;            m[j][i] = (VAR)sqrt(m[j][i]);   &lt;/LI&gt;
&lt;LI class="alt"&gt;        }   &lt;/LI&gt;
&lt;LI&gt;    }&lt;/LI&gt;&lt;/OL&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks,&lt;/DIV&gt;&lt;DIV&gt;Roman&lt;/DIV&gt;</description>
      <pubDate>Thu, 21 Jun 2012 11:11:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798520#M642</guid>
      <dc:creator>Roman_D_Intel</dc:creator>
      <dc:date>2012-06-21T11:11:27Z</dc:date>
    </item>
    <item>
      <title>Odd cache results</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798521#M643</link>
      <description>&lt;DIV&gt;Hi Roman,&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;Well, each program is different, and the code I posted is the worst-case scenario. Let me explain a bit:&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;The matrix0 code accesses the matrix sequentially. The measured code is:&lt;/DIV&gt;&lt;DIV&gt;[bash]//Begin of measures
	SystemCounterState before_sstate = getSystemCounterState();
	
	/**
	 * Run of the bad matrix with parallel for and simple padding.
	 */
	
	#pragma omp parallel for shared(m) private(i,j)
	for(i=0;i&amp;lt;F;i++){
		for(j=0;j&amp;lt;C;j++){
			m[i][j] = sqrt(m[i][j]);
		}
	}
	
	// End of measures
	// End of custom code
	
	SystemCounterState after_sstate = getSystemCounterState(); [/bash] &lt;/DIV&gt;&lt;DIV&gt;The matrix1 code accesses the matrix non-sequentially, and it is the same code I posted before:&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;[bash]//Begin of measures
	SystemCounterState before_sstate = getSystemCounterState();
	
	/**
	 * Run of the bad matrix with parallel for and simple padding.
	 */
	
	#pragma omp parallel for shared(m) private(i,j)
	for(i=0;i&amp;lt;F;i++){
		for(j=0;j&amp;lt;C;j++){
			m[j][i] = sqrt(m[j][i]);
		}
	}
	
	// End of measures
	// End of custom code[/bash] The matrix2 code is the same as matrix1 but applies basic array padding that is not optimized for multi-core (the only change is that the matrix column size is slightly greater).&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;The matrix3 code is an experimental method that uses array padding to access the matrices by columns, so that the cache only needs to store a single column of the matrix (the L2 line size W is 64 bytes per block, so with 8-byte doubles an access to m[0][0] produces a cache miss and stores cells m[0][0] to m[0][7] in an L3 block).&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;My algorithm is designed to make the most of the cache size, so if the cache can store num_threads*num_files*8 matrix cells, the hit ratio should be nearly the same as with sequential access. But even if my algorithm were bad, matrix0 is a sequential access, so I expected a higher hit ratio in both caches.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;In fact, I use these large matrices so that I can see bigger differences between the worst-case scenario and my algorithm.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;About the PAPI question: I use PAPI_L3_TCA and PAPI_L3_TCM for total cache accesses and misses, and I compute the hit ratio as:&lt;/DIV&gt;&lt;DIV&gt;hit_ratio = 1-(misses/accesses). Both events are available and native on my processor, but I don't know any more details.&lt;/DIV&gt;&lt;DIV&gt;I could be using PAPI wrongly, but the L3 results make more sense to me (I can't understand the low L2 hit ratio, and that was my main reason for switching from PAPI to IPC).&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks&lt;/DIV&gt;</description>
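      <![CDATA[
The hit ratio formula used in this post, 1 - (misses / accesses), can be written as a small helper. This is a minimal sketch rather than anything taken from PAPI or Intel PCM: the function name is hypothetical, and returning 0.0 when no accesses were counted is an assumption made here to avoid dividing by zero.

```c
#include <assert.h>

/* Hit ratio from raw counter values: 1 - misses / accesses.
 * Returns 0.0 when no accesses were counted (assumption of this sketch). */
double hit_ratio(long long accesses, long long misses)
{
    assert(accesses >= 0 && misses >= 0);
    if (accesses == 0)
        return 0.0;
    return 1.0 - (double)misses / (double)accesses;
}
```

For example, 1000 accesses with 250 misses give a ratio of 0.75. A negative result means more misses than accesses were reported, which suggests the two counters are not measuring the same set of requests.
      ]]>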
      <pubDate>Thu, 21 Jun 2012 11:52:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798521#M643</guid>
      <dc:creator>korso</dc:creator>
      <dc:date>2012-06-21T11:52:57Z</dc:date>
    </item>
    <item>
      <title>Odd cache results</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798522#M644</link>
      <description>Replying to reupload the post...</description>
      <pubDate>Tue, 26 Jun 2012 09:10:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798522#M644</guid>
      <dc:creator>korso</dc:creator>
      <dc:date>2012-06-26T09:10:43Z</dc:date>
    </item>
    <item>
      <title>Odd cache results</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798523#M645</link>
      <description>Hi korso,&lt;BR /&gt;&lt;BR /&gt;Do you know how the PAPI_L3_TCA and PAPI_L3_TCM PAPI generic events are mapped to the low-level Intel events (names)? As far as I understand, PAPI mappings could depend on the PAPI version and also on the underlying CPU architecture. Is there a utility in PAPI that can output such a mapping on your particular system? Or any documentation?&lt;BR /&gt;&lt;BR /&gt;It would also be useful to see and compare the absolute counts of L2/L3 cache hits and misses in PCM and PAPI. Could you post them here?&lt;BR /&gt;&lt;BR /&gt;Thank you,&lt;BR /&gt;Roman</description>
      <pubDate>Fri, 29 Jun 2012 13:39:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798523#M645</guid>
      <dc:creator>Roman_D_Intel</dc:creator>
      <dc:date>2012-06-29T13:39:29Z</dc:date>
    </item>
    <item>
      <title>Odd cache results</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798524#M646</link>
      <description>Hi korso,&lt;BR /&gt;&lt;BR /&gt;Assuming your OS is Linux, you can use the "perf" utility as a third method to check, without having to modify the source code. Simply run, on the command line:&lt;BR /&gt;&lt;BR /&gt;&amp;gt; sudo perf stat -e rXXXX,rYYYY,rZZZZ,... ./&amp;lt;your_prog&amp;gt; &amp;lt;arg1&amp;gt; &amp;lt;arg2&amp;gt; ...&lt;BR /&gt;&lt;BR /&gt;where rXXXX etc. are the hex codes formed from the Umask and EventCode of the relevant cache events. The Intel Programming Guide (Volume 3B), Chapters 18 and 19 on performance counters, gives the event codes for your processor (Core i7, Nehalem).&lt;BR /&gt;&lt;BR /&gt;Or have you already done that?&lt;BR /&gt;&lt;BR /&gt;Sanath</description>
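      <![CDATA[
The rXXXX notation in this reply places the Umask byte first and the EventCode byte second, both in hexadecimal. The helper below is a minimal sketch of that formatting; the function name is hypothetical, and the example values (EventCode 0x2E with Umask 0x41, the architectural last-level cache misses event) should be checked against the event tables for the specific processor.

```c
#include <assert.h>
#include <stdio.h>

/* Writes a perf raw event string "rUUEE" into buf, where UU is the
 * Umask and EE is the EventCode, each as two lowercase hex digits.
 * Returns the number of characters snprintf would have written. */
int perf_raw_event(char *buf, size_t len, unsigned umask, unsigned event_code)
{
    assert(umask <= 0xFF && event_code <= 0xFF);
    return snprintf(buf, len, "r%02x%02x", umask, event_code);
}
```

Calling perf_raw_event(buf, sizeof buf, 0x41, 0x2E) fills buf with r412e, which can then be passed to perf stat -e in place of rXXXX.
      ]]>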
      <pubDate>Fri, 29 Jun 2012 20:55:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Odd-cache-results/m-p/798524#M646</guid>
      <dc:creator>Sanath_Jayasena</dc:creator>
      <dc:date>2012-06-29T20:55:49Z</dc:date>
    </item>
  </channel>
</rss>

