<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic [ Run-Time testing - Extended in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090185#M65012</link>
    <description>&lt;STRONG&gt;[ Run-Time testing - Extended Tracing - No ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ Microsoft C++ compiler ]&lt;/STRONG&gt;

		...
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		...

		&lt;STRONG&gt;Here are generated binary codes&lt;/STRONG&gt;:

		...
		00244486  rdtsc
		00244488  clflush     [ebp-300h]
		0024448F  clflush     [ebp-240h]
		00244496  clflush     [ebp-180h]
		0024449D  mov         dword ptr [ebp-48h], eax
		002444A0  clflush     [ebp-340h]
		002444A7  clflush     [ebp-280h]
		002444AE  clflush     [ebp-1C0h]
		002444B5  clflush     [ebp-100h]
		002444BC  mov         dword ptr [ebp-44h], edx
		002444BF  clflush     [ebp-2C0h]
		002444C6  clflush     [ebp-200h]
		002444CD  clflush     [ebp-140h]
		002444D4  rdtsc
		...</description>
    <pubDate>Sat, 24 Sep 2016 00:51:29 GMT</pubDate>
    <dc:creator>SergeyKostrov</dc:creator>
    <dc:date>2016-09-24T00:51:29Z</dc:date>
    <item>
      <title>Latency and Throughput of Intel CPUs 'clflush' instruction</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090178#M65005</link>
      <description>&lt;STRONG&gt;*** Latency and Throughput of Intel CPUs 'clflush' instruction ***&lt;/STRONG&gt;</description>
      <pubDate>Fri, 23 Sep 2016 15:36:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090178#M65005</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-23T15:36:23Z</dc:date>
    </item>
    <item>
      <title>[ Abstract ]</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090179#M65006</link>
      <description>&lt;STRONG&gt;[ Abstract ]&lt;/STRONG&gt;

Latency and Throughput of Intel CPUs &lt;STRONG&gt;clflush&lt;/STRONG&gt; instruction.

The instruction was introduced with the &lt;STRONG&gt;SSE2&lt;/STRONG&gt; &lt;STRONG&gt;IRT-Domain&lt;/STRONG&gt; and can be executed speculatively. Measuring the latency of the &lt;STRONG&gt;clflush&lt;/STRONG&gt; instruction is a real challenge because it is up to the CPU when to actually execute it.

	&lt;STRONG&gt;IRT-Domain&lt;/STRONG&gt; - &lt;STRONG&gt;SSE2&lt;/STRONG&gt; - [ &lt;STRONG&gt;emmintrin.h&lt;/STRONG&gt; ]

	...
	extern void __ICL_INTRINCC &lt;STRONG&gt;_mm_clflush&lt;/STRONG&gt;( void const *p );
	...

	&lt;STRONG&gt;IRT&lt;/STRONG&gt; - Intrinsics Run-Time</description>
      <pubDate>Fri, 23 Sep 2016 15:46:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090179#M65006</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-23T15:46:46Z</dc:date>
    </item>
    <item>
      <title>[ Here are notes related to</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090180#M65007</link>
      <description>&lt;STRONG&gt;[ Here are notes related to objectives of an investigation ( a small R&amp;amp;D work ) ]&lt;/STRONG&gt;

	&lt;STRONG&gt;1&lt;/STRONG&gt;. Intel does Not provide any numbers for the latency of the CLFLUSH instruction.

	&lt;STRONG&gt;2&lt;/STRONG&gt;. Discussions about the latency of the CLFLUSH instruction are highly speculative because it is
	   Not clear when the instruction is actually executed.

	&lt;STRONG&gt;3&lt;/STRONG&gt;. Some discussions about the latency of the CLFLUSH instruction do Not take into account that
	   it flushes data to the main memory ( RAM ), whose latency is usually known. It is Not
	   clear when a cache line really becomes available for another hardware or software prefetch of
	   data or instructions, or whether it becomes available before (!) the main memory is
	   updated with the modified data.

	&lt;STRONG&gt;4&lt;/STRONG&gt;. It is more important to understand how C++ compilers can generate binary codes that are as
	   effective as possible in order to achieve the highest throughput of a set of CLFLUSH
	   instructions.

	&lt;STRONG&gt;5&lt;/STRONG&gt;. It is shown later that ineffective generation of binary codes by a C++ compiler can affect the
	   throughput of a set of CLFLUSH instructions.

	&lt;STRONG&gt;6&lt;/STRONG&gt;. Three types of generated binary codes are possible:

	   - Type-1: Based on '&lt;STRONG&gt;clflush [ebp-offset]&lt;/STRONG&gt;' instruction using a general purpose register 'ebp'

	   - Type-2: Based on '&lt;STRONG&gt;clflush [eXx]&lt;/STRONG&gt;' instruction using a general purpose register 'eXx'

	   - Type-3: Composite when '&lt;STRONG&gt;clflush&lt;/STRONG&gt;' instruction is generated in a small Not inline function</description>
      <pubDate>Sat, 24 Sep 2016 00:32:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090180#M65007</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T00:32:00Z</dc:date>
    </item>
    <item>
      <title>[ Intel CLFLUSH instruction</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090181#M65008</link>
      <description>&lt;STRONG&gt;[ Intel CLFLUSH instruction Opcodes ]&lt;/STRONG&gt;

		0F AE 38................clflush     [eax]
		0F AE 3B................clflush     [ebx]
		0F AE 39................clflush     [ecx]
		0F AE 3A................clflush     [edx]
		0F AE BD [offset]....clflush     [ebp-offset]</description>
      <pubDate>Sat, 24 Sep 2016 00:36:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090181#M65008</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T00:36:58Z</dc:date>
    </item>
    <item>
      <title>[ Test Case - IrtClflush &amp;</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090182#M65009</link>
      <description>&lt;STRONG&gt;[ Test Case - IrtClflush &amp;amp; CrtClflush ]&lt;/STRONG&gt;

		...
		RTint piAddress[10][16] =
		{
			{ 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11 },		// 0
			{ 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22 },		// 1
			{ 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33 },		// 2
			{ 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44 },		// 3
			{ 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77 },		// 4
			{ 0x88, 0x88, 0x88, 0x88, 0x88, 0x88, 0x88, 0x88, 0x88, 0x88, 0x88, 0x88, 0x88, 0x88, 0x88, 0x88 },		// 5
			{ 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44, 0x44 },		// 6
			{ 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, 0x33 },		// 7
			{ 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22, 0x22 },		// 8
			{ 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11 },		// 9
		};

		IrtClflush( &amp;amp;piAddress[0][0] );
		CrtClflush( &amp;amp;piAddress[1][0] );

		CrtSetThreadPriority( THREADPRIORITY_REALTIME );

		CrtPrefetchData( ( RTchar * )&amp;amp;piAddress[0][0] );	// All prefetches are T0-type
		CrtPrefetchData( ( RTchar * )&amp;amp;piAddress[1][0] );
		CrtPrefetchData( ( RTchar * )&amp;amp;piAddress[2][0] );
		CrtPrefetchData( ( RTchar * )&amp;amp;piAddress[3][0] );
		CrtPrefetchData( ( RTchar * )&amp;amp;piAddress[4][0] );
		CrtPrefetchData( ( RTchar * )&amp;amp;piAddress[5][0] );
		CrtPrefetchData( ( RTchar * )&amp;amp;piAddress[6][0] );
		CrtPrefetchData( ( RTchar * )&amp;amp;piAddress[7][0] );
		CrtPrefetchData( ( RTchar * )&amp;amp;piAddress[8][0] );
		CrtPrefetchData( ( RTchar * )&amp;amp;piAddress[9][0] );

		RTuint64 uiClock1 = CrtRdtsc();

		CrtClflush( &amp;amp;piAddress[0][0] );
		CrtClflush( &amp;amp;piAddress[1][0] );
		CrtClflush( &amp;amp;piAddress[2][0] );
		CrtClflush( &amp;amp;piAddress[3][0] );
		CrtClflush( &amp;amp;piAddress[4][0] );
		CrtClflush( &amp;amp;piAddress[5][0] );
		CrtClflush( &amp;amp;piAddress[6][0] );
		CrtClflush( &amp;amp;piAddress[7][0] );
		CrtClflush( &amp;amp;piAddress[8][0] );
		CrtClflush( &amp;amp;piAddress[9][0] );

		RTuint64 uiClock2 = CrtRdtsc();

		CrtPrintf( RTU("[ CrtClflush ] - Executed in %u clock cycles\n"),
				   ( RTuint )( uiClock2 - uiClock1 ) / 10 );

		CrtSetThreadPriority( THREADPRIORITY_NORMAL );

		CrtPrintf( RTU("IrtClflush &amp;amp; CrtClflush\n") );
		...</description>
      <pubDate>Sat, 24 Sep 2016 00:40:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090182#M65009</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T00:40:13Z</dc:date>
    </item>
    <item>
      <title>[ Watcom C++ compiler -</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090183#M65010</link>
      <description>&lt;STRONG&gt;[ Watcom C++ compiler - Generated binary codes - No re-ordering of instructions ]&lt;/STRONG&gt;

		...
		00403737  lea         eax, [ebp-8AEh]
		0040373D  prefetcht0  [eax]
		00403740  lea         eax, [ebp-86Eh]
		00403746  prefetcht0  [eax]
		00403749  lea         eax, [ebp-82Eh]
		0040374F  prefetcht0  [eax]
		00403752  lea         eax, [ebp-7EEh]
		00403758  prefetcht0  [eax]
		0040375B  lea         eax, [ebp-7AEh]
		00403761  prefetcht0  [eax]
		00403764  lea         eax, [ebp-76Eh]
		0040376A  prefetcht0  [eax]
		0040376D  lea         eax, [ebp-72Eh]
		00403773  prefetcht0  [eax]
		00403776  lea         eax, [ebp-6EEh]
		0040377C  prefetcht0  [eax]
		0040377F  lea         eax, [ebp-6AEh]
		00403785  prefetcht0  [eax]
		00403788  lea         eax, [ebp-66Eh]
		0040378E  prefetcht0  [eax]
		00403791  rdtsc
		00403793  mov         ecx, eax
		00403795  lea         eax, [ebp-8AEh]
		0040379B  clflush     [eax]
		0040379E  lea         eax, [ebp-86Eh]
		004037A4  clflush     [eax]
		004037A7  lea         eax, [ebp-82Eh]
		004037AD  clflush     [eax]
		004037B0  lea         eax, [ebp-7EEh]
		004037B6  clflush     [eax]
		004037B9  lea         eax, [ebp-7AEh]
		004037BF  clflush     [eax]
		004037C2  lea         eax, [ebp-76Eh]
		004037C8  clflush     [eax]
		004037CB  lea         eax, [ebp-72Eh]
		004037D1  clflush     [eax]
		004037D4  lea         eax, [ebp-6EEh]
		004037DA  clflush     [eax]
		004037DD  lea         eax, [ebp-6AEh]
		004037E3  clflush     [eax]
		004037E6  lea         eax, [ebp-66Eh]
		004037EC  clflush     [eax]
		004037EF  rdtsc
		004037F1  xor         edx, edx
		004037F3  sub         eax, ecx
		...</description>
      <pubDate>Sat, 24 Sep 2016 00:43:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090183#M65010</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T00:43:53Z</dc:date>
    </item>
    <item>
      <title>[ C++ compilers generated</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090184#M65011</link>
      <description>&lt;STRONG&gt;[ C++ compilers generated binary codes - Short Summary ]&lt;/STRONG&gt;

		&lt;STRONG&gt;[ Microsoft C++ compiler ]&lt;/STRONG&gt;

			&lt;STRONG&gt;A&lt;/STRONG&gt; - optimized

			...
			clflush	[ebp-100h]
			...

			&lt;STRONG&gt;B&lt;/STRONG&gt; - non-optimized

			...
			mov	eax, dword ptr [ebp+8]
			clflush	[eax]
			...

		&lt;STRONG&gt;[ Borland C++ compiler ]&lt;/STRONG&gt;

			&lt;STRONG&gt;A&lt;/STRONG&gt; - optimized

			...
			mov	edx, dword ptr [ebp-3D0h]
			clflush	[edx]
			...

			&lt;STRONG&gt;B&lt;/STRONG&gt; - non-optimized ( in a small Not inline function )

			...
			push        ebp
			mov         ebp, esp
			mov         eax, dword ptr [ebp+8]
			clflush     [eax]
			pop         ebp
			ret
			...

		&lt;STRONG&gt;[ Intel C++ compiler ]&lt;/STRONG&gt;

			&lt;STRONG&gt;A&lt;/STRONG&gt; - optimized

			...
			clflush	[ebp-638h]
			...

			&lt;STRONG&gt;B&lt;/STRONG&gt; - non-optimized

			N/A

		&lt;STRONG&gt;[ MinGW C++ compiler ]&lt;/STRONG&gt;

			&lt;STRONG&gt;A&lt;/STRONG&gt; - optimized

			...
			mov	edx, dword ptr [ebp-338h]
			clflush	[edx]
			...

			&lt;STRONG&gt;B&lt;/STRONG&gt; - non-optimized

			N/A

		&lt;STRONG&gt;[ Watcom C++ compiler ]&lt;/STRONG&gt;

			&lt;STRONG&gt;A&lt;/STRONG&gt; - optimized

			...
			mov	eax, dword ptr [ebp-194h]
			clflush	[eax]
			...

			&lt;STRONG&gt;B&lt;/STRONG&gt; - non-optimized

			N/A</description>
      <pubDate>Sat, 24 Sep 2016 00:47:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090184#M65011</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T00:47:55Z</dc:date>
    </item>
    <item>
      <title>[ Run-Time testing - Extended</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090185#M65012</link>
      <description>&lt;STRONG&gt;[ Run-Time testing - Extended Tracing - No ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ Microsoft C++ compiler ]&lt;/STRONG&gt;

		...
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		...
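
		The figures above come from the expression '( RTuint )( uiClock2 - uiClock1 ) / 10' in the test case. The Python sketch below mirrors that arithmetic only. The function name reported_cycles is hypothetical, and a 32-bit unsigned RTuint is an assumption, since the type is Not defined in this thread.

```python
def reported_cycles(clock1, clock2, count=10):
    """Mirror '( RTuint )( uiClock2 - uiClock1 ) / 10' from the test case."""
    # Assumption: RTuint is a 32-bit unsigned integer, so the cast keeps
    # only the low 32 bits of the tick difference.
    delta = (clock2 - clock1) % 2**32
    # Integer division drops the remainder, exactly as in C.
    return delta // count


# Every total from 120 to 129 ticks is printed as 12 clock cycles.
for total in (120, 125, 129, 130):
    print(total, reported_cycles(1000, 1000 + total))
```

		Under these assumptions a printed 12 corresponds to a total of 120 to 129 ticks between the two 'rdtsc' instructions, so up to 9 ticks are hidden by the integer division. The total also covers anything else the compiler scheduled between the two reads.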

		&lt;STRONG&gt;Here are generated binary codes&lt;/STRONG&gt;:

		...
		00244486  rdtsc
		00244488  clflush     [ebp-300h]
		0024448F  clflush     [ebp-240h]
		00244496  clflush     [ebp-180h]
		0024449D  mov         dword ptr [ebp-48h], eax
		002444A0  clflush     [ebp-340h]
		002444A7  clflush     [ebp-280h]
		002444AE  clflush     [ebp-1C0h]
		002444B5  clflush     [ebp-100h]
		002444BC  mov         dword ptr [ebp-44h], edx
		002444BF  clflush     [ebp-2C0h]
		002444C6  clflush     [ebp-200h]
		002444CD  clflush     [ebp-140h]
		002444D4  rdtsc
		...</description>
      <pubDate>Sat, 24 Sep 2016 00:51:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090185#M65012</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T00:51:29Z</dc:date>
    </item>
    <item>
      <title>[ Run-Time testing - Extended</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090186#M65013</link>
      <description>&lt;STRONG&gt;[ Run-Time testing - Extended Tracing - No ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ Borland C++ compiler ]&lt;/STRONG&gt;

		...
		[ CrtClflush ] - Executed in 96 clock cycles
		[ CrtClflush ] - Executed in 91 clock cycles
		[ CrtClflush ] - Executed in 93 clock cycles
		[ CrtClflush ] - Executed in 96 clock cycles
		[ CrtClflush ] - Executed in 96 clock cycles
		[ CrtClflush ] - Executed in 96 clock cycles
		[ CrtClflush ] - Executed in 90 clock cycles
		[ CrtClflush ] - Executed in 91 clock cycles
		[ CrtClflush ] - Executed in 94 clock cycles
		[ CrtClflush ] - Executed in 84 clock cycles
		...

		Here are generated binary codes:

		...
		0040417A  call        CrtRdtsc (406D6Ch)
		0040417F  mov         dword ptr [ebp-230h], eax
		00404185  mov         dword ptr [ebp-22Ch], edx
		0040418B  lea         ecx, [ebp-0BD0h]
		00404191  push        ecx
		00404192  call        CrtClflush (40123Ch)
		00404197  pop         ecx
		00404198  lea         eax, [ebp-0B90h]
		0040419E  push        eax
		0040419F  call        CrtClflush (40123Ch)
		004041A4  pop         ecx
		004041A5  lea         edx, [ebp-0B50h]
		004041AB  push        edx
		004041AC  call        CrtClflush (40123Ch)
		004041B1  pop         ecx
		004041B2  lea         ecx, [ebp-0B10h]
		004041B8  push        ecx
		004041B9  call        CrtClflush (40123Ch)
		004041BE  pop         ecx
		004041BF  lea         eax, [ebp-0AD0h]
		004041C5  push        eax
		004041C6  call        CrtClflush (40123Ch)
		004041CB  pop         ecx
		004041CC  lea         edx, [ebp-0A90h]
		004041D2  push        edx
		004041D3  call        CrtClflush (40123Ch)
		004041D8  pop         ecx
		004041D9  lea         ecx, [ebp-0A50h]
		004041DF  push        ecx
		004041E0  call        CrtClflush (40123Ch)
		004041E5  pop         ecx
		004041E6  lea         eax, [ebp-0A10h]
		004041EC  push        eax
		004041ED  call        CrtClflush (40123Ch)
		004041F2  pop         ecx
		004041F3  lea         edx, [ebp-9D0h]
		004041F9  push        edx
		004041FA  call        CrtClflush (40123Ch)
		004041FF  pop         ecx
		00404200  lea         ecx, [ebp-990h]
		00404206  push        ecx
		00404207  call        CrtClflush (40123Ch)
		0040420C  pop         ecx
		0040420D  call        CrtRdtsc (406D6Ch)
		00404212  mov         dword ptr [ebp-238h], eax
		00404218  mov         dword ptr [ebp-234h], edx
		...

		...
		// &lt;STRONG&gt;CrtRdtsc&lt;/STRONG&gt; (406D6Ch)
		00406D6C  rdtsc
		00406D6E  ret
		...

		...
		// &lt;STRONG&gt;CrtClflush&lt;/STRONG&gt; (40123Ch)
		0040123C  push        ebp 
		0040123D  mov         ebp, esp
		0040123F  mov         eax, dword ptr [ebp+8]
		00401242  clflush     [eax]
		00401245  pop         ebp
		00401246  ret
		...

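		&lt;STRONG&gt;Note&lt;/STRONG&gt;: This is the worst case, and it is related to how the CLFLUSH and RDTSC instructions are wrapped in software: each one is reached through a call to a small Not inline function ( CrtClflush and CrtRdtsc ).</description>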
      <pubDate>Sat, 24 Sep 2016 00:55:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090186#M65013</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T00:55:19Z</dc:date>
    </item>
    <item>
      <title>[ Run-Time testing - Extended</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090187#M65014</link>
      <description>&lt;STRONG&gt;[ Run-Time testing - Extended Tracing - No ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ Intel C++ compiler ]&lt;/STRONG&gt;

		...
		[ CrtClflush ] - Executed in 20 clock cycles
		[ CrtClflush ] - Executed in 23 clock cycles
		[ CrtClflush ] - Executed in 24 clock cycles
		[ CrtClflush ] - Executed in 24 clock cycles
		[ CrtClflush ] - Executed in 20 clock cycles
		[ CrtClflush ] - Executed in 19 clock cycles
		[ CrtClflush ] - Executed in 19 clock cycles
		[ CrtClflush ] - Executed in 22 clock cycles
		[ CrtClflush ] - Executed in 19 clock cycles
		[ CrtClflush ] - Executed in 18 clock cycles
		...
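
		To put a number on the gap, the ten readings above can be summarized. This is only a sketch: the function name summarize is hypothetical, and the baseline of 12 clock cycles is the value printed in the Microsoft, MinGW and Watcom runs of this thread.

```python
from statistics import mean

# Readings copied from the Intel C++ compiler run above.
intel_readings = [20, 23, 24, 24, 20, 19, 19, 22, 19, 18]
# Value printed by the Microsoft, MinGW and Watcom runs in this thread.
baseline = 12


def summarize(readings, reference):
    """Return min, max, mean and the mean relative to a reference value."""
    average = mean(readings)
    return {
        "min": min(readings),
        "max": max(readings),
        "mean": average,
        "ratio": average / reference,
    }


summary = summarize(intel_readings, baseline)
print(summary)
```

		The mean is 20.8 clock cycles, about 1.73 times the baseline.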

		A question is &lt;STRONG&gt;why is it slower&lt;/STRONG&gt; than the codes generated by the Microsoft or Watcom C++ compilers?

		Here are generated binary codes:

		...
		0040365C  rdtsc
		0040365E  clflush     [ebp-8B8h]
		00403665  mov         ecx, eax
		00403667  clflush     [ebp-878h]
		0040366E  clflush     [ebp-838h]
		00403675  clflush     [ebp-7F8h]
		0040367C  clflush     [ebp-7B8h]
		00403683  clflush     [ebp-778h]
		0040368A  clflush     [ebp-738h]
		00403691  clflush     [ebp-6F8h]
		00403698  clflush     [ebp-6B8h]
		0040369F  clflush     [ebp-678h]
		004036A6  rdtsc
		...

		&lt;STRONG&gt;1&lt;/STRONG&gt;. The Intel C++ compiler re-ordered the sequence of instructions.
		&lt;STRONG&gt;2&lt;/STRONG&gt;. 'mov ecx, eax', which saves the value returned by 'RDTSC' in the 'eax' general purpose register, is placed after the 1st 'clflush [ebp-8B8h]'.
		&lt;STRONG&gt;3&lt;/STRONG&gt;. It is possible that pipelining is affected ( Very Likely! ), or that an instruction stall is created ( Not proven and speculative! ).
		&lt;STRONG&gt;4&lt;/STRONG&gt;. Take a look at the perfectly generated binary codes of the Watcom C++ compiler ( see &lt;STRONG&gt;Post #6&lt;/STRONG&gt; ).
		&lt;STRONG&gt;5&lt;/STRONG&gt;. Almost the same re-ordering is done by the Microsoft C++ compiler.</description>
      <pubDate>Sat, 24 Sep 2016 00:58:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090187#M65014</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T00:58:00Z</dc:date>
    </item>
    <item>
      <title>[ Run-Time testing - Extended</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090188#M65015</link>
      <description>&lt;STRONG&gt;[ Run-Time testing - Extended Tracing - No ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ MinGW C++ compiler ]&lt;/STRONG&gt;

		...
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		...

		Here are generated binary codes:

		...
		0040265B  rdtsc            
		0040265D  mov         esi, eax 
		0040265F  clflush     [ebp-2B8h] 
		00402666  clflush     [ebp-278h] 
		0040266D  clflush     [ebp-238h] 
		00402674  clflush     [ebp-1F8h] 
		0040267B  clflush     [ebp-1B8h] 
		00402682  clflush     [ebp-178h] 
		00402689  clflush     [ebp-138h] 
		00402690  clflush     [ebp-0F8h] 
		00402697  clflush     [ebp-0B8h] 
		0040269E  clflush     [ebp-78h] 
		004026A2  rdtsc            
		...
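
		The addresses in the listing above step by 7 bytes for each 'clflush [ebp-offset]', and the last 'clflush [ebp-78h]' takes only 4 bytes. The Python sketch below shows why, assuming the standard 32-bit ModRM encoding of 0F AE /7. The function name encode_clflush is hypothetical, and 'esp' is left out because it would need a SIB byte.

```python
# ModRM r/m codes for 32-bit base registers (esp omitted: it needs a SIB byte).
RM_CODES = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "ebp": 5, "esi": 6, "edi": 7}


def encode_clflush(base, disp=0):
    """Return the machine-code bytes of 'clflush [base+disp]' in 32-bit mode."""
    rm = RM_CODES[base]
    if disp == 0 and base != "ebp":
        # Plain [reg] form; ebp has no such form and always carries a displacement.
        mod, tail = 0, b""
    elif disp in range(-128, 128):
        # Signed 8-bit displacement.
        mod, tail = 1, disp.to_bytes(1, "little", signed=True)
    else:
        # Signed 32-bit displacement, little-endian.
        mod, tail = 2, disp.to_bytes(4, "little", signed=True)
    # ModRM = mod (2 bits), reg (3 bits, fixed to 7 for CLFLUSH), r/m (3 bits).
    modrm = mod * 64 + 7 * 8 + rm
    return bytes([0x0F, 0xAE, modrm]) + tail


for base, disp in (("eax", 0), ("ebp", -0x2B8), ("ebp", -0x78)):
    code = encode_clflush(base, disp)
    print(base, hex(disp), code.hex(" ").upper(), len(code))
```

		With this encoding the register forms give 0F AE 38 ( eax ), 0F AE 39 ( ecx ), 0F AE 3A ( edx ) and 0F AE 3B ( ebx ), and an 'ebp' form with a 32-bit offset starts with 0F AE BD. Offsets that fit in a signed byte use the shorter 0F AE 7D form.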

		Perfect binary codes generation.</description>
      <pubDate>Sat, 24 Sep 2016 01:02:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090188#M65015</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T01:02:06Z</dc:date>
    </item>
    <item>
      <title>[ Run-Time testing - Extended</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090189#M65016</link>
      <description>&lt;STRONG&gt;[ Run-Time testing - Extended Tracing - No ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ Watcom C++ compiler ]&lt;/STRONG&gt;

		...
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		[ CrtClflush ] - Executed in 12 clock cycles
		...

		Here are generated binary codes:

		...
		00403791  rdtsc
		00403793  mov         ecx, eax
		00403795  lea         eax, [ebp-8AEh]
		0040379B  clflush     [eax]
		0040379E  lea         eax, [ebp-86Eh]
		004037A4  clflush     [eax]
		004037A7  lea         eax, [ebp-82Eh]
		004037AD  clflush     [eax]
		004037B0  lea         eax, [ebp-7EEh]
		004037B6  clflush     [eax]
		004037B9  lea         eax, [ebp-7AEh]
		004037BF  clflush     [eax]
		004037C2  lea         eax, [ebp-76Eh]
		004037C8  clflush     [eax]
		004037CB  lea         eax, [ebp-72Eh]
		004037D1  clflush     [eax]
		004037D4  lea         eax, [ebp-6EEh]
		004037DA  clflush     [eax]
		004037DD  lea         eax, [ebp-6AEh]
		004037E3  clflush     [eax]
		004037E6  lea         eax, [ebp-66Eh]
		004037EC  clflush     [eax]
		004037EF  rdtsc
		...

		Perfect binary codes generation.</description>
      <pubDate>Sat, 24 Sep 2016 01:14:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090189#M65016</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T01:14:08Z</dc:date>
    </item>
    <item>
      <title>[ Run-Time testing - Extended</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090190#M65017</link>
      <description>&lt;STRONG&gt;[ Run-Time testing - Extended Tracing - No - Summary ]&lt;/STRONG&gt;

Let's consider three cases for the 'clflush' instruction of Intel CPUs:

		&lt;STRONG&gt;1&lt;/STRONG&gt;. Perfect binary codes generation to achieve the highest throughput:

			MinGW C++ compiler	( rating is 10 out of 10 )
			Watcom C++ compiler	( rating is  9 out of 10 )
			Microsoft C++ compiler	( rating is  8 out of 10 )

		&lt;STRONG&gt;2&lt;/STRONG&gt;. Very good binary codes generation to achieve very good throughput:

			Intel C++ compiler	( rating is  5 out of 10 )

		&lt;STRONG&gt;3&lt;/STRONG&gt;. Good binary codes generation but poor throughput ( Not optimized implementation! ):

			Borland C++ compiler	( rating is  3 out of 10 )</description>
      <pubDate>Sat, 24 Sep 2016 01:18:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090190#M65017</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T01:18:30Z</dc:date>
    </item>
    <item>
      <title>[ Run-Time testing - Extended</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090191#M65018</link>
      <description>&lt;STRONG&gt;[ Run-Time testing - Extended Tracing - Yes ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ Microsoft C++ compiler ]&lt;/STRONG&gt;

		Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
		Tests: Start
		&amp;gt; Test0001 Start &amp;lt;
		**********************************************
		Configuration - WIN32_MSC ( 32-bit ) - Release
		CTestSet::InitTestEnv - Passed

		* CRuntimeSet Start *
		...
		[ CrtSetThreadPriority ] - Executed in 2896 clock cycles
		[ CrtClflush ] - Executed in 84 clock cycles
		[ CrtClflush ] - Executed in 104 clock cycles
		[ CrtClflush ] - Executed in 104 clock cycles
		[ CrtClflush ] - Executed in 116 clock cycles
		[ CrtClflush ] - Executed in 104 clock cycles
		[ CrtClflush ] - Executed in 104 clock cycles
		[ CrtClflush ] - Executed in 92 clock cycles
		[ CrtClflush ] - Executed in 92 clock cycles
		[ CrtClflush ] - Executed in 92 clock cycles
		[ CrtClflush ] - Executed in 198725 clock cycles
		[ CrtSetThreadPriority ] - Executed in 3280 clock cycles
		IrtClflush &amp;amp; CrtClflush
		...
		* CRuntimeSet End *

		Test Completed in 7140 ticks
		&amp;gt; Test0001 End &amp;lt;
		Tests: Completed</description>
      <pubDate>Sat, 24 Sep 2016 01:22:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090191#M65018</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T01:22:44Z</dc:date>
    </item>
    <item>
      <title>[ Run-Time testing - Extended</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090192#M65019</link>
      <description>&lt;STRONG&gt;[ Run-Time testing - Extended Tracing - Yes ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ Borland C++ compiler ]&lt;/STRONG&gt;

		Application - BccTestApp - WIN32_BCC ( 32-bit ) - Release
		Tests: Start
		&amp;gt; Test0001 Start &amp;lt;
		**********************************************
		Configuration - WIN32_BCC ( 32-bit ) - Release
		CTestSet::InitTestEnv - Passed

		* CRuntimeSet Start *
		...
		[ CrtSetThreadPriority ] - Executed in 28364 clock cycles
		[ CrtClflush ] - Executed in 120 clock cycles
		[ CrtClflush ] - Executed in 368 clock cycles
		[ CrtClflush ] - Executed in 100 clock cycles
		[ CrtClflush ] - Executed in 368 clock cycles
		[ CrtClflush ] - Executed in 100 clock cycles
		[ CrtClflush ] - Executed in 308 clock cycles
		[ CrtClflush ] - Executed in 376 clock cycles
		[ CrtClflush ] - Executed in 372 clock cycles
		[ CrtClflush ] - Executed in 112 clock cycles
		[ CrtClflush ] - Executed in 156735 clock cycles
		[ CrtSetThreadPriority ] - Executed in 11976 clock cycles
		IrtClflush &amp;amp; CrtClflush
		...
		* CRuntimeSet End *

		Test Completed in 9234 ticks
		&amp;gt; Test0001 End &amp;lt;
		Tests: Completed</description>
      <pubDate>Sat, 24 Sep 2016 01:31:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090192#M65019</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T01:31:41Z</dc:date>
    </item>
    <item>
      <title>[ Run-Time testing - Extended</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090193#M65020</link>
      <description>&lt;STRONG&gt;[ Run-Time testing - Extended Tracing - Yes ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ Intel C++ compiler ]&lt;/STRONG&gt;

		Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
		Tests: Start
		&amp;gt; Test0001 Start &amp;lt;
		**********************************************
		Configuration - WIN32_ICC ( 32-bit ) - Release
		CTestSet::InitTestEnv - Passed

		* CRuntimeSet Start *
		...
		[ CrtSetThreadPriority ] - Executed in 2400 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 221809 clock cycles
		[ CrtSetThreadPriority ] - Executed in 6548 clock cycles
		IrtClflush &amp;amp; CrtClflush
		...
		* CRuntimeSet End *

		Test Completed in 4516 ticks
		&amp;gt; Test0001 End &amp;lt;
		Tests: Completed</description>
      <pubDate>Sat, 24 Sep 2016 01:35:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090193#M65020</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T01:35:58Z</dc:date>
    </item>
    <item>
      <title>[ Run-Time testing - Extended</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090194#M65021</link>
      <description>&lt;STRONG&gt;[ Run-Time testing - Extended Tracing - Yes ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ MinGW C++ compiler ]&lt;/STRONG&gt;

		Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
		Tests: Start
		&amp;gt; Test0001 Start &amp;lt;
		**********************************************
		Configuration - WIN32_MGW ( 32-bit ) - Release
		CTestSet::InitTestEnv - Passed

		* CRuntimeSet Start *
		...
		[ CrtSetThreadPriority ] - Executed in 3128 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 171099 clock cycles
		[ CrtSetThreadPriority ] - Executed in 4284 clock cycles
		IrtClflush &amp;amp; CrtClflush
		...
		* CRuntimeSet End *

		Test Completed in 3516 ticks
		&amp;gt; Test0001 End &amp;lt;
		Tests: Completed</description>
      <pubDate>Sat, 24 Sep 2016 01:41:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090194#M65021</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T01:41:58Z</dc:date>
    </item>
    <item>
      <title>[ Run-Time testing - Extended</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090195#M65022</link>
      <description>&lt;STRONG&gt;[ Run-Time testing - Extended Tracing - Yes ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ Watcom C++ compiler ]&lt;/STRONG&gt;

		Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
		Tests: Start
		&amp;gt; Test0001 Start &amp;lt;
		**********************************************
		Configuration - WIN32_WCC ( 32-bit ) - Release
		CTestSet::InitTestEnv - Passed

		* CRuntimeSet Start *
		...
		[ CrtSetThreadPriority ] - Executed in 3776 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 88 clock cycles
		[ CrtClflush ] - Executed in 173828 clock cycles
		[ CrtSetThreadPriority ] - Executed in 4908 clock cycles
		IrtClflush &amp;amp; CrtClflush
		...
		* CRuntimeSet End *

		Test Completed in 8000 ticks
		&amp;gt; Test0001 End &amp;lt;
		Tests: Completed</description>
      <pubDate>Sat, 24 Sep 2016 01:54:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090195#M65022</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T01:54:02Z</dc:date>
    </item>
    <item>
      <title>[ Flush Cache Win32 API</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090196#M65023</link>
      <description>&lt;STRONG&gt;[ Flush Cache Win32 API functions on Windows Desktop and Embedded OSs ]&lt;/STRONG&gt;
&lt;STRONG&gt;[ Win32 API function - FlushInstructionCache ]&lt;/STRONG&gt;

		&lt;STRONG&gt;Windows Desktop - [ winbase.h ]&lt;/STRONG&gt;

		...
		BOOL WINAPI FlushInstructionCache(
			__in HANDLE hProcess,
			__in_bcount_opt( dwSize ) LPCVOID lpBaseAddress,
			__in SIZE_T dwSize );
		...

		&lt;STRONG&gt;Windows CE - [ winbase.h ]&lt;/STRONG&gt;

		...
		BOOL WINAPI FlushInstructionCache(
				HANDLE hProcess,
				LPCVOID lpBaseAddress,
				DWORD dwSize );
		...</description>
      <pubDate>Sat, 24 Sep 2016 01:58:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090196#M65023</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T01:58:31Z</dc:date>
    </item>
    <item>
      <title>[ Flush Cache intrinsic on</title>
      <link>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090197#M65024</link>
      <description>&lt;STRONG&gt;[ Flush Cache intrinsic on Windows Embedded OSs ]&lt;/STRONG&gt;

		&lt;STRONG&gt;Windows CE - [ cmnintrin.h ]&lt;/STRONG&gt;

		...
		void __CacheRelease( void *p );
		...

		When compiling with the Microsoft C++ compiler, warning C4732 is displayed when
		the intrinsic '__CacheRelease' is not supported on the target architecture.</description>
      <pubDate>Sat, 24 Sep 2016 02:07:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Latency-and-Throughput-of-Intel-CPUs-clflush-instruction/m-p/1090197#M65024</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-09-24T02:07:36Z</dc:date>
    </item>
  </channel>
</rss>

