<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic [ Summary of Performance in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062982#M54914</link>
    <description>&lt;STRONG&gt;[ Summary of Performance evaluation 128-bit Streaming store codes - 1 ]&lt;/STRONG&gt;

&lt;STRONG&gt;1.&lt;/STRONG&gt; Codes generated by the MinGW C++ compiler with 128-bit Streaming stores were &lt;STRONG&gt;7.3%&lt;/STRONG&gt; faster than codes generated by the Microsoft C++ compiler.

&lt;STRONG&gt;2.&lt;/STRONG&gt; Codes generated by the MinGW C++ compiler with 128-bit Streaming stores were &lt;STRONG&gt;16.5%&lt;/STRONG&gt; faster than codes generated by the Intel C++ compiler.

&lt;STRONG&gt;3.&lt;/STRONG&gt; Without 128-bit Streaming Stores

...
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Test Completed in &lt;STRONG&gt;23.625&lt;/STRONG&gt; secs
...

...
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Test Completed in &lt;STRONG&gt;26.216&lt;/STRONG&gt; secs
...

...
Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
Test Completed in &lt;STRONG&gt;21.735&lt;/STRONG&gt; secs
...

&lt;STRONG&gt;4.&lt;/STRONG&gt; With 128-bit Streaming Stores

...
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Test Completed in &lt;STRONG&gt;23.203&lt;/STRONG&gt; secs
...

...
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Test Completed in &lt;STRONG&gt;25.766&lt;/STRONG&gt; secs
...

...
Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
Test Completed in &lt;STRONG&gt;21.516&lt;/STRONG&gt; secs
...</description>
    <pubDate>Sun, 07 Feb 2016 08:38:14 GMT</pubDate>
    <dc:creator>SergeyKostrov</dc:creator>
    <dc:date>2016-02-07T08:38:14Z</dc:date>
    <item>
      <title>Analysis of 128-bit Streaming store codes vs. Non Streaming store codes</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062964#M54896</link>
      <description>&lt;STRONG&gt;*** Analysis of 128-bit Streaming store codes vs. Non Streaming store codes ***&lt;/STRONG&gt;</description>
      <pubDate>Sun, 07 Feb 2016 02:59:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062964#M54896</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T02:59:54Z</dc:date>
    </item>
    <item>
      <title>[ Abstract ]</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062965#M54897</link>
      <description>&lt;STRONG&gt;[ Abstract ]&lt;/STRONG&gt;

I recently completed an analysis of some C codes that initialize a large 3-D data set with dimensions &lt;STRONG&gt;8192 x 4 x 8192&lt;/STRONG&gt; ( X-Y-Z ). In total, the data set has 268,435,456 elements of a Single Precision Floating Point data type.
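
As a quick check of these figures, the element count and the raw size of the data set can be computed directly. This is only a sketch: element_count, data_set_bytes and print_data_set_size are illustrative names that do not come from the original test code, and a 4-byte float is assumed.

```c
#include "assert.h"
#include "stdio.h"

/* Number of single-precision elements in an X x Y x Z data set. */
unsigned long long element_count(unsigned long long x,
                                 unsigned long long y,
                                 unsigned long long z)
{
    return x * y * z;
}

/* Raw size in bytes (hypothetical helper; assumes a 4-byte float). */
unsigned long long data_set_bytes(unsigned long long elements)
{
    assert(sizeof(float) == 4);
    return elements * sizeof(float);
}

/* Prints the figures for the 8192 x 4 x 8192 data set:
   268435456 elements, 1073741824 bytes */
void print_data_set_size(void)
{
    unsigned long long n = element_count(8192ULL, 4ULL, 8192ULL);
    printf("%llu elements, %llu bytes\n", n, data_set_bytes(n));
}
```

Under that assumption the data set occupies 1 GiB, far more than any CPU cache holds.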

Since there are only 4 elements in the &lt;STRONG&gt;Y&lt;/STRONG&gt; direction, the 128-bit Streaming store Intel intrinsic function &lt;STRONG&gt;_mm_stream_ps&lt;/STRONG&gt; was used ( Test-case 2 ) instead of primitive assignments ( Test-case 1 ) in an &lt;STRONG&gt;Unrolled For-Loop&lt;/STRONG&gt; with a &lt;STRONG&gt;4-in-1&lt;/STRONG&gt; schema.

Three C++ compilers were used and their versions are as follows:

Microsoft C++ compiler: 14.00.50727.762 ( default in VS 2005 )
Intel C++ compiler: 12.1.7.371
MinGW C++ compiler: 4.9.0

I would rate all of them as &lt;STRONG&gt;legacy&lt;/STRONG&gt; C++ compilers since they were released about 5 to 10 years ago.

Note that the main purpose of the analysis was to investigate whether Streaming stores make initialization of the data set faster regardless of the C++ compiler used.</description>
      <pubDate>Sun, 07 Feb 2016 06:21:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062965#M54897</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T06:21:17Z</dc:date>
    </item>
    <item>
      <title>[ Test-case 1 ]</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062966#M54898</link>
      <description>&lt;STRONG&gt;[ Test-case 1 ]&lt;/STRONG&gt;
[ C Source codes of Test-Case - without 128-bit Streaming Stores ]

...
RTssize_t i;
for( i = 0; i &amp;lt; m_iSize4; i += 4 )
{
	m_ptData1D[i  ] = ( T )rtValue;
	m_ptData1D[i+1] = ( T )rtValue;
	m_ptData1D[i+2] = ( T )rtValue;
	m_ptData1D[i+3] = ( T )rtValue;
}
...

&lt;STRONG&gt;[ Test-case 2 ]&lt;/STRONG&gt;
[ C Source codes of Test-Case - with 128-bit Streaming Stores ]

...
RTssize_t i;
for( i = 0; i &amp;lt; m_iSize4; i += 4 )
{
	CrtStreamPs128( ( RTfloat * )&amp;amp;m_ptData1D[i  ], rtValue );
	CrtStreamPs128( ( RTfloat * )&amp;amp;m_ptData1D[i+1], rtValue );
	CrtStreamPs128( ( RTfloat * )&amp;amp;m_ptData1D[i+2], rtValue );
	CrtStreamPs128( ( RTfloat * )&amp;amp;m_ptData1D[i+3], rtValue );
}
...

&lt;STRONG&gt;Note 1:&lt;/STRONG&gt; &lt;STRONG&gt;rtValue&lt;/STRONG&gt; is declared as a variable of &lt;STRONG&gt;__m128&lt;/STRONG&gt; type, that is, it has 4 members of type float ( Single Precision Floating Point ).

&lt;STRONG&gt;Note 2:&lt;/STRONG&gt; The &lt;STRONG&gt;CrtStreamPs128&lt;/STRONG&gt; function is a &lt;STRONG&gt;portable wrapper&lt;/STRONG&gt; around the Intel &lt;STRONG&gt;_mm_stream_ps&lt;/STRONG&gt; intrinsic function.</description>
      <pubDate>Sun, 07 Feb 2016 06:25:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062966#M54898</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T06:25:45Z</dc:date>
    </item>
    <item>
      <title>[ MinGW C++ compiler perfect</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062967#M54899</link>
      <description>&lt;STRONG&gt;[ MinGW C++ compiler - Generated almost perfect assembler codes ]&lt;/STRONG&gt;

I also looked at the assembler codes generated by these C++ compilers, and I was very impressed that the MinGW C++ compiler generated almost perfect codes. It used the same schema for both cases, without and with Streaming stores, and the two differ only in which store instruction was used:

- For codes without Streaming stores, the &lt;STRONG&gt;movaps&lt;/STRONG&gt; instruction was used

			...
			00403520  movaps      xmmword ptr [eax], xmm5
			00403523  add         eax, 40h
			00403526  movaps      xmmword ptr [eax-30h], xmm5
			0040352A  movaps      xmmword ptr [eax-20h], xmm5
			0040352E  movaps      xmmword ptr [eax-10h], xmm5
			00403532  cmp         eax, ecx
			00403534  jne         _ZN8CDataSet7RunTestEv+2D0h (403520h)
			...

- For codes with Streaming stores, the &lt;STRONG&gt;movntps&lt;/STRONG&gt; instruction was used

			...
			00403520  movntps     xmmword ptr [eax], xmm5
			00403523  add         eax, 40h
			00403526  movntps     xmmword ptr [eax-30h], xmm5
			0040352A  movntps     xmmword ptr [eax-20h], xmm5
			0040352E  movntps     xmmword ptr [eax-10h], xmm5
			00403532  cmp         eax, ecx
			00403534  jne         _ZN8CDataSet7RunTestEv+2D0h (403520h)
			...

As you can see, apart from the store instruction, the assembler codes for the main processing of the C &lt;STRONG&gt;For-Loop&lt;/STRONG&gt; are identical!</description>
      <pubDate>Sun, 07 Feb 2016 06:31:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062967#M54899</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T06:31:00Z</dc:date>
    </item>
    <item>
      <title>[ Test-case 1 - without 128</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062968#M54900</link>
      <description>&lt;STRONG&gt;[ Test-case 1 - without 128-bit Streaming Stores ]&lt;/STRONG&gt;
[ C Source codes of Test-Case - without 128-bit Streaming Stores ]

...
RTssize_t i;
for( i = 0; i &amp;lt; m_iSize4; i += 4 )
{
	m_ptData1D[i  ] = ( T )rtValue;
	m_ptData1D[i+1] = ( T )rtValue;
	m_ptData1D[i+2] = ( T )rtValue;
	m_ptData1D[i+3] = ( T )rtValue;
}
...
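
For readers who want to try Test-case 1 outside the original framework, a minimal standalone sketch follows. It assumes that T is the __m128 type and that the element count is a multiple of 4; fill_plain and its parameters are hypothetical names, not part of the original sources.

```c
#include "assert.h"
#include "stddef.h"
#include "xmmintrin.h"  /* SSE: __m128 */

/* Fills count 128-bit elements with the same value using ordinary
   assignments, unrolled 4-in-1 as in Test-case 1.
   Assumption: count is a multiple of 4. */
void fill_plain(__m128 *data, size_t count, __m128 value)
{
    size_t i;
    assert(count % 4 == 0);
    for (i = 0; i != count; i += 4)
    {
        data[i    ] = value;
        data[i + 1] = value;
        data[i + 2] = value;
        data[i + 3] = value;
    }
}
```

A call could look like fill_plain( buffer, n, _mm_set1_ps( 1.0f ) ), where buffer is an array of __m128 and n is its length.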

&lt;STRONG&gt;Note 1:&lt;/STRONG&gt; &lt;STRONG&gt;rtValue&lt;/STRONG&gt; is declared as a variable of &lt;STRONG&gt;__m128&lt;/STRONG&gt; type, that is, it has 4 members of type &lt;STRONG&gt;float&lt;/STRONG&gt; ( Single Precision Floating Point ).</description>
      <pubDate>Sun, 07 Feb 2016 06:45:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062968#M54900</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T06:45:01Z</dc:date>
    </item>
    <item>
      <title>[ Microsoft C++ compiler -</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062969#M54901</link>
      <description>&lt;STRONG&gt;[ Microsoft C++ compiler - without 128-bit Streaming Stores ]&lt;/STRONG&gt;

...
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
&amp;gt; Test0001 Start &amp;lt;
*****************************************************************************
Configuration - WIN32_MSC ( 32-bit ) - Release
CTestSet::InitTestEnv - Passed
* CDataSet Start *
&amp;gt; TDataSet Methods &amp;lt;
DataSet::&amp;lt; RTm128 &amp;gt; - Passed
&amp;gt; CDataSet Methods &amp;lt;
&amp;gt; CDataSet Algorithms &amp;lt;
* CDataSet End *
Test Completed in &lt;STRONG&gt;23.625&lt;/STRONG&gt; secs
&amp;gt; Test0001 End &amp;lt;
Tests: Completed
...</description>
      <pubDate>Sun, 07 Feb 2016 06:48:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062969#M54901</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T06:48:57Z</dc:date>
    </item>
    <item>
      <title>[ Intel C++ compiler -</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062970#M54902</link>
      <description>&lt;STRONG&gt;[ Intel C++ compiler - without 128-bit Streaming Stores ]&lt;/STRONG&gt;

...
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
&amp;gt; Test0001 Start &amp;lt;
*****************************************************************************
Configuration - WIN32_ICC ( 32-bit ) - Release
CTestSet::InitTestEnv - Passed
* CDataSet Start *
&amp;gt; TDataSet Methods &amp;lt;
DataSet::&amp;lt; RTm128 &amp;gt; - Passed
&amp;gt; CDataSet Methods &amp;lt;
&amp;gt; CDataSet Algorithms &amp;lt;
* CDataSet End *
Test Completed in &lt;STRONG&gt;26.216&lt;/STRONG&gt; secs
&amp;gt; Test0001 End &amp;lt;
Tests: Completed
...</description>
      <pubDate>Sun, 07 Feb 2016 06:50:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062970#M54902</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T06:50:53Z</dc:date>
    </item>
    <item>
      <title>[ MinGW C++ compiler -</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062971#M54903</link>
      <description>&lt;STRONG&gt;[ MinGW C++ compiler - without 128-bit Streaming Stores ]&lt;/STRONG&gt;

...
Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
Tests: Start
&amp;gt; Test0001 Start &amp;lt;
*****************************************************************************
Configuration - WIN32_MGW ( 32-bit ) - Release
CTestSet::InitTestEnv - Passed
* CDataSet Start *
&amp;gt; TDataSet Methods &amp;lt;
DataSet::&amp;lt; RTm128 &amp;gt; - Passed
&amp;gt; CDataSet Methods &amp;lt;
&amp;gt; CDataSet Algorithms &amp;lt;
* CDataSet End *
Test Completed in &lt;STRONG&gt;21.735&lt;/STRONG&gt; secs
&amp;gt; Test0001 End &amp;lt;
Tests: Completed
...</description>
      <pubDate>Sun, 07 Feb 2016 06:54:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062971#M54903</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T06:54:00Z</dc:date>
    </item>
    <item>
      <title>[ Microsoft C++ compiler</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062972#M54904</link>
      <description>&lt;STRONG&gt;[ Microsoft C++ compiler assembler codes - without 128-bit Streaming Stores ]&lt;/STRONG&gt;

...
00243690  mov         edx, dword ptr [esi+80h]
00243696  movaps      xmmword ptr [edx+eax], xmm0
0024369A  mov         edx, dword ptr [esi+80h]
002436A0  movaps      xmmword ptr [eax+edx+10h], xmm0
002436A5  mov         edx, dword ptr [esi+80h]
002436AB  movaps      xmmword ptr [eax+edx+20h], xmm0
002436B0  mov         edx, dword ptr [esi+80h]
002436B6  movaps      xmmword ptr [edx+eax+30h], xmm0
002436BB  add         ecx, 4
002436BE  add         eax, 40h
002436C1  cmp         ecx, dword ptr [esi+0D0h]
002436C7  jl          CDataSet::RunTest+310h (243690h)
...</description>
      <pubDate>Sun, 07 Feb 2016 07:02:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062972#M54904</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T07:02:09Z</dc:date>
    </item>
    <item>
      <title>[ Intel C++ compiler</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062973#M54905</link>
      <description>&lt;STRONG&gt;[ Intel C++ compiler assembler codes - without 128-bit Streaming Stores ]&lt;/STRONG&gt;

...
0040143D  movaps      xmm0, xmmword ptr [ebp-358h]
00401444  inc         edx
00401445  movaps      xmmword ptr [ecx+esi], xmm0
00401449  movaps      xmmword ptr [ecx+esi+10h], xmm0
0040144E  movaps      xmmword ptr [ecx+esi+20h], xmm0
00401453  movaps      xmmword ptr [ecx+esi+30h], xmm0
00401458  add         ecx, 40h
0040145B  cmp         edx, eax
0040145D  jb          CDataSet::RunTest+28Dh (40143Dh)
...</description>
      <pubDate>Sun, 07 Feb 2016 07:04:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062973#M54905</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T07:04:12Z</dc:date>
    </item>
    <item>
      <title>[ MinGW C++ compiler</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062974#M54906</link>
      <description>&lt;STRONG&gt;[ MinGW C++ compiler assembler codes - without 128-bit Streaming Stores ]&lt;/STRONG&gt;

...
00403520  movaps      xmmword ptr [eax], xmm5
00403523  add         eax, 40h
00403526  movaps      xmmword ptr [eax-30h], xmm5
0040352A  movaps      xmmword ptr [eax-20h], xmm5
0040352E  movaps      xmmword ptr [eax-10h], xmm5
00403532  cmp         eax, ecx
00403534  jne         _ZN8CDataSet7RunTestEv+2D0h (403520h)
...</description>
      <pubDate>Sun, 07 Feb 2016 07:06:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062974#M54906</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T07:06:25Z</dc:date>
    </item>
    <item>
      <title>[ Test-case 2 - with 128-bit</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062975#M54907</link>
      <description>&lt;STRONG&gt;[ Test-case 2 - with 128-bit Streaming Stores ]&lt;/STRONG&gt;
[ C Source codes of Test-Case - with 128-bit Streaming Stores ]

...
RTssize_t i;
for( i = 0; i &amp;lt; m_iSize4; i += 4 )
{
	CrtStreamPs128( ( RTfloat * )&amp;amp;m_ptData1D[i  ], rtValue );
	CrtStreamPs128( ( RTfloat * )&amp;amp;m_ptData1D[i+1], rtValue );
	CrtStreamPs128( ( RTfloat * )&amp;amp;m_ptData1D[i+2], rtValue );
	CrtStreamPs128( ( RTfloat * )&amp;amp;m_ptData1D[i+3], rtValue );
}
...

&lt;STRONG&gt;Note 1:&lt;/STRONG&gt; &lt;STRONG&gt;rtValue&lt;/STRONG&gt; is declared as a variable of &lt;STRONG&gt;__m128&lt;/STRONG&gt; type, that is, it has 4 members of type &lt;STRONG&gt;float&lt;/STRONG&gt; ( Single Precision Floating Point ).
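
The body of CrtStreamPs128 is not shown in this thread, so the following is only a sketch of what such a wrapper and the surrounding loop might look like. The names stream_ps128 and fill_stream are hypothetical stand-ins; the sketch assumes an SSE-capable target, a 16-byte-aligned buffer, and an element count that is a multiple of 4.

```c
#include "assert.h"
#include "stddef.h"
#include "xmmintrin.h"  /* SSE: __m128, _mm_stream_ps, _mm_sfence */

/* Hypothetical stand-in for CrtStreamPs128: a thin wrapper around
   the _mm_stream_ps intrinsic. dst must be 16-byte aligned. */
static void stream_ps128(float *dst, __m128 value)
{
    _mm_stream_ps(dst, value);
}

/* Fills count 128-bit elements with non-temporal stores,
   unrolled 4-in-1 as in Test-case 2.
   Assumption: count is a multiple of 4. */
void fill_stream(__m128 *data, size_t count, __m128 value)
{
    size_t i;
    assert(count % 4 == 0);
    for (i = 0; i != count; i += 4)
    {
        stream_ps128((float *)(data + i    ), value);
        stream_ps128((float *)(data + i + 1), value);
        stream_ps128((float *)(data + i + 2), value);
        stream_ps128((float *)(data + i + 3), value);
    }
    /* Order the weakly ordered streaming stores before any later stores. */
    _mm_sfence();
}
```

The _mm_sfence call is an addition of this sketch and does not appear in the original loop.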

&lt;STRONG&gt;Note 2:&lt;/STRONG&gt; The &lt;STRONG&gt;CrtStreamPs128&lt;/STRONG&gt; function is a &lt;STRONG&gt;portable wrapper&lt;/STRONG&gt; around the Intel &lt;STRONG&gt;_mm_stream_ps&lt;/STRONG&gt; intrinsic function.</description>
      <pubDate>Sun, 07 Feb 2016 07:11:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062975#M54907</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T07:11:46Z</dc:date>
    </item>
    <item>
      <title>[ Microsoft C++ compiler -</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062976#M54908</link>
      <description>&lt;STRONG&gt;[ Microsoft C++ compiler - with 128-bit Streaming Stores ]&lt;/STRONG&gt;

...
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
&amp;gt; Test0001 Start &amp;lt;
*****************************************************************************
Configuration - WIN32_MSC ( 32-bit ) - Release
CTestSet::InitTestEnv - Passed
* CDataSet Start *
&amp;gt; TDataSet Methods &amp;lt;
DataSet::&amp;lt; RTm128 &amp;gt; - Passed
&amp;gt; CDataSet Methods &amp;lt;
&amp;gt; CDataSet Algorithms &amp;lt;
* CDataSet End *
Test Completed in &lt;STRONG&gt;23.203&lt;/STRONG&gt; secs
&amp;gt; Test0001 End &amp;lt;
Tests: Completed
...</description>
      <pubDate>Sun, 07 Feb 2016 07:42:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062976#M54908</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T07:42:06Z</dc:date>
    </item>
    <item>
      <title>[ Intel C++ compiler - with</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062977#M54909</link>
      <description>&lt;STRONG&gt;[ Intel C++ compiler - with 128-bit Streaming Stores ]&lt;/STRONG&gt;

...
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
&amp;gt; Test0001 Start &amp;lt;
*****************************************************************************
Configuration - WIN32_ICC ( 32-bit ) - Release
CTestSet::InitTestEnv - Passed
* CDataSet Start *
&amp;gt; TDataSet Methods &amp;lt;
DataSet::&amp;lt; RTm128 &amp;gt; - Passed
&amp;gt; CDataSet Methods &amp;lt;
&amp;gt; CDataSet Algorithms &amp;lt;
* CDataSet End *
Test Completed in &lt;STRONG&gt;25.766&lt;/STRONG&gt; secs
&amp;gt; Test0001 End &amp;lt;
Tests: Completed
...</description>
      <pubDate>Sun, 07 Feb 2016 07:42:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062977#M54909</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T07:42:58Z</dc:date>
    </item>
    <item>
      <title>[ MinGW C++ compiler - with</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062978#M54910</link>
      <description>&lt;STRONG&gt;[ MinGW C++ compiler - with 128-bit Streaming Stores ]&lt;/STRONG&gt;

...
Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
Tests: Start
&amp;gt; Test0001 Start &amp;lt;
*****************************************************************************
Configuration - WIN32_MGW ( 32-bit ) - Release
CTestSet::InitTestEnv - Passed
* CDataSet Start *
&amp;gt; TDataSet Methods &amp;lt;
DataSet::&amp;lt; RTm128 &amp;gt; - Passed
&amp;gt; CDataSet Methods &amp;lt;
&amp;gt; CDataSet Algorithms &amp;lt;
* CDataSet End *
Test Completed in &lt;STRONG&gt;21.516&lt;/STRONG&gt; secs
&amp;gt; Test0001 End &amp;lt;
Tests: Completed
...</description>
      <pubDate>Sun, 07 Feb 2016 07:43:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062978#M54910</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T07:43:00Z</dc:date>
    </item>
    <item>
      <title>[ Microsoft C++ compiler</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062979#M54911</link>
      <description>&lt;STRONG&gt;[ Microsoft C++ compiler assembler codes - with 128-bit Streaming Stores ]&lt;/STRONG&gt;

...
00243690  mov         ecx, dword ptr [esi+80h]
00243696  movntps     xmmword ptr [ecx+eax], xmm0
0024369A  add         ecx, eax
0024369C  mov         ecx, dword ptr [esi+80h]
002436A2  movntps     xmmword ptr [eax+ecx+10h], xmm0
002436A7  mov         ebx, dword ptr [esi+80h]
002436AD  lea         ecx, [eax+30h]
002436B0  movntps     xmmword ptr [ecx+ebx-10h], xmm0
002436B5  mov         ebx, dword ptr [esi+80h]
002436BB  add         ebx, ecx
002436BD  add         edx, 4
002436C0  movntps     xmmword ptr [ebx], xmm0
002436C3  add         eax, 40h
002436C6  cmp         edx, dword ptr [esi+0D0h]
002436CC  jl          CDataSet::RunTest+310h (243690h)
...</description>
      <pubDate>Sun, 07 Feb 2016 07:44:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062979#M54911</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T07:44:00Z</dc:date>
    </item>
    <item>
      <title>[ Intel C++ compiler</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062980#M54912</link>
      <description>&lt;STRONG&gt;[ Intel C++ compiler assembler codes - with 128-bit Streaming Stores ]&lt;/STRONG&gt;

...
00401BBD  mov         ecx, edx
00401BBF  add         edx, 4
00401BC2  shl         ecx, 4
00401BC5  movaps      xmm0, xmmword ptr [ebp-358h]
00401BCC  cmp         edx, eax
00401BCE  movntps     xmmword ptr [ecx+esi], xmm0
00401BD2  movntps     xmmword ptr [ecx+esi+10h], xmm0
00401BD7  movntps     xmmword ptr [ecx+esi+20h], xmm0
00401BDC  movntps     xmmword ptr [ecx+esi+30h], xmm0
00401BE1  jl          CDataSet::RunTest+26Dh (401BBDh)
...</description>
      <pubDate>Sun, 07 Feb 2016 08:23:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062980#M54912</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T08:23:19Z</dc:date>
    </item>
    <item>
      <title>[ MinGW C++ compiler</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062981#M54913</link>
      <description>&lt;STRONG&gt;[ MinGW C++ compiler assembler codes - with 128-bit Streaming Stores ]&lt;/STRONG&gt;

...
00403520  movntps     xmmword ptr [eax], xmm5
00403523  add         eax, 40h
00403526  movntps     xmmword ptr [eax-30h], xmm5
0040352A  movntps     xmmword ptr [eax-20h], xmm5
0040352E  movntps     xmmword ptr [eax-10h], xmm5
00403532  cmp         eax, ecx
00403534  jne         _ZN8CDataSet7RunTestEv+2D0h (403520h)
...

Note: By the way, all three C++ compilers use an interleaving technique ( some call it alternating operations ) when generating binary codes to get the best from CPU pipelining.</description>
      <pubDate>Sun, 07 Feb 2016 08:31:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062981#M54913</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T08:31:56Z</dc:date>
    </item>
    <item>
      <title>[ Summary of Performance</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062982#M54914</link>
      <description>&lt;STRONG&gt;[ Summary of Performance evaluation 128-bit Streaming store codes - 1 ]&lt;/STRONG&gt;

&lt;STRONG&gt;1.&lt;/STRONG&gt; Codes generated by the MinGW C++ compiler with 128-bit Streaming stores were &lt;STRONG&gt;7.3%&lt;/STRONG&gt; faster than codes generated by the Microsoft C++ compiler.

&lt;STRONG&gt;2.&lt;/STRONG&gt; Codes generated by the MinGW C++ compiler with 128-bit Streaming stores were &lt;STRONG&gt;16.5%&lt;/STRONG&gt; faster than codes generated by the Intel C++ compiler.
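
The two percentages above are consistent with measuring the reduction in elapsed time relative to the slower compiler, using the timings with 128-bit Streaming stores listed in item 4. The helper below is a hypothetical illustration of that arithmetic, not code from the test suite.

```c
#include "assert.h"
#include "stdio.h"

/* Percent reduction in elapsed time of the faster run relative to
   the slower run (hypothetical helper for illustration). */
double percent_faster(double slow_secs, double fast_secs)
{
    assert(slow_secs != 0.0);
    return (slow_secs - fast_secs) / slow_secs * 100.0;
}

/* Prints the MinGW comparison against the other two compilers,
   using the timings with 128-bit Streaming stores:
   MinGW vs. Microsoft: 7.3%
   MinGW vs. Intel:     16.5% */
void print_summary(void)
{
    printf("MinGW vs. Microsoft: %.1f%%\n", percent_faster(23.203, 21.516));
    printf("MinGW vs. Intel:     %.1f%%\n", percent_faster(25.766, 21.516));
}
```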

&lt;STRONG&gt;3.&lt;/STRONG&gt; Without 128-bit Streaming Stores

...
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Test Completed in &lt;STRONG&gt;23.625&lt;/STRONG&gt; secs
...

...
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Test Completed in &lt;STRONG&gt;26.216&lt;/STRONG&gt; secs
...

...
Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
Test Completed in &lt;STRONG&gt;21.735&lt;/STRONG&gt; secs
...

&lt;STRONG&gt;4.&lt;/STRONG&gt; With 128-bit Streaming Stores

...
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Test Completed in &lt;STRONG&gt;23.203&lt;/STRONG&gt; secs
...

...
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Test Completed in &lt;STRONG&gt;25.766&lt;/STRONG&gt; secs
...

...
Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
Test Completed in &lt;STRONG&gt;21.516&lt;/STRONG&gt; secs
...</description>
      <pubDate>Sun, 07 Feb 2016 08:38:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062982#M54914</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T08:38:14Z</dc:date>
    </item>
    <item>
      <title>[ Summary of Performance</title>
      <link>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062983#M54915</link>
      <description>&lt;STRONG&gt;[ Summary of Performance evaluation 128-bit Streaming store codes - 2 ]&lt;/STRONG&gt;

Or in another form:

&lt;STRONG&gt;Microsoft C++ compiler:&lt;/STRONG&gt; &lt;STRONG&gt;23.625&lt;/STRONG&gt; secs ( without Streaming stores ) vs. &lt;STRONG&gt;23.203&lt;/STRONG&gt; secs ( with Streaming stores )
Summary: With Streaming stores, initialization of the data set is &lt;STRONG&gt;~1.8%&lt;/STRONG&gt; faster.

&lt;STRONG&gt;Intel C++ compiler:&lt;/STRONG&gt; &lt;STRONG&gt;26.216&lt;/STRONG&gt; secs ( without Streaming stores ) vs. &lt;STRONG&gt;25.766&lt;/STRONG&gt; secs ( with Streaming stores )
Summary: With Streaming stores, initialization of the data set is &lt;STRONG&gt;~1.7%&lt;/STRONG&gt; faster.

&lt;STRONG&gt;MinGW C++ compiler:&lt;/STRONG&gt; &lt;STRONG&gt;21.735&lt;/STRONG&gt; secs ( without Streaming stores ) vs. &lt;STRONG&gt;21.516&lt;/STRONG&gt; secs ( with Streaming stores )
Summary: With Streaming stores, initialization of the data set is &lt;STRONG&gt;~1.0%&lt;/STRONG&gt; faster.</description>
      <pubDate>Sun, 07 Feb 2016 08:44:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Analysis-of-128-bit-Streaming-store-codes-vs-Non-Streaming-store/m-p/1062983#M54915</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2016-02-07T08:44:20Z</dc:date>
    </item>
  </channel>
</rss>

