
Performance Evaluation of Classic Matrix Multiplication algorithms

SergeyKostrov
*** Performance Evaluation of Classic Matrix Multiplication algorithms ***

[ Abstract ]
This is one of the most detailed analyses of the performance of the Classic Matrix Multiplication algorithm on different software and hardware platforms.
zalia64

You are right.

I have missed the one-letter difference in the title.

For simple readers like me, fundamental one-letter differences must be spelled out explicitly.

 

SergeyKostrov
[ Microsoft C++ compiler ( VS98 PE ) 32-bit ]

[ Compiler ]
/nologo /Zp16 /MD /W4 /Ox /Ot /Oa /Ow /Og /Oi /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_MBCS" /D "_WIN32_MSC" /D WINVER=0x0400 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /Fp"Release/ScaLibTestApp.pch" /Yu"Stdphf.h" /Fo"Release/" /Fd"Release/" /FD /c

[ Linker ]
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /stack:0x5000000 /subsystem:console /pdb:none /machine:I386 /out:"Release/ScaLibTestApp.exe"
SergeyKostrov
[ Microsoft C++ compiler ( VS2005 PE ) 32-bit ]

[ Compiler ]
/O2 /Ob1 /Oi /Ot /Oy /GL /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_MSC" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /Gm /MT /GS- /fp:fast /GR- /openmp /Yu"Stdphf.h" /Fp"Release\MscTestApp.pch" /Fo"Release/" /Fd"Release/" /W4 /nologo /c /Wp64 /Zi /Gd /TP /wd4005 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /U "_WIN32_WCC" /errorReport:prompt /arch:SSE2

[ Linker ]
/OUT:"Release/MscTestApp.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"Release\MscTestApp.exe.intermediate.manifest" /NODEFAULTLIB:"../../Bin/Release/ScaLib.lib" /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /LTCG /MACHINE:X86 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "..\..\bin\release\scalib.lib"
SergeyKostrov
[ Microsoft C++ compiler ( VS2008 PE ) 32-bit ]

[ Compiler ]
/O2 /Ob1 /Oi /Ot /Oy /GL /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_MSC" /D "_UNICODE" /D "UNICODE" /GF /Gm /MT /GS- /fp:fast /GR- /openmp /Yu"Stdphf.h" /Fp"Release\ScaLibTestApp.pch" /Fo"Release/" /Fd"Release/" /W4 /nologo /c /Zi /TP /wd4005 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /U "_WIN32_WCC" /errorReport:prompt /arch:SSE2

[ Linker ]
/OUT:"Release/ScaLibTestApp.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"Release\ScaLibTestApp.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /LTCG /DYNAMICBASE:NO /MACHINE:X86 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "..\..\bin\release\scalib.lib"
SergeyKostrov
[ Microsoft C++ compiler ( VS2008 PE ) 64-bit ]

[ Compiler ]
/O2 /Ob1 /Oi /Ot /Oy /GL /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_MSC" /D "_UNICODE" /D "UNICODE" /GF /Gm /MT /GS- /fp:fast /GR- /openmp /Yu"Stdphf.h" /Fp"x64\Release\ScaLibTestApp64.pch" /Fo"x64/Release/" /Fd"x64/Release/" /W4 /nologo /c /Zi /TP /wd4005 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /U "_WIN32_WCC" /errorReport:prompt

[ Linker ]
/OUT:"x64\Release/ScaLibTestApp64.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"x64\Release\ScaLibTestApp64.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /STACK:1073741824 /LTCG /DYNAMICBASE:NO /MACHINE:X64 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "..\..\bin\release\scalib64.lib"
SergeyKostrov
[ Microsoft C++ compiler ( VS2008 EE ) 32-bit ]

[ Compiler ]
/O2 /Ob1 /Oi /Ot /Oy /GL /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_MSC" /D "_UNICODE" /D "UNICODE" /GF /Gm /MT /GS- /fp:fast /GR- /openmp /Yu"Stdphf.h" /Fp"Release\ScaLibTestApp.pch" /Fo"Release/" /Fd"Release/" /W4 /nologo /c /Zi /Gd /TP /wd4005 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /U "_WIN32_WCC" /errorReport:prompt /arch:SSE2

[ Linker ]
/OUT:"Release/ScaLibTestApp.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"Release\ScaLibTestApp.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /LTCG /DYNAMICBASE:NO /MACHINE:X86 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "..\..\bin\release\scalib.lib"
SergeyKostrov
[ Intel C++ compiler v7.1.0 ( u029 ) 32-bit ]

[ Compiler ]
/nologo /Zp16 /MD /W4 /GX /O2 /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_UNICODE" /D "_WIN32_ICC" /D WINVER=0x0400 /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /Fp"Release/IccTestApp.pch" /Yu"Stdphf.h" /Fo"Release/" /Fd"Release/" /FD /Qopenmp /Qwd 111,114,161,171,174,175,177,181,193,279,280,304,373,424,444,488,593,673,810,869,981,1011,1418 /c

[ Linker ]
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib libompstub.lib /nologo /stack:0x5000000 /subsystem:console /pdb:none /machine:I386 /nodefaultlib:"libc.lib" /out:"Release/IccTestApp.exe"
SergeyKostrov
[ Intel C++ compiler v8.1.0 ( u038 ) 32-bit ]

[ Compiler ]
/nologo /Zp16 /MD /W4 /GX /O2 /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_UNICODE" /D "_WIN32_ICC" /D WINVER=0x0400 /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /Fp"Release/IccTestApp.pch" /Yu"Stdphf.h" /Fo"Release/" /Fd"Release/" /FD /Wcheck /Qopenmp /Qwd 111,114,161,171,174,175,177,181,193,279,280,304,373,424,444,488,593,673,810,869,981,1011,1418,1572 /c

[ Linker ]
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib libompstub.lib /nologo /stack:0x5000000 /subsystem:console /pdb:none /machine:I386 /nodefaultlib:"libc.lib" /out:"Release/IccTestApp.exe"
SergeyKostrov
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit ]

[ Compiler ]
/c /O3 /Ob1 /Oi /Ot /Oy /Qipo /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "INTEL_SUITE_VERSION=PE121_300" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"Release\IccTestApp.pch" /Fo"Release/" /W5 /nologo /Wp64 /Zi /Gd /TP /Qdiag-disable:2012 /Qdiag-disable:2013 /Qdiag-disable:2014 /Qdiag-disable:2015 /Qdiag-disable:2017 /Qdiag-disable:2021 /Qdiag-disable:2022 /Qdiag-disable:2304 /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qparallel /Qstd=c++0x /Qrestrict /Qdiag-disable:111,673,10121 /Wport /Qeffc++ /QxSSE2 /Qansi-alias /Qvec-report=0 /Qfma /Qunroll:8 /Qunroll-aggressive /Qopt-streaming-stores:always /Qopt-block-factor:128 /Qopt-mem-layout-trans:2 /Wport /Qeffc++ /QxSSE2 /Qansi-alias /Qvec-report=0 /Qfma /Qunroll:8 /Qunroll-aggressive /Qopt-streaming-stores:always /Qopt-block-factor:128 /Qopt-mem-layout-trans:2

[ Linker ]
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"Release/IccTestApp.exe" /INCREMENTAL:NO /nologo /MANIFEST /MANIFESTFILE:"Release\IccTestApp.exe.intermediate.manifest" /NODEFAULTLIB:"../../Bin/Release/ScaLib.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /MACHINE:X86 /qdiag-disable:111,673,10121
SergeyKostrov
[ Intel C++ compiler v13.1.0 ( u149 ) 32-bit ]

[ Compiler ]
/c /O3 /Ob1 /Oi /Ot /Oy /Qipo /I "..\..\Include" /I "C:\WorkLib\ICC2013\Composer XE 2013\ipp\include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "_IPP_PARALLEL_DYNAMIC" /D "IPP_USE_CUSTOM" /D "INTEL_SUITE_VERSION=PE130_149" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /arch:AVX /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"Release\IccTestApp.pch" /Fo"Release/" /Fd"Release/" /W5 /nologo /Wp64 /Zi /TP /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qstd=c++0x /Qrestrict /Qansi-alias /Qdiag-disable:111,673,2012,2015,2960,10121 /Wport /Qeffc++ /QxAVX /Qansi-alias /Qvec-report=0 /Qfma /Qunroll /Qunroll-aggressive /Qopt-streaming-stores:auto /Qopt-block-factor:128 /Qopt-mem-layout-trans:2 /Qipp /Qipp-link:dynamic /Qmkl

[ Linker ]
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"Release/IccTestApp.exe" /INCREMENTAL:NO /nologo /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\ipp\lib\ia32" /MANIFEST /MANIFESTFILE:"Release\IccTestApp.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /NODEFAULTLIB:"../../Bin/Release/ScaLib.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /DYNAMICBASE /NXCOMPAT /IMPLIB:"C:\WorkEnv\AppsWorkDev\AppsTst\IccTestApp\Release\IccTestApp.lib" /MACHINE:X86 /qdiag-disable:111,673,2012,2015,2960,10121 /qdiag-sc-dir:"My Inspector XE Results - IccTestApp"
SergeyKostrov
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit ]

[ Compiler ]
/c /O3 /Ob1 /Oi /Ot /Qipo /I "..\..\Include" /I "C:\WorkLib\ICC2013\Composer XE 2013\ipp\include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "INTEL_SUITE_VERSION=PE130_149" /D "_IPP_PARALLEL_DYNAMIC" /D "IPP_USE_CUSTOM" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /arch:AVX /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"x64\Release\IccTestApp64.pch" /Fo"x64/Release/" /Fd"x64/Release/" /W5 /nologo /Wp64 /Zi /TP /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qstd=c++0x /Qrestrict /Qansi-alias /Qdiag-disable:111,673,2012,2015,2960,10121 /Wport /Qeffc++ /QxAVX /Qansi-alias /Qvec-report=0 /Qfma /Qunroll /Qunroll-aggressive /Qopt-streaming-stores:always /Qipp /Qipp-link:dynamic /Qmkl

[ Linker ]
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"x64\Release/IccTestApp64.exe" /INCREMENTAL:NO /nologo /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\ipp\lib\intel64" /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\compiler\lib\intel64" /MANIFEST /MANIFESTFILE:"x64\Release\IccTestApp64.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /NODEFAULTLIB:"../../Bin/Release/ScaLib64.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:1000000000 /LARGEADDRESSAWARE /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /qdiag-disable:111,673,2012,2015,2960,10121 /qdiag-sc-dir:"My Inspector XE Results - IccTestApp"
SergeyKostrov
[ Watcom C++ compiler v1.9.0 32-bit ]

WccTestApp.cpp -5r -fp5 -fpi87 -wx -d0 -s -oabil+mprt -xd -D_WIN32_WCC -DNDEBUG -feWccTestApp.exe -k268435456 -i"C:\WorkLib\ICC2011\Compos~1\Mkl\Include" -"libpath C:\WorkLib\ICC2011\Compos~1\Mkl\Lib\Ia32Wcc" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=601 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
SergeyKostrov
[ Watcom C++ compiler v2.0.0 32-bit ]

WccTestApp.cpp -5r -fp5 -fpi87 -wx -d0 -s -oabil+mprt -xd -D_WIN32_WCC -DNDEBUG -feWccTestApp.exe -k268435456 -i"C:\WorkLib\ICC2011\Compos~1\Mkl\Include" -"libpath C:\WorkLib\ICC2011\Compos~1\Mkl\Lib\Ia32Wcc" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=601 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
SergeyKostrov
[ Watcom C++ compiler v2.0.0 64-bit ]

WccTestApp.cpp -6r -fp6 -fpi87 -wx -d0 -s -oabil+mprt -xd -D_WIN32_WCC -DNDEBUG -feWccTestApp.exe -k536870912 -i"C:\WorkLib\ICC2013\Compos~1\Mkl\Include" -"libpath C:\WorkLib\ICC2013\Compos~1\Mkl\Lib\Ia32Wcc" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=601 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
zalia64

Dear Sergey,

I think you did a big, comprehensive job of comparing different algorithms, compilers and systems.

I didn't mean to be offensive or sarcastic. I noted that the 32-bit system was an obsolete P4, while the 64-bit system was a state-of-the-art AVX CPU.

The 100-fold speed increase did not surprise me: a modern CPU with AVX against an obsolete P4?

The surprise was that, according to your tests, the P4 was 4 times quicker than a modern AVX CPU (tests 1.1, 1.2 and others).

This fact surprises me so much that I fear there was a typo.

IF TRUE, I fail to understand it. I certainly would like to read a discussion "How and Why the obsolete P4 beats the Haswell by 400% !!"

 

My other point was:

I receive an automatic notification every time you add a message. Many of us, when confronted with a massive packet of 50 consecutive messages, delete them as a whole. IMHO, it would be better to condense the results into a simple one-page table.

Condensing is never a simple task, but it must be done so that others can appreciate your work.

 

SergeyKostrov
>>The surprise was - that according to your tests, the P4 was 4 times quicker, than a modern AVX CPU (tests 1.1, 1.2 and others).
>>
>>This fact surprises me so much, that I fear there was some typo error.
>>
>>IF TRUE, I fail to understand it. I certainly would like to read a discussion " How and Why the obsolete P4 beats the Haswell by 400% !!"

I'm very confused because it is Not clear to me what you are looking at. Please review the first 11 posts in that thread because they describe which C++ compilers were tested, on which computer systems, etc.

Next, I do Not have a system with a Haswell CPU, and all my tests on Pentium II, Pentium 4, Atom N270 and Core i7 ( 3rd Gen / Ivy Bridge ) clearly show:

- In most cases new versions of C++ compilers are faster because they generate code that uses new Intel Instruction Sets
- In All cases new Generation CPUs are faster than previous Generation CPUs

Once again, I'm very confused about what you're looking at.
SergeyKostrov
A simple data mining procedure ( could be done manually! ) yields a reduced data set. The list of different versions of the algorithm is as follows:

MxMultA1 - Classic 2D
MxMultA2 - Classic 2D LBOT
MxMultA3 - Classic 2D Fused
MxMultA4 - Classic 2D Fused LBOT
MxMultB1 - Classic 2D Transposed
MxMultB2 - Classic 2D Transposed LBOT
MxMultB3 - Classic 2D Fused Transposed
MxMultB4 - Classic 2D Fused Transposed LBOT
MxMultC1 - Classic 2D SSE2 Transposed v1
MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT
MatrixMulEx1 - Classic 2D SSE2 Transposed v2
MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT
MxMultD1 - Classic 1D
MxMultD2 - Classic 1D LBOT
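To make the "Transposed" entries more concrete, here is a minimal sketch of what a transposed classic multiplication usually looks like. It is an illustration only: the tested implementations are not shown in this thread, and the names `Matrix`, `Transpose` and `MultiplyTransposed`, the `double` element type and the flat row-major storage are all assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: n x n matrix stored row-major in one flat vector.
using Matrix = std::vector<double>;

// Returns the transpose of a row-major n x n matrix.
Matrix Transpose(const Matrix& m, std::size_t n) {
    Matrix t(n * n);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c)
            t[c * n + r] = m[r * n + c];
    return t;
}

// Computes C = A * B using a transposed copy of B, so that the
// inner loop reads both operands with stride 1.
Matrix MultiplyTransposed(const Matrix& a, const Matrix& b, std::size_t n) {
    const Matrix bt = Transpose(b, n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * bt[j * n + k];  // row i of A, row j of B^T
            c[i * n + j] = sum;
        }
    }
    return c;
}
```

The extra transpose costs O(n^2) work, which is typically small next to the O(n^3) multiplication for larger matrices.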
SergeyKostrov
Two sub-versions of each version of the algorithm ( see above ) are evaluated:

- Loop Processing Schema IJK - Pseudo-code:
...
for( i = 0; ... )
    for( j = 0; ... )
        for( k = 0; ... )
...

- Loop Processing Schema IKJ ( aka Loop Interchange technique ) - Pseudo-code:
...
for( i = 0; ... )
    for( k = 0; ... )
        for( j = 0; ... )
...
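The two schemas can be sketched in full as follows. This is a minimal illustration, not the code that was tested: the flat row-major `Matrix` type, the `double` element type and the function names `MultiplyIJK` and `MultiplyIKJ` are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: n x n matrix stored row-major in one flat vector.
using Matrix = std::vector<double>;

// LPS: IJK. The inner k loop walks B down a column (stride n).
void MultiplyIJK(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

// LPS: IKJ (loop interchange). The inner j loop walks B and C
// along a row (stride 1).
void MultiplyIKJ(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    // C is accumulated into, so it has to start at zero.
    for (std::size_t idx = 0; idx < n * n; ++idx)
        c[idx] = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Both functions compute the same product; only the memory access pattern of the innermost loop differs, which is what the IJK and IKJ test cases compare.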
SergeyKostrov
In case of MinGW C++ compilers these algorithms are Not implemented:

...
MxMultC1 - Classic 2D SSE2 Transposed v1
MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT
MatrixMulEx1 - Classic 2D SSE2 Transposed v2
MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT
...

It applies to All versions.
SergeyKostrov
Another thing: abbreviations and descriptions are given in a post at the beginning of the thread. It is very important to understand how the title of a test case needs to be read. For example, let's say a title is:

[ MinGW C++ compiler v5.1.0 - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]

This is a test for:

- MinGW C++ compiler v5.1.0
- Release build
- 32-bit binary codes
- Loop Processing Schema ( LPS ) is IJK
- Executed on a computer with Intel Pentium 4 ( P4 ) CPU
- The computer has 32-bit Windows XP operating system

Once again, it is very important to understand on what system the test was executed.
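If the titles need to be processed by a program rather than read by eye, the fields can be separated on the " - " delimiter. The sketch below is only an illustration; `SplitTitle` is a hypothetical helper, and it assumes every title follows the bracketed layout shown above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits "[ a - b - c - d ]" into {"a", "b", "c", "d"}.
// A hyphen without surrounding spaces (as in "32-bit") is not a separator.
std::vector<std::string> SplitTitle(std::string title) {
    // Strip the surrounding "[ " and " ]" if both are present.
    if (title.size() >= 4 && title.compare(0, 2, "[ ") == 0 &&
        title.compare(title.size() - 2, 2, " ]") == 0) {
        title = title.substr(2, title.size() - 4);
    }
    const std::string sep = " - ";
    std::vector<std::string> fields;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = title.find(sep, start);
        if (pos == std::string::npos) {
            fields.push_back(title.substr(start));  // last field
            break;
        }
        fields.push_back(title.substr(start, pos - start));
        start = pos + sep.size();
    }
    return fields;
}
```

For the example title this gives four fields: the compiler, the build type, the bitness together with the LPS, and the CPU together with the operating system.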
SergeyKostrov
I see that you wanted to analyze results for MinGW C++ compiler v5.1.0. In that case, this is how it should look ( after data mining ):

[ Analysis of Test Results 1 ]

[ MinGW C++ compiler v5.1.0 - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 3.46800 secs
...

[ MinGW C++ compiler v5.1.0 - Release - 64-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.98300 secs
...

Note 1: 64-bit code on Ivy Bridge is ~3.5x faster than 32-bit code on Pentium 4
Note 2: Classic 2D Transposed version ( MxMultB1 / No LBOT ) is faster than all the other versions
SergeyKostrov
[ Analysis of Test Results 2 ]

[ MinGW C++ compiler v5.1.0 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]
...
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 2.75000 secs
...

[ MinGW C++ compiler v5.1.0 - Release - 64-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ]
...
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.24900 secs
...

Note 1: 64-bit code on Ivy Bridge is ~11.0x faster than 32-bit code on Pentium 4
Note 2: Classic 2D version ( MxMultA1 / No LBOT ) is faster than all the other versions
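The speedup figures in Note 1 of both analyses are simply the Pentium 4 time divided by the Ivy Bridge time. A small sketch of that arithmetic, where `Speedup` is a hypothetical helper name and the inputs are the "Completed" times quoted above:

```cpp
// Speedup of a run relative to a baseline: baseline time divided by new time.
// Both arguments are wall-clock times in seconds and must be positive.
constexpr double Speedup(double baseline_secs, double new_secs) {
    return baseline_secs / new_secs;
}

// Analysis 1 ( LPS: IJK, MxMultB1 ): 3.46800 s on P4 vs 0.98300 s on Ivy Bridge.
constexpr double kSpeedupIJK = Speedup(3.46800, 0.98300);  // about 3.53

// Analysis 2 ( LPS: IKJ, MxMultA1 ): 2.75000 s on P4 vs 0.24900 s on Ivy Bridge.
constexpr double kSpeedupIKJ = Speedup(2.75000, 0.24900);  // about 11.04
```

These ratios match the ~3.5x and ~11.0x figures in the notes. They compare different CPUs, operating systems and bitness at the same time, so they do not isolate the effect of any single factor.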