
Performance Evaluation of Classic Matrix Multiplication algorithms

SergeyKostrov
Valued Contributor II
6,775 Views
*** Performance Evaluation of Classic Matrix Multiplication algorithms *** [ Abstract ] This is one of the most detailed analyses of the performance of the Classic Matrix Multiplication algorithm on different software and hardware platforms.
0 Kudos
1 Solution
zalia64
New Contributor I
6,678 Views

You are right.

I have missed the one-letter difference in the title.

For simple readers like me, fundamental one-letter differences must be spelled out explicitly.

 


0 Kudos
146 Replies
SergeyKostrov
Valued Contributor II
959 Views
[ Microsoft C++ compiler ( VS98 PE ) 32-bit ] [ Compiler ] /nologo /Zp16 /MD /W4 /Ox /Ot /Oa /Ow /Og /Oi /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_MBCS" /D "_WIN32_MSC" /D WINVER=0x0400 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /Fp"Release/ScaLibTestApp.pch" /Yu"Stdphf.h" /Fo"Release/" /Fd"Release/" /FD /c [ Linker ] kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /stack:0x5000000 /subsystem:console /pdb:none /machine:I386 /out:"Release/ScaLibTestApp.exe"
0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
[ Microsoft C++ compiler ( VS2005 PE ) 32-bit ] [ Compiler ] /O2 /Ob1 /Oi /Ot /Oy /GL /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_MSC" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /Gm /MT /GS- /fp:fast /GR- /openmp /Yu"Stdphf.h" /Fp"Release\MscTestApp.pch" /Fo"Release/" /Fd"Release/" /W4 /nologo /c /Wp64 /Zi /Gd /TP /wd4005 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /U "_WIN32_WCC" /errorReport:prompt /arch:SSE2 [ Linker ] /OUT:"Release/MscTestApp.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"Release\MscTestApp.exe.intermediate.manifest" /NODEFAULTLIB:"../../Bin/Release/ScaLib.lib" /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /LTCG /MACHINE:X86 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "..\..\bin\release\scalib.lib"
0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
[ Microsoft C++ compiler ( VS2008 PE ) 32-bit ] [ Compiler ] /O2 /Ob1 /Oi /Ot /Oy /GL /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_MSC" /D "_UNICODE" /D "UNICODE" /GF /Gm /MT /GS- /fp:fast /GR- /openmp /Yu"Stdphf.h" /Fp"Release\ScaLibTestApp.pch" /Fo"Release/" /Fd"Release/" /W4 /nologo /c /Zi /TP /wd4005 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /U "_WIN32_WCC" /errorReport:prompt /arch:SSE2 [ Linker ] /OUT:"Release/ScaLibTestApp.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"Release\ScaLibTestApp.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /LTCG /DYNAMICBASE:NO /MACHINE:X86 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "..\..\bin\release\scalib.lib"
0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
[ Microsoft C++ compiler ( VS2008 PE ) 64-bit ] [ Compiler ] /O2 /Ob1 /Oi /Ot /Oy /GL /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_MSC" /D "_UNICODE" /D "UNICODE" /GF /Gm /MT /GS- /fp:fast /GR- /openmp /Yu"Stdphf.h" /Fp"x64\Release\ScaLibTestApp64.pch" /Fo"x64/Release/" /Fd"x64/Release/" /W4 /nologo /c /Zi /TP /wd4005 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /U "_WIN32_WCC" /errorReport:prompt [ Linker ] /OUT:"x64\Release/ScaLibTestApp64.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"x64\Release\ScaLibTestApp64.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /STACK:1073741824 /LTCG /DYNAMICBASE:NO /MACHINE:X64 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "..\..\bin\release\scalib64.lib"
0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
[ Microsoft C++ compiler ( VS2008 EE ) 32-bit ] [ Compiler ] /O2 /Ob1 /Oi /Ot /Oy /GL /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_MSC" /D "_UNICODE" /D "UNICODE" /GF /Gm /MT /GS- /fp:fast /GR- /openmp /Yu"Stdphf.h" /Fp"Release\ScaLibTestApp.pch" /Fo"Release/" /Fd"Release/" /W4 /nologo /c /Zi /Gd /TP /wd4005 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /U "_WIN32_WCC" /errorReport:prompt /arch:SSE2 [ Linker ] /OUT:"Release/ScaLibTestApp.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"Release\ScaLibTestApp.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /LTCG /DYNAMICBASE:NO /MACHINE:X86 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "..\..\bin\release\scalib.lib"
0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
[ Intel C++ compiler v7.1.0 ( u029 ) 32-bit ] [ Compiler ] /nologo /Zp16 /MD /W4 /GX /O2 /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_UNICODE" /D "_WIN32_ICC" /D WINVER=0x0400 /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /Fp"Release/IccTestApp.pch" /Yu"Stdphf.h" /Fo"Release/" /Fd"Release/" /FD /Qopenmp /Qwd 111,114,161,171,174,175,177,181,193,279,280,304,373,424,444,488,593,673,810,869,981,1011,1418 /c [ Linker ] kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib libompstub.lib /nologo /stack:0x5000000 /subsystem:console /pdb:none /machine:I386 /nodefaultlib:"libc.lib" /out:"Release/IccTestApp.exe"
0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
[ Intel C++ compiler v8.1.0 ( u038 ) 32-bit ] [ Compiler ] /nologo /Zp16 /MD /W4 /GX /O2 /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_UNICODE" /D "_WIN32_ICC" /D WINVER=0x0400 /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /Fp"Release/IccTestApp.pch" /Yu"Stdphf.h" /Fo"Release/" /Fd"Release/" /FD /Wcheck /Qopenmp /Qwd 111,114,161,171,174,175,177,181,193,279,280,304,373,424,444,488,593,673,810,869,981,1011,1418,1572 /c [ Linker ] kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib libompstub.lib /nologo /stack:0x5000000 /subsystem:console /pdb:none /machine:I386 /nodefaultlib:"libc.lib" /out:"Release/IccTestApp.exe"
0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit ] [ Compiler ] /c /O3 /Ob1 /Oi /Ot /Oy /Qipo /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "INTEL_SUITE_VERSION=PE121_300" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"Release\IccTestApp.pch" /Fo"Release/" /W5 /nologo /Wp64 /Zi /Gd /TP /Qdiag-disable:2012 /Qdiag-disable:2013 /Qdiag-disable:2014 /Qdiag-disable:2015 /Qdiag-disable:2017 /Qdiag-disable:2021 /Qdiag-disable:2022 /Qdiag-disable:2304 /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qparallel /Qstd=c++0x /Qrestrict /Qdiag-disable:111,673,10121 /Wport /Qeffc++ /QxSSE2 /Qansi-alias /Qvec-report=0 /Qfma /Qunroll:8 /Qunroll-aggressive /Qopt-streaming-stores:always /Qopt-block-factor:128 /Qopt-mem-layout-trans:2 /Wport /Qeffc++ /QxSSE2 /Qansi-alias /Qvec-report=0 /Qfma /Qunroll:8 /Qunroll-aggressive /Qopt-streaming-stores:always /Qopt-block-factor:128 /Qopt-mem-layout-trans:2 [ Linker ] kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"Release/IccTestApp.exe" /INCREMENTAL:NO /nologo /MANIFEST /MANIFESTFILE:"Release\IccTestApp.exe.intermediate.manifest" /NODEFAULTLIB:"../../Bin/Release/ScaLib.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /MACHINE:X86 /qdiag-disable:111,673,10121
0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
[ Intel C++ compiler v13.1.0 ( u149 ) 32-bit ] [ Compiler ] /c /O3 /Ob1 /Oi /Ot /Oy /Qipo /I "..\..\Include" /I "C:\WorkLib\ICC2013\Composer XE 2013\ipp\include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "_IPP_PARALLEL_DYNAMIC" /D "IPP_USE_CUSTOM" /D "INTEL_SUITE_VERSION=PE130_149" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /arch:AVX /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"Release\IccTestApp.pch" /Fo"Release/" /Fd"Release/" /W5 /nologo /Wp64 /Zi /TP /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qstd=c++0x /Qrestrict /Qansi-alias /Qdiag-disable:111,673,2012,2015,2960,10121 /Wport /Qeffc++ /QxAVX /Qansi-alias /Qvec-report=0 /Qfma /Qunroll /Qunroll-aggressive /Qopt-streaming-stores:auto /Qopt-block-factor:128 /Qopt-mem-layout-trans:2 /Qipp /Qipp-link:dynamic /Qmkl [ Linker ] kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"Release/IccTestApp.exe" /INCREMENTAL:NO /nologo /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\ipp\lib\ia32" /MANIFEST /MANIFESTFILE:"Release\IccTestApp.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /NODEFAULTLIB:"../../Bin/Release/ScaLib.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /DYNAMICBASE /NXCOMPAT /IMPLIB:"C:\WorkEnv\AppsWorkDev\AppsTst\IccTestApp\Release\IccTestApp.lib" /MACHINE:X86 /qdiag-disable:111,673,2012,2015,2960,10121 /qdiag-sc-dir:"My Inspector XE Results - IccTestApp"
0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit ] [ Compiler ] /c /O3 /Ob1 /Oi /Ot /Qipo /I "..\..\Include" /I "C:\WorkLib\ICC2013\Composer XE 2013\ipp\include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "INTEL_SUITE_VERSION=PE130_149" /D "_IPP_PARALLEL_DYNAMIC" /D "IPP_USE_CUSTOM" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /arch:AVX /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"x64\Release\IccTestApp64.pch" /Fo"x64/Release/" /Fd"x64/Release/" /W5 /nologo /Wp64 /Zi /TP /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qstd=c++0x /Qrestrict /Qansi-alias /Qdiag-disable:111,673,2012,2015,2960,10121 /Wport /Qeffc++ /QxAVX /Qansi-alias /Qvec-report=0 /Qfma /Qunroll /Qunroll-aggressive /Qopt-streaming-stores:always /Qipp /Qipp-link:dynamic /Qmkl [ Linker ] kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"x64\Release/IccTestApp64.exe" /INCREMENTAL:NO /nologo /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\ipp\lib\intel64" /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\compiler\lib\intel64" /MANIFEST /MANIFESTFILE:"x64\Release\IccTestApp64.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /NODEFAULTLIB:"../../Bin/Release/ScaLib64.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:1000000000 /LARGEADDRESSAWARE /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /qdiag-disable:111,673,2012,2015,2960,10121 /qdiag-sc-dir:"My Inspector XE Results - IccTestApp"
0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
[ Watcom C++ compiler v1.9.0 32-bit ] WccTestApp.cpp -5r -fp5 -fpi87 -wx -d0 -s -oabil+mprt -xd -D_WIN32_WCC -DNDEBUG -feWccTestApp.exe -k268435456 -i"C:\WorkLib\ICC2011\Compos~1\Mkl\Include" -"libpath C:\WorkLib\ICC2011\Compos~1\Mkl\Lib\Ia32Wcc" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=601 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
[ Watcom C++ compiler v2.0.0 32-bit ] WccTestApp.cpp -5r -fp5 -fpi87 -wx -d0 -s -oabil+mprt -xd -D_WIN32_WCC -DNDEBUG -feWccTestApp.exe -k268435456 -i"C:\WorkLib\ICC2011\Compos~1\Mkl\Include" -"libpath C:\WorkLib\ICC2011\Compos~1\Mkl\Lib\Ia32Wcc" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=601 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
[ Watcom C++ compiler v2.0.0 64-bit ] WccTestApp.cpp -6r -fp6 -fpi87 -wx -d0 -s -oabil+mprt -xd -D_WIN32_WCC -DNDEBUG -feWccTestApp.exe -k536870912 -i"C:\WorkLib\ICC2013\Compos~1\Mkl\Include" -"libpath C:\WorkLib\ICC2013\Compos~1\Mkl\Lib\Ia32Wcc" -wcd=007 -wcd=008 -wcd=013 -wcd=014 -wcd=086 -wcd=188 -wcd=367 -wcd=368 -wcd=369 -wcd=387 -wcd=389 -wcd=549 -wcd=601 -wcd=628 -wcd=689 -wcd=716 -wcd=725 -wcd=726 -wcd=735
0 Kudos
zalia64
New Contributor I
959 Views

Dear Sergey,

I think you did a big, comprehensive job of comparing different algorithms, compilers, and systems.

I didn't mean to be offensive or sarcastic. I noted that the 32-bit system was an obsolete P4, while the 64-bit system was a state-of-the-art AVX CPU.

The 100-fold speed increase did not surprise me: a modern CPU with AVX against an obsolete P4?

The surprise was that, according to your tests, the P4 was 4 times quicker than a modern AVX CPU (tests 1.1, 1.2 and others).

This surprises me so much that I fear there was a typo.

If true, I fail to understand it. I would certainly like to read a discussion of "How and why the obsolete P4 beats the Haswell by 400%!"

 

My other point was:

I receive an automatic notification every time you add a message. Many of us, when confronted with a massive packet of 50 consecutive messages, delete them as a whole. IMHO, it is preferable to squeeze the results into a simple 1-page table.

Squeezing the results down is never a simple task, but it must be done so that others can appreciate your work.

 

0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
>>The surprise was - that according to your tests, the P4 was 4 times quicker, then a modern AVX CPU (tests 1.1, 1.2 and others).
>>
>>This fact surprises me so much, that I fear there was some typo error.
>>
>>IF TRUE, I fail to understand it. I certainly would like to read a discussion " How and Why the obsolete P4 beats the Haswell by 400% !!"

I'm very confused because it is not clear to me what you are looking at. Please review the first 11 posts in the thread because they describe which C++ compilers were tested, on which computer systems, etc.

Next, I do not have a system with a Haswell CPU, and all my tests on Pentium II, Pentium 4, Atom N270 and Core i7 ( 3rd Gen / Ivy Bridge ) clearly show:
- In most cases, newer versions of C++ compilers produce faster code because they use newer Intel instruction sets
- In all cases, new-generation CPUs are faster than previous-generation CPUs

Once again, I'm very confused about what you're looking at.
0 Kudos
SergeyKostrov
Valued Contributor II
959 Views
A simple data mining procedure ( it could be done manually! ) yields a reduced data set. The list of different versions of the algorithm is as follows:
MxMultA1 - Classic 2D
MxMultA2 - Classic 2D LBOT
MxMultA3 - Classic 2D Fused
MxMultA4 - Classic 2D Fused LBOT
MxMultB1 - Classic 2D Transposed
MxMultB2 - Classic 2D Transposed LBOT
MxMultB3 - Classic 2D Fused Transposed
MxMultB4 - Classic 2D Fused Transposed LBOT
MxMultC1 - Classic 2D SSE2 Transposed v1
MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT
MatrixMulEx1 - Classic 2D SSE2 Transposed v2
MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT
MxMultD1 - Classic 1D
MxMultD2 - Classic 1D LBOT
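The thread does not show the source of these routines, so the following is only a minimal sketch of what the "Classic 2D" and "Classic 2D Transposed" entries usually refer to. The names Matrix, makeMatrix, multiplyClassic and multiplyTransposed are hypothetical and are not the MxMultA1 / MxMultB1 implementations; the LBOT, Fused, SSE2 and 1D variants are not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Square matrix stored as a vector of rows (a "2D" layout). Hypothetical type.
using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: n x n matrix filled with zeros.
Matrix makeMatrix(std::size_t n) {
    return Matrix(n, std::vector<double>(n, 0.0));
}

// Classic 2D: C[i][j] = sum over k of A[i][k] * B[k][j].
// The inner loop walks down a column of B, which is a strided access.
Matrix multiplyClassic(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    Matrix c = makeMatrix(n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i][k] * b[k][j];
            }
            c[i][j] = sum;
        }
    }
    return c;
}

// Classic 2D Transposed: transpose B once up front, so that both
// inner-loop operands are read along rows (contiguous memory).
Matrix multiplyTransposed(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    Matrix bt = makeMatrix(n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            bt[j][i] = b[i][j];
        }
    }
    Matrix c = makeMatrix(n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i][k] * bt[j][k];
            }
            c[i][j] = sum;
        }
    }
    return c;
}
```

Both functions compute the same product; the transposed form pays for one extra pass over B in exchange for row-wise reads in the inner loop.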
0 Kudos
SergeyKostrov
Valued Contributor II
971 Views
Two sub-versions of each version of the algorithm ( see above ) are evaluated:
- Loop Processing Schema IJK - Pseudo-code:
  ...
  for( i = 0; ... )
    for( j = 0; ... )
      for( k = 0; ... )
        ...
- Loop Processing Schema IKJ ( aka Loop Interchange technique ) - Pseudo-code:
  ...
  for( i = 0; ... )
    for( k = 0; ... )
      for( j = 0; ... )
        ...
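To make the two loop orders concrete, here is a small sketch assuming square n x n matrices stored row-major in a flat std::vector<double>. The function names multiplyIJK and multiplyIKJ and the storage layout are assumptions made for illustration; they are not the routines benchmarked in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// IJK: for each element C[i][j], accumulate the dot product over k.
// The inner loop reads B with stride n (down a column).
std::vector<double> multiplyIJK(const std::vector<double>& a,
                                const std::vector<double>& b,
                                std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t k = 0; k < n; ++k) {
                c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
    return c;
}

// IKJ ( loop interchange ): A[i][k] is held fixed while the inner loop
// streams over row k of B and row i of C, both contiguous in memory.
std::vector<double> multiplyIKJ(const std::vector<double>& a,
                                const std::vector<double>& b,
                                std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
    return c;
}
```

Only the order of the two inner loops differs. Each C[i][j] still receives its terms in increasing k, so both functions return identical results, and any timing difference comes from the memory access pattern.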
0 Kudos
SergeyKostrov
Valued Contributor II
971 Views
In the case of the MinGW C++ compilers, these algorithms are not implemented:
...
MxMultC1 - Classic 2D SSE2 Transposed v1
MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT
MatrixMulEx1 - Classic 2D SSE2 Transposed v2
MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT
...
This applies to all MinGW versions.
0 Kudos
SergeyKostrov
Valued Contributor II
971 Views
Another thing: abbreviations and descriptions are given in a post at the beginning of the thread. It is very important to understand how the title of a test case should be read. For example, let's say the title is:
[ MinGW C++ compiler v5.1.0 - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]
This is a test for:
- MinGW C++ compiler v5.1.0
- Release build
- 32-bit binary code
- Loop Processing Schema ( LPS ) is IJK
- Executed on a computer with an Intel Pentium 4 ( P4 ) CPU
- The computer has a 32-bit Windows XP operating system
Once again, it is very important to understand on which system the test was executed.
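The reading rule above can also be written down as a small parser. This is only a sketch: it assumes every title has the shape "compiler - build - bits ( LPS: schema ) - platform" with the surrounding "[ ]" already removed, and the names TestCaseTitle, split and parseTitle are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for the fields of a test-case title.
struct TestCaseTitle {
    std::string compiler;  // e.g. "MinGW C++ compiler v5.1.0"
    std::string build;     // e.g. "Release"
    std::string bitness;   // e.g. "32-bit" ( of the generated binary )
    std::string lps;       // e.g. "IJK"
    std::string platform;  // e.g. "CPU P4 32-bit Windows XP"
};

// Split text on every occurrence of sep ( sep must not be empty ).
std::vector<std::string> split(const std::string& text, const std::string& sep) {
    assert(!sep.empty());
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = text.find(sep, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start));
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + sep.size();
    }
}

// Parse a title into its fields. Returns false if the shape does not match.
bool parseTitle(const std::string& title, TestCaseTitle& out) {
    const std::vector<std::string> f = split(title, " - ");
    if (f.size() != 4) {
        return false;
    }
    const std::string open = " ( LPS: ";
    const std::string close = " )";
    const std::size_t p = f[2].find(open);
    const std::size_t q = f[2].rfind(close);
    if (p == std::string::npos || q == std::string::npos || q < p + open.size()) {
        return false;
    }
    out.compiler = f[0];
    out.build = f[1];
    out.bitness = f[2].substr(0, p);
    out.lps = f[2].substr(p + open.size(), q - (p + open.size()));
    out.platform = f[3];
    return true;
}
```

Keeping the binary bitness and the platform as separate fields mirrors the point made above: the first "32-bit" describes the generated code, while the platform field describes the system the test ran on.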
0 Kudos
SergeyKostrov
Valued Contributor II
971 Views
I see that you wanted to analyze results for MinGW C++ compiler v5.1.0. In that case, this is how it should look ( after data mining ):
[ Analysis of Test Results 1 ]
[ MinGW C++ compiler v5.1.0 - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 3.46800 secs
...
[ MinGW C++ compiler v5.1.0 - Release - 64-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.98300 secs
...
Note 1: 64-bit code on Ivy Bridge is ~3.5x faster than 32-bit code on Pentium 4
Note 2: The Classic 2D Transposed version ( MxMultB1 / No LBOT ) is faster than all the other versions
0 Kudos
SergeyKostrov
Valued Contributor II
971 Views
[ Analysis of Test Results 2 ]
[ MinGW C++ compiler v5.1.0 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]
...
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 2.75000 secs
...
[ MinGW C++ compiler v5.1.0 - Release - 64-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ]
...
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.24900 secs
...
Note 1: 64-bit code on Ivy Bridge is ~11.0x faster than 32-bit code on Pentium 4
Note 2: The Classic 2D version ( MxMultA1 / No LBOT ) is faster than all the other versions
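The "Note 1" ratios in both analyses are simply the Pentium 4 time divided by the Ivy Bridge time. A tiny helper makes the arithmetic explicit; the name speedup is hypothetical, and the timings used with it are the ones quoted in the two analyses.

```cpp
#include <cassert>

// Speedup of a run relative to a slower baseline: baseline time / new time.
// A value of 3.5 means the new run took 3.5x less time than the baseline.
double speedup(double baselineSecs, double newSecs) {
    assert(baselineSecs > 0.0 && newSecs > 0.0);
    return baselineSecs / newSecs;
}
```

With the reported numbers, speedup( 3.46800, 0.98300 ) is about 3.53 for LPS IJK ( MxMultB1 ) and speedup( 2.75000, 0.24900 ) is about 11.04 for LPS IKJ ( MxMultA1 ), which matches the ~3.5x and ~11.0x figures. In both pairs Ivy Bridge is the faster system.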
0 Kudos