- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
*** Performance Evaluation of Strassen Matrix Multiplication algorithms ***
Link Copied
161 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ Abstract ]
In 1969 the German mathematician Volker Strassen has invented a recursive algorithm for
multiplication of two square matrices with asymptotic complexity O( n^2.807 ). That was the first matrix multiplication algorithm
with asymptotic complexity less than O( n^3 ).
Note: 2.807 = log2( 7 ).
Two Classic well known Compute Schemas ( CS ) of the algorithm are as follows:
Strassen..........................A4B4C4M7T2 ( or ABC4M7T2 )
Strassen-Winograd......A4B4C4S4T4P7U7 ( or ABCST4PU7 )
These algorithms use temporary matrices to store intermediate results and, for example, classic Strassen algorithm
uses two T-matrices ( denoted as T2 ).
Classic versions of Strassen matrix multiplication algorithm could be classified as Recursive Stack Based ( RSB ) and
at every recursion memory for subdivided, or partitioned, temporary matrices is allocated from the stack.
Performance of two Heap Based versions of Strassen matrix multiplication algorithm is evaluated. That is,
Strassen Heap Based Incomplete ( SHBI ) and Strassen Heap Based Complete ( SHBC ) algorithms
with a highly optimized memory usage and Compute Schema A2B2M7 ( or AB2M7 ) without any temporary matrices to store
intermediate results.
These two versions of Strassen matrix multiplication algorithm could be classified as Recursive Heap Based ( RHB ).
Unofficially, on the ScaLib for BDP project, A2B2M7 ( or AB2M7 ) Compute Schema is called as Strassen-Kostrov.
As a reference, performance CBLAS function SGEMM from of Intel MKL, one of the fastest implementations for
matrix multiplication, is evaluated for every test case.
Additional high level technical details about SHBI and SHBC algorithms will be provided.
To learn more about Volker Strassen follow a web-link:
.
http://www.math.uni-konstanz.de/~strassen
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ List of Abbreviations ]
MM - Matrix Multiplication
MMA - Matrix Multiplication Algorithm
SMM - Strassen Matrix Multiplication
SMMA - Strassen Matrix Multiplication Algorithm
CS - Compute Schema
RSB - Recursive Stack Based
RHB - Recursive Heap Based
HBI - Heap Based Incomplete
HBC - Heap Based Complete
SCSB - Strassen Classic Stack Based
WCSB - Winograd Classic Stack Based
SHBI - Strassen Heap Based Incomplete
SHBC - Strassen Heap Based Complete
DRP - Degree of Recursive Processing
Compute Schemas:
A4B4C4M7T2 ( or ABC4M7T2 ) SCS
A4B4C4S4T4P7U7 ( or ABCST4PU7 ) SWCS
A2B2M7 ( or AB2M7 ) SKCS
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ Performance Evaluations of SHBI and SHBC versions of SMMA are completed for these matrix ( A, B and C ) sizes ]
4096 x 4096
8192 x 8192
16384 x 16384
32768 x 32768
40960 x 40960
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ Computer Systems used for performance evaluations ]
** Dell Precision Mobile M4700 **
Intel Core i7-3840QM ( 2.80 GHz )
Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/products/70846
32GB RAM
320GB HDD
NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory )
Windows 7 Professional 64-bit SP1
Size of L3 Cache = 8MB ( shared between all cores for data & instructions )
Size of L2 Cache = 1MB ( 256KB per core / shared for data & instructions )
Size of L1 Cache = 256KB ( 32KB per core for data & 32KB per core for instructions )
Display resolution: 1366 x 768
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ C++ compilers used for performance evaluations ]
MinGW C++ compiler v5.1.0 64-bit
MinGW C++ compiler v6.1.0 64-bit
Microsoft C++ compiler ( VS2008 PE ) 64-bit
Intel C++ compiler v13.1.0 ( u149 ) 64-bit
Command Line Options of C++ compilers used in these performance evaluations will be provided.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ MinGW C++ compiler v5.1.0 64-bit ]
MgwTestApp.cpp
-DNDEBUG
-O3
-mavx
-mprfchw
-ffast-math
-fpeel-loops
-ftree-vectorizer-verbose=0
-ftree-vectorize
-fvect-cost-model
-fomit-frame-pointer
-fwhole-program
-fopenmp
-w
-I "C:/WorkLib/ICC2013/Composer XE/Mkl/Include"
-B "../../AppsSca"
"C:/WorkLib/ICC2013/Composer XE/Mkl/Lib/Intel64/mkl_rt.lib"
-Xlinker
--stack=1073741824
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ MinGW C++ compiler v6.1.0 64-bit ]
MgwTestApp.cpp
-DNDEBUG
-O3
-mavx
-mprfchw
-ffast-math
-fpeel-loops
-ftree-vectorizer-verbose=0
-ftree-vectorize
-fvect-cost-model
-fomit-frame-pointer
-fwhole-program
-fopenmp
-w
-I "C:/WorkLib/ICC2013/Composer XE/Mkl/Include"
-B "../../AppsSca"
"C:/WorkLib/ICC2013/Composer XE/Mkl/Lib/Intel64/mkl_rt.lib"
-Xlinker
--stack=1073741824
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ Microsoft C++ compiler ( VS2008 PE ) 64-bit ]
[ Compiler ]
O2 /Ob1 /Oi /Ot /Oy /GL /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_MSC" /D "_UNICODE" /D "UNICODE" /GF /Gm /MT /GS- /fp:fast /GR- /openmp /Yu"Stdphf.h" /Fp"x64 Release\ScaLibTestApp64.pch" /Fo"x64/Release/" /Fd"x64/Release/" /W4 /nologo /c /Zi /TP /wd4005 /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_ICC" /U "_WIN32_WCC" /errorReport:prompt
[ Linker ]
/OUT:"x64\Release/ScaLibTestApp64.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"x64\Release\ScaLibTestApp64.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /STACK:1073741824 /LTCG /DYNAMICBASE:NO /MACHINE:X64 /ERRORREPORT:PROMPT kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib "..\..\bin\release\scalib64.lib"
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit ]
[ Compiler ]
/c /O3 /Ob1 /Oi /Ot /Qipo /I "..\..\Include" /I "C:\WorkLib\ICC2013\Composer XE 2013\ipp\include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "INTEL_SUITE_VERSION=PE130_149" /D "_IPP_PARALLEL_DYNAMIC" /D "IPP_USE_CUSTOM" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /arch:AVX /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"x64\Release\IccTestApp64.pch" /Fo"x64/Release/" /Fd"x64/Release/" /W5 /nologo /Wp64 /Zi /TP /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qstd=c++0x /Qrestrict /Qansi-alias /Qdiag-disable:111,673,2012,2015,2960,10121 /Wport /Qeffc++ /QxAVX /Qansi-alias /Qvec-report=0 /Qfma /Qunroll /Qunroll-aggressive /Qopt-streaming-stores:always /Qipp /Qipp-link:dynamic /Qmkl
[ Linker ]
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"x64\Release/IccTestApp64.exe" /INCREMENTAL:NO /nologo /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\ipp\lib\intel64" /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\compiler\lib\intel64" /MANIFEST /MANIFESTFILE:"x64\Release\IccTestApp64.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /NODEFAULTLIB:"../../Bin/Release/ScaLib64.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:1000000000 /LARGEADDRESSAWARE /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /qdiag-disable:111,673,2012,2015,2960,10121 /qdiag-sc-dir:"My Inspector XE Results - IccTestApp"
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// [ 4096x4096 / OpenMP Threads=4 / Release ]
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ *** Microsoft C++ compiler ( VS2008 PE ) 64-bit *** ]
[ SHBI 4096x4096 DRP=1st ]
Strassen HBI
Matrix Size : 4096 x 4096
Matrix Size Threshold : 2048 x 2048
Matrix Partitions : 8
Degree of Recursion : 1
Result Sets Reflection: N/A
Calculating...
Strassen HBI - Pass 01 - Completed: 0.90500 secs
Strassen HBI - Pass 02 - Completed: 0.87300 secs
Strassen HBI - Pass 03 - Completed: 0.87400 secs
Strassen HBI - Pass 04 - Completed: 0.87300 secs
Strassen HBI - Pass 05 - Completed: 0.87400 secs
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ *** Microsoft C++ compiler ( VS2008 PE ) 64-bit *** ]
[ SHBC 4096x4096 DRP=1st ]
Strassen HBC
Matrix Size : 4096 x 4096
Matrix Size Threshold : 2048 x 2048
Matrix Partitions : 8
Degree of Recursion : 1
Result Sets Reflection: Disabled
Calculating...
Strassen HBC - Pass 01 - Completed: 0.88900 secs
Strassen HBC - Pass 02 - Completed: 0.87400 secs
Strassen HBC - Pass 03 - Completed: 0.87300 secs
Strassen HBC - Pass 04 - Completed: 0.87400 secs
Strassen HBC - Pass 05 - Completed: 0.87400 secs
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ *** Microsoft C++ compiler ( VS2008 PE ) 64-bit *** ]
[ SHBC 4096x4096 DRP=2nd ]
Strassen HBC
Matrix Size : 4096 x 4096
Matrix Size Threshold : 1024 x 1024
Matrix Partitions : 57
Degree of Recursion : 2
Result Sets Reflection: Enabled
Calculating...
Strassen HBC - Pass 01 - Completed: 1.02900 secs
Strassen HBC - Pass 02 - Completed: 0.96800 secs
Strassen HBC - Pass 03 - Completed: 0.98200 secs
Strassen HBC - Pass 04 - Completed: 0.96800 secs
Strassen HBC - Pass 05 - Completed: 0.98200 secs
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ *** Microsoft C++ compiler ( VS2008 PE ) 64-bit *** ]
[ SHBC 4096x4096 DRP=3rd ]
Strassen HBC
Matrix Size : 4096 x 4096
Matrix Size Threshold : 512 x 512
Matrix Partitions : 400
Degree of Recursion : 3
Result Sets Reflection: Enabled
Calculating...
Strassen HBC - Pass 01 - Completed: 1.23300 secs
Strassen HBC - Pass 02 - Completed: 1.13900 secs
Strassen HBC - Pass 03 - Completed: 1.13800 secs
Strassen HBC - Pass 04 - Completed: 1.15500 secs
Strassen HBC - Pass 05 - Completed: 1.13900 secs
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ *** Microsoft C++ compiler ( VS2008 PE ) 64-bit *** ]
[ SHBC 4096x4096 DRP=4th ]
Strassen HBC
Matrix Size : 4096 x 4096
Matrix Size Threshold : 256 x 256
Matrix Partitions : 2801
Degree of Recursion : 4
Result Sets Reflection: Enabled
Calculating...
Strassen HBC - Pass 01 - Completed: 1.79400 secs
Strassen HBC - Pass 02 - Completed: 1.60700 secs
Strassen HBC - Pass 03 - Completed: 1.60700 secs
Strassen HBC - Pass 04 - Completed: 1.60600 secs
Strassen HBC - Pass 05 - Completed: 1.60700 secs
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ *** Microsoft C++ compiler ( VS2008 PE ) 64-bit *** ]
[ SHBC 4096x4096 DRP=5th ]
Strassen HBC
Matrix Size : 4096 x 4096
Matrix Size Threshold : 128 x 128
Matrix Partitions : 19608
Degree of Recursion : 5
Result Sets Reflection: Enabled
Calculating...
Strassen HBC - Pass 01 - Completed: 3.91600 secs
Strassen HBC - Pass 02 - Completed: 3.41600 secs
Strassen HBC - Pass 03 - Completed: 3.40100 secs
Strassen HBC - Pass 04 - Completed: 3.41600 secs
Strassen HBC - Pass 05 - Completed: 3.41700 secs
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ *** Microsoft C++ compiler ( VS2008 PE ) 64-bit *** ]
[ MKL 4096x4096 ]
Cblas SGEMM
Matrix Size : 4096 x 4096
Matrix Size Threshold : N/A
Matrix Partitions : N/A
Degree of Recursion : N/A
Result Sets Reflection: N/A
Calculating...
Cblas SGEMM - Pass 01 - Completed: 0.85800 secs
Cblas SGEMM - Pass 02 - Completed: 0.85800 secs
Cblas SGEMM - Pass 03 - Completed: 0.85800 secs
Cblas SGEMM - Pass 04 - Completed: 0.85800 secs
Cblas SGEMM - Pass 05 - Completed: 0.85800 secs
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// [ 4096x4096 / OpenMP Threads=4 / Release ]
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ *** MinGW C++ compiler v5.1.0 64-bit *** ]
[ SHBI 4096x4096 DRP=1st ]
Strassen HBI
Matrix Size : 4096 x 4096
Matrix Size Threshold : 2048 x 2048
Matrix Partitions : 8
Degree of Recursion : 1
Result Sets Reflection: N/A
Calculating...
Strassen HBI - Pass 01 - Completed: 0.90500 secs
Strassen HBI - Pass 02 - Completed: 0.87400 secs
Strassen HBI - Pass 03 - Completed: 0.87400 secs
Strassen HBI - Pass 04 - Completed: 0.87300 secs
Strassen HBI - Pass 05 - Completed: 0.87400 secs
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[ *** MinGW C++ compiler v5.1.0 64-bit *** ]
[ SHBC 4096x4096 DRP=1st ]
Strassen HBC
Matrix Size : 4096 x 4096
Matrix Size Threshold : 2048 x 2048
Matrix Partitions : 8
Degree of Recursion : 1
Result Sets Reflection: Disabled
Calculating...
Strassen HBC - Pass 01 - Completed: 0.88900 secs
Strassen HBC - Pass 02 - Completed: 0.87300 secs
Strassen HBC - Pass 03 - Completed: 0.87400 secs
Strassen HBC - Pass 04 - Completed: 0.87400 secs
Strassen HBC - Pass 05 - Completed: 0.87300 secs

Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page