*** Performance Evaluation of Classic Matrix Multiplication algorithms ***
[ Abstract ]
This is one of the most detailed analyses of the performance of Classic Matrix Multiplication algorithms on different Software and Hardware platforms.
You are right.
I have missed the one-letter difference in the title.
For simple readers like me, fundamental one-letter differences must be spelled out explicitly.
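The one-letter difference is presumably the Loop Processing Schema in the test titles: LPS: IKJ in one test and LPS: IJK in the other. The test sources are not shown in this thread, so the following is only a hypothetical C++ sketch of the two loop orders for the classic algorithm ( the names MultIJK and MultIKJ, the row-major 1D storage and the double element type are assumptions ). Both compute the same C = A x B; only the loop order, and therefore the memory access pattern of the innermost loop, differs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reconstruction: the MxMultA* sources are not shown in the thread.
// n x n matrices stored row-major in flat vectors: element (r, c) is at r * n + c.

// LPS: IJK - the innermost k-loop walks down a column of b ( stride n ),
// touching a new cache line on almost every iteration for large n.
void MultIJK(const std::vector<double>& a, const std::vector<double>& b,
             std::vector<double>& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// LPS: IKJ - same arithmetic, but the innermost j-loop walks along a row of b
// and a row of c ( stride 1 ), which is cache- and vectorizer-friendly.
void MultIKJ(const std::vector<double>& a, const std::vector<double>& b,
             std::vector<double>& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    std::fill(c.begin(), c.end(), 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

With IKJ the inner loop streams through contiguous memory, which is consistent with Sub-Tests 1.x being much faster under LPS: IKJ than under LPS: IJK in the logs ( for example, 2.34000 vs 8.92300 secs for Sub-Test 1.1 with Watcom on the Ivy Bridge system ).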
[ Watcom C++ compiler v2.0.0 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]
Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 9.71900 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 11.00000 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 12.04700 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 18.14100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 100.90600 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 102.56200 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 103.92200 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 105.70400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 18.79700 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 21.98500 secs
> Test1099 End <
Tests: Completed
[ Watcom C++ compiler v2.0.0 64-bit ]
Just a reminder: even though the compiler and linker have been ported to 64-bit platforms, the generated binary code is still 32-bit!
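A simple way to confirm which kind of code a build produces is to check the pointer width from inside the test application. A minimal sketch ( TargetPointerBits and IsTarget32Bit are hypothetical helpers, not part of WccTestApp ):

```cpp
#include <climits>
#include <cstddef>

// Pointer width, in bits, of the code this function was compiled into.
// It describes the generated binary, not the bitness of the compiler
// executable that produced it.
inline std::size_t TargetPointerBits() {
    return sizeof(void*) * CHAR_BIT;
}

// True when the generated code uses 32-bit pointers.
inline bool IsTarget32Bit() {
    return TargetPointerBits() == 32;
}
```

Printed from WccTestApp, TargetPointerBits() should report 32 with either host version of the Watcom tools if, as stated, the generated code is 32-bit.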
[ Watcom C++ compiler v2.0.0 - Release - 32-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]
Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 8.92300 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 8.34600 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 9.03300 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 10.12400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.96700 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 3.85300 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.99800 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 3.88400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 8.40900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 8.36200 secs
> Test1099 End <
Tests: Completed
[ Watcom C++ compiler v2.0.0 - Release - 32-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ]
Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 2.34000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 2.76200 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 3.13500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 4.57100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 8.68900 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 8.58000 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 9.40700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 9.68800 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 2.69800 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 3.08900 secs
> Test1099 End <
Tests: Completed
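In the two Ivy Bridge logs, Sub-Tests 2.x ( Transposed ) finish in about 1 second with LPS: IJK but take about 9 seconds with LPS: IKJ, while Sub-Tests 1.x show the reverse pattern. One explanation consistent with these numbers, assuming that "Transposed" means B is multiplied through a transposed copy, is that transposition flips which loop order gets unit-stride access. The MxMultB* sources are not shown, so this is a hypothetical sketch ( Transpose, MultTransposedIJK and MultTransposedIKJ are assumed names ):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reconstruction: the MxMultB* sources are not shown in the thread.
// Assumption: B is first copied into bt = transpose(B), so that
// c(i, j) = sum over k of a(i, k) * bt(j, k). Row-major n x n storage.
std::vector<double> Transpose(const std::vector<double>& b, std::size_t n) {
    assert(b.size() == n * n);
    std::vector<double> bt(n * n);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t col = 0; col < n; ++col)
            bt[col * n + r] = b[r * n + col];
    return bt;
}

// LPS: IJK on the transposed copy: the inner k-loop reads a row of a and a row
// of bt, both with stride 1.
void MultTransposedIJK(const std::vector<double>& a, const std::vector<double>& bt,
                       std::vector<double>& c, std::size_t n) {
    assert(a.size() == n * n && bt.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
}

// LPS: IKJ on the transposed copy: the inner j-loop reads a column of bt
// ( stride n ), which tends to miss the cache for large n.
void MultTransposedIKJ(const std::vector<double>& a, const std::vector<double>& bt,
                       std::vector<double>& c, std::size_t n) {
    assert(a.size() == n * n && bt.size() == n * n && c.size() == n * n);
    std::fill(c.begin(), c.end(), 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * bt[j * n + k];
        }
}
```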
[ A note about Consistency of Performance Evaluations ]
It was a real challenge to be as Consistent as Possible, because many things were simply outside of my control.
For example:
- It was Not possible to get Base Performance numbers for MKL on older platforms, like Windows 95;
- Another Inconsistency is related to OpenMP-based multithreading, which is Not fully available on older platforms.
This means that all Completed tests are Single-Threaded, with no OpenMP-based multithreading. However, it is very easy to
scale all numbers by dividing by 2, 4 or 8 to get estimated performance numbers for 2-core, 4-core or
8-core hardware systems, assuming ideal linear scaling.
Tests on Windows 8 and Windows 10 operating systems were Not done at all.
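The scaling rule described above is a single division; a minimal sketch ( EstimateIdealTime is a hypothetical helper ) that makes the ideal linear scaling assumption explicit:

```cpp
#include <cassert>

// Ideal multi-core estimate from a Single-Threaded time: divide by the number
// of cores. This assumes perfect linear scaling; measured speedups are usually
// lower because of memory bandwidth limits and threading overhead.
inline double EstimateIdealTime(double singleThreadSecs, int cores) {
    assert(cores > 0);
    return singleThreadSecs / cores;
}
```

For example, Sub-Test 1.1 with Watcom on the Ivy Bridge system with LPS: IKJ took 2.34000 secs, so the ideal estimates for 2, 4 and 8 cores are 1.17, 0.585 and 0.2925 secs.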
Here is an example of how the data could be used.
Let's say a Performance Analysis of the Microsoft C++ compiler ( VS98 PE ) needs to be done. A simple data mining procedure produces a reduced data set that looks like this:
[ Group 1 - LPS - IJK ]
Matrix A, B and C Sizes : 1024 x 1024
Loop Blocking Divider: 1
Loop Processing Schema ( LPS ): IJK
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU AN 32-bit Windows 95 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 140.56801 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 136.45601 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 145.31301 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 142.82801 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 5.08100 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 5.31400 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 5.61700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 5.94600 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 136.55101 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 136.57901 secs
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU P2 32-bit Windows 2000 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 253.86501 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 253.85501 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 256.85901 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 257.74001 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 48.61000 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 59.95600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 72.07300 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 72.43400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 258.42101 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 258.35201 secs
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 97.57800 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 97.71800 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 97.85900 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 97.89000 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 3.18800 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 4.37500 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 5.45300 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 5.76600 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 97.70400 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 97.71800 secs
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU IB 32-bit Windows 7 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 9.64100 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 9.06300 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 10.10900 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 9.32900 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 0.48400 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 1.21700 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 0.67100 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 1.18500 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 9.51600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 9.78100 secs
[ Group 2 - LPS - IKJ ]
Matrix A, B and C Sizes : 1024 x 1024
Loop Blocking Divider: 1
Loop Processing Schema ( LPS ): IKJ
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU AN 32-bit Windows 95 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 9.87500 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 9.44900 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 9.73700 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 9.75100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 147.64801 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 147.68901 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 146.48101 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 154.74801 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 9.44800 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 9.46300 secs
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU P2 32-bit Windows 2000 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 59.51500 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 59.54500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 98.13100 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 98.14100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 254.30601 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 254.62601 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 256.21801 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 255.96901 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 59.69600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 59.68600 secs
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 4.93700 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 5.00000 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 5.73500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 5.73400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 97.76500 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 97.78100 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 99.50000 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 99.34400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 4.96900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 4.98400 secs
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU IB 32-bit Windows 7 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 1.21700 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 1.15500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 1.04500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 1.06100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 9.07900 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 9.12600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 9.51600 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 9.53200 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 1.13900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 1.15400 secs
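The data mining procedure used to build this reduced data set is not shown. A minimal sketch of its core step, assuming the raw logs use the three-line "Sub-Test / LBOT size / Completed" layout of the Watcom logs ( ReduceLog and ReduceLogText are hypothetical names ):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Folds each three-line record of a raw test log
//   Sub-Test 1.1 - MxMultA1 - Classic 2D
//   LBOT size: N/A
//   Completed: 9.71900 secs
// into one line of the reduced data set. All other lines are skipped.
std::vector<std::string> ReduceLog(std::istream& in) {
    std::vector<std::string> reduced;
    std::string line;
    std::string record;  // record being assembled; empty when none is open
    while (std::getline(in, line)) {
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);  // tolerate Windows line endings
        if (line.compare(0, 9, "Sub-Test ") == 0) {
            record = line;  // start of a new record
        } else if (!record.empty() && line.compare(0, 10, "LBOT size:") == 0) {
            record += " - " + line;
        } else if (!record.empty() && line.compare(0, 10, "Completed:") == 0) {
            reduced.push_back(record + " - " + line);
            record.clear();  // record finished
        }
    }
    return reduced;
}

// Convenience overload for a log already held in memory.
std::vector<std::string> ReduceLogText(const std::string& text) {
    std::istringstream in(text);
    return ReduceLog(in);
}
```

Platform headers such as [ Microsoft C++ compiler ( VS98 PE ) ... ] are not handled by this sketch and would have to be added per log file.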
Now, take a look at all Sub-Tests 2.1 of [ Group 1 - LPS - IJK ], sorted in ascending order by completion time:
[ ... CPU IB ( Ivy Bridge ) 32-bit Windows 7 ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 0.48400 secs
...
[ ... CPU P4 ( Pentium 4 ) 32-bit Windows XP ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 3.18800 secs
...
[ ... CPU AN ( Atom N270 ) 32-bit Windows 95 ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 5.08100 secs
...
[ ... CPU P2 ( Pentium II ) 32-bit Windows 2000 ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 48.61000 secs
...
You can see how much the performance numbers change from one platform to another.
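The ascending ordering above can be reproduced mechanically once the times are extracted. A minimal sketch ( the Result struct and SortAndRelative are hypothetical helpers ) that sorts results by completion time and reports each one relative to the fastest:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One completion time for a given platform.
struct Result {
    std::string platform;
    double secs;
};

// Sorts results in ascending order of completion time and returns, for each
// entry, how many times longer it took than the fastest one.
std::vector<double> SortAndRelative(std::vector<Result>& results) {
    std::sort(results.begin(), results.end(),
              [](const Result& x, const Result& y) { return x.secs < y.secs; });
    std::vector<double> ratios;
    for (const Result& r : results)
        ratios.push_back(r.secs / results.front().secs);
    return ratios;
}
```

Fed the four Sub-Test 2.1 times listed above, the last ratio comes out at about 100.4: the Pentium II system needs roughly a hundred times longer than the Ivy Bridge system for the same test.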
[ Command Line Options of C++ compilers ]
Command Line Options of C++ compilers used in these performance evaluations will be provided.
I am sorry, but the data is too raw to comprehend.
When I compare specific items, the noise seems to overwhelm the results.
Your first test, [ MinGW C++ compiler v5.1.0 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]:
>>...64-bit is 100 faster the 32 bit? this should mean a real revolution!...
Sir, you shouldn't be too sarcastic because you're comparing:
- 32-bit SSE2 codes ( compiled with ICC v12.x ) executed on 32-bit Windows XP with Intel Pentium 4 CPU ( more than 14-year-old technology! )
against
- 64-bit AVX codes ( compiled with ICC 13.x ) executed on 64-bit Windows 7 with Intel Core i7 ( Ivy Bridge ) CPU ( more than 3 years old ).
Every test case has a title; take a look at the first couple of posts for more details.
>>[ Command Line Options of C++ compilers ]
>>
>>Command Line Options of C++ compilers used in these performance evaluations will be provided.
Here they are...
[ Borland C++ compiler v5.5.1 32-bit ]
-d -O2 -w -D_WIN32_BCC -DNDEBUG -5 -nRelease -eBccTestApp.exe -I"C:\WorkLib\MKL\Include" -L"C:\WorkLib\MKL\Lib\Ia32Bcc" -lS:33554432 BccTestApp.cpp HrtALLib.asm
[ MinGW C++ compiler v3.4.2 32-bit ]
MgwTestApp.cpp
-DNDEBUG
-O3
-msse2
-ffast-math
-fpeel-loops
-fomit-frame-pointer
-w
-I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include"
-B "../../AppsSca"
"C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib"
-Xlinker
--stack=67108864
[ MinGW C++ compiler v4.8.1 32-bit ]
MgwTestApp.cpp
-DNDEBUG
-O3
-msse2
-mprfchw
-ffast-math
-fpeel-loops
-ftree-vectorizer-verbose=0
-ftree-vectorize
-fvect-cost-model
-fomit-frame-pointer
-flto
-fwhole-program
-fopenmp
-w
-I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include"
-B "../../AppsSca"
"C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib"
-Xlinker
--stack=67108864
[ MinGW C++ compiler v4.9.2 32-bit ]
MgwTestApp.cpp
-DNDEBUG
-O3
-msse2
-mprfchw
-ffast-math
-fpeel-loops
-ftree-vectorizer-verbose=0
-ftree-vectorize
-fvect-cost-model
-fomit-frame-pointer
-flto
-fwhole-program
-fopenmp
-w
-I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include"
-B "../../AppsSca"
"C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib"
-Xlinker
--stack=67108864
[ MinGW C++ compiler v4.9.2 64-bit ]
MgwTestApp.cpp
-DNDEBUG
-O3
-mavx
-mprfchw
-ffast-math
-fpeel-loops
-ftree-vectorizer-verbose=0
-ftree-vectorize
-fvect-cost-model
-fomit-frame-pointer
-fwhole-program
-fopenmp
-w
-I "C:/WorkLib/ICC2013/Composer XE/Mkl/Include"
-B "../../AppsSca"
"C:/WorkLib/ICC2013/Composer XE/Mkl/Lib/Intel64/mkl_rt.lib"
-Xlinker
--stack=1073741824
[ MinGW C++ compiler v5.1.0 32-bit ]
MgwTestApp.cpp
-DNDEBUG
-O3
-msse2
-mprfchw
-ffast-math
-fpeel-loops
-ftree-vectorizer-verbose=0
-ftree-vectorize
-fvect-cost-model
-fomit-frame-pointer
-flto
-fwhole-program
-fopenmp
-w
-I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include"
-B "../../AppsSca"
"C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib"
-Xlinker
--stack=67108864
[ MinGW C++ compiler v5.1.0 64-bit ]
MgwTestApp.cpp
-DNDEBUG
-O3
-mavx
-mprfchw
-ffast-math
-fpeel-loops
-ftree-vectorizer-verbose=0
-ftree-vectorize
-fvect-cost-model
-fomit-frame-pointer
-fwhole-program
-fopenmp
-w
-I "C:/WorkLib/ICC2013/Composer XE/Mkl/Include"
-B "../../AppsSca"
"C:/WorkLib/ICC2013/Composer XE/Mkl/Lib/Intel64/mkl_rt.lib"
-Xlinker
--stack=1073741824
[ MinGW C++ compiler v6.1.0 32-bit ]
MgwTestApp.cpp
-DNDEBUG
-O3
-msse2
-mprfchw
-ffast-math
-fpeel-loops
-ftree-vectorizer-verbose=0
-ftree-vectorize
-fvect-cost-model
-fomit-frame-pointer
-flto
-fwhole-program
-fopenmp
-w
-I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include"
-B "../../AppsSca"
"C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib"
-Xlinker
--stack=67108864
[ MinGW C++ compiler v6.1.0 64-bit ]
MgwTestApp.cpp
-DNDEBUG
-O3
-mavx
-mprfchw
-ffast-math
-fpeel-loops
-ftree-vectorizer-verbose=0
-ftree-vectorize
-fvect-cost-model
-fomit-frame-pointer
-fwhole-program
-fopenmp
-w
-I "C:/WorkLib/ICC2013/Composer XE/Mkl/Include"
-B "../../AppsSca"
"C:/WorkLib/ICC2013/Composer XE/Mkl/Lib/Intel64/mkl_rt.lib"
-Xlinker
--stack=1073741824
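The 32-bit MinGW builds use -msse2 and the 64-bit builds use -mavx. GCC, and therefore MinGW, defines the predefined macros __SSE2__ and __AVX__ when the corresponding instruction sets are enabled, so a build can report which of the two it was compiled for. A minimal sketch ( TargetSimdLevel is a hypothetical helper; other compilers, such as Microsoft C++ or Borland C++, may not define these macros ):

```cpp
#include <string>

// Reports the highest of the two SIMD levels used in these builds that the
// compiler was allowed to target, based on GCC / MinGW predefined macros.
// -mavx defines __AVX__ ( and implies SSE2 ); -msse2 defines __SSE2__.
inline std::string TargetSimdLevel() {
#if defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none detected";
#endif
}
```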

Sub-Test 1.1 - MxMultA1 - Classic 2D - Completed: 2.750 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - Completed: 2.79700 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - Completed: 98.26601 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - Completed: 98.26501 secs
Your second test, [ MinGW C++ compiler v5.1.0 - Release - 64-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]:
Sub-Test 1.1 - MxMultA1 - Classic 2D - Completed: 8.92300 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - Completed: 8.98600 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - Completed: 0.98300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - Completed: 1.15400 secs
64-bit is 100 times faster than 32-bit? That would mean a real revolution! But wait, the other routines are 4 times slower?
-----------------------------------------------------------------------------------------------------------------------------------------------------------