Performance Evaluation of Classic Matrix Multiplication algorithms

SergeyKostrov
Valued Contributor II
16,670 Views
*** Performance Evaluation of Classic Matrix Multiplication algorithms ***

[ Abstract ]
This is one of the most detailed analyses of the performance of the Classic Matrix Multiplication algorithm on different software and hardware platforms.
1 Solution
zalia64
New Contributor I
16,573 Views

You are right.

I have missed the one-letter difference in the title.

For simple readers like me, fundamental one-letter differences must be spelled out explicitly.

 

146 Replies
SergeyKostrov
Valued Contributor II
2,402 Views
[ Watcom C++ compiler v2.0.0 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]

Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 9.71900 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 1024x1024 elements
    Completed: 11.00000 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 12.04700 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 1024x1024 elements
    Completed: 18.14100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 100.90600 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 1024x1024 elements
    Completed: 102.56200 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 103.92200 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 1024x1024 elements
    Completed: 105.70400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 18.79700 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 1024x1024 elements
    Completed: 21.98500 secs
> Test1099 End <
Tests: Completed
SergeyKostrov
Valued Contributor II
2,402 Views
[ Watcom C++ compiler v2.0.0 64-bit ]

A reminder: even though the compiler and linker have been ported to 64-bit platforms, the generated binaries are still 32-bit!
SergeyKostrov
Valued Contributor II
2,402 Views
[ Watcom C++ compiler v2.0.0 - Release - 32-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]

Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 8.92300 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 1024x1024 elements
    Completed: 8.34600 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 9.03300 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 1024x1024 elements
    Completed: 10.12400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 0.96700 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 1024x1024 elements
    Completed: 3.85300 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 0.99800 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 1024x1024 elements
    Completed: 3.88400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 8.40900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 1024x1024 elements
    Completed: 8.36200 secs
> Test1099 End <
Tests: Completed
SergeyKostrov
Valued Contributor II
2,402 Views
[ Watcom C++ compiler v2.0.0 - Release - 32-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ]

Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 2.34000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 1024x1024 elements
    Completed: 2.76200 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 3.13500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 1024x1024 elements
    Completed: 4.57100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 8.68900 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 1024x1024 elements
    Completed: 8.58000 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 9.40700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 1024x1024 elements
    Completed: 9.68800 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 2.69800 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 1024x1024 elements
    Completed: 3.08900 secs
> Test1099 End <
Tests: Completed
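
For readers unfamiliar with the LPS labels, here is a minimal sketch, assuming that IJK and IKJ simply name the nesting order of the three loops of the classic algorithm and that the matrices are stored row-major. The benchmark sources are not posted in this thread, so the names `Matrix`, `multiplyIJK` and `multiplyIKJ`, the `float` element type and the flat storage are all hypothetical and only illustrate the idea:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage: an n x n matrix kept row-major in a flat vector.
using Matrix = std::vector<float>;

// IJK order: the innermost loop walks a row of A and a column of B.
// The column walk over B jumps n elements per step.
void multiplyIJK(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// IKJ order: the innermost loop walks a row of B and a row of C,
// so every access in the hot loop is unit-stride.
void multiplyIKJ(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (float& x : c) {
        x = 0.0f;  // C is accumulated in place, so it must start at zero
    }
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

Both functions compute the same product; only the memory access pattern differs. With row-major storage the IKJ inner loop is unit-stride, which is one plausible reason why the non-transposed sub-tests above run faster under IKJ than under IJK.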
SergeyKostrov
Valued Contributor II
2,402 Views
[ A note about Consistency of Performance Evaluations ]

It was a real challenge to be as consistent as possible because many things were simply outside of my control. For example:
- It was not possible to get base performance numbers for MKL on older platforms, like Windows 95;
- Another inconsistency is that OpenMP-based multithreading is not fully available on older platforms.

This means that all completed tests were run single-threaded, even where OpenMP was used. However, it is easy to scale all numbers by dividing by 2, 4 or 8 in order to get rough performance estimates for 2-core, 4-core or 8-core hardware systems, assuming ideal linear scaling.

Tests on Windows 8 and Windows 10 operating systems were not done at all.
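
The divide-by-core-count estimate mentioned in this note can be written down as a one-line helper. This is only a sketch of the arithmetic; the function name `estimateIdealTime` is hypothetical, and the result is a best case that assumes perfectly linear scaling:

```cpp
#include <cassert>

// Best-case estimate: single-threaded time divided by the number of cores.
// Assumes ideal linear scaling, which real multithreaded runs rarely reach.
double estimateIdealTime(double singleThreadSecs, int cores)
{
    assert(cores > 0);
    return singleThreadSecs / cores;
}
```

For example, the 2.34000 secs reported for Sub-Test 1.1 ( Watcom, LPS: IKJ, CPU IB ) would become at best about 1.17, 0.585 and 0.2925 secs on 2, 4 and 8 cores. Measured multithreaded times are normally higher than this ideal because of memory bandwidth limits and threading overhead.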
SergeyKostrov
Valued Contributor II
2,402 Views
Here is an example of how the data could be used. Let's say a performance analysis for the Microsoft C++ compiler ( VS98 PE ) needs to be done. A simple data mining procedure gives a reduced data set that looks like this:

[ Group 1 - LPS - IJK ]

Matrix A, B and C Sizes : 1024 x 1024
Loop Blocking Divider: 1
Loop Processing Schema ( LPS ): IJK

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU AN 32-bit Windows 95 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 140.56801 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 136.45601 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 145.31301 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 142.82801 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 5.08100 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 5.31400 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 5.61700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 5.94600 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 136.55101 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 136.57901 secs

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU P2 32-bit Windows 2000 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 253.86501 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 253.85501 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 256.85901 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 257.74001 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 48.61000 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 59.95600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 72.07300 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 72.43400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 258.42101 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 258.35201 secs

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 97.57800 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 97.71800 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 97.85900 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 97.89000 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 3.18800 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 4.37500 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 5.45300 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 5.76600 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 97.70400 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 97.71800 secs

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU IB 32-bit Windows 7 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 9.64100 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 9.06300 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 10.10900 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 9.32900 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 0.48400 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 1.21700 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 0.67100 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 1.18500 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 9.51600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 9.78100 secs
SergeyKostrov
Valued Contributor II
2,402 Views
[ Group 2 - LPS - IKJ ]

Matrix A, B and C Sizes : 1024 x 1024
Loop Blocking Divider: 1
Loop Processing Schema ( LPS ): IKJ

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU AN 32-bit Windows 95 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 9.87500 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 9.44900 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 9.73700 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 9.75100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 147.64801 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 147.68901 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 146.48101 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 154.74801 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 9.44800 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 9.46300 secs

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU P2 32-bit Windows 2000 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 59.51500 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 59.54500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 98.13100 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 98.14100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 254.30601 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 254.62601 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 256.21801 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 255.96901 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 59.69600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 59.68600 secs

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 4.93700 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 5.00000 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 5.73500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 5.73400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 97.76500 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 97.78100 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 99.50000 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 99.34400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 4.96900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 4.98400 secs

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU IB 32-bit Windows 7 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 1.21700 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 1.15500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 1.04500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 1.06100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 9.07900 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 9.12600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 9.51600 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 9.53200 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 1.13900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 1.15400 secs
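
The pattern in the two groups (Transposed sub-tests fast under IJK, slow under IKJ) is what one would expect if the Transposed variants multiply against a pre-transposed copy of B. The sketch below shows that idea under this assumption only; the benchmark code is not shown in the thread, and `Matrix`, `transpose` and `multiplyTransposedIJK`, along with the `float` element type and row-major flat storage, are hypothetical names and choices:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage: an n x n matrix kept row-major in a flat vector.
using Matrix = std::vector<float>;

// Returns the transpose of an n x n row-major matrix.
Matrix transpose(const Matrix& m, std::size_t n)
{
    assert(m.size() == n * n);
    Matrix t(n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            t[j * n + i] = m[i * n + j];
        }
    }
    return t;
}

// IJK order against a transposed B: the inner loop reads a row of A and
// a row of Bt, so both operands are traversed with unit stride.
void multiplyTransposedIJK(const Matrix& a, const Matrix& bt, Matrix& c,
                           std::size_t n)
{
    assert(a.size() == n * n && bt.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * bt[j * n + k];
            }
            c[i * n + j] = sum;
        }
    }
}
```

If the same transposed data were walked in IKJ order, the inner loop over j would read `bt[j * n + k]` with a stride of n elements, which would be consistent with the slow Transposed timings in Group 2.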
SergeyKostrov
Valued Contributor II
2,402 Views
Now, take a look at all Sub-Tests 2.1 of [ Group 1 - LPS - IJK ], sorted in ascending order by completion time:

[ ... CPU IB ( Ivy Bridge ) 32-bit Windows 7 ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 0.48400 secs
...
[ ... CPU P4 ( Pentium 4 ) 32-bit Windows XP ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 3.18800 secs
...
[ ... CPU AN ( Atom N270 ) 32-bit Windows 95 ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 5.08100 secs
...
[ ... CPU P2 ( Pentium II ) 32-bit Windows 2000 ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 48.61000 secs
...

You can see how the performance numbers change...
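
The ordering step is easy to reproduce. Here is a minimal sketch that sorts the four Sub-Test 2.1 timings quoted in this post by completion time; the `Result` struct and both function names are hypothetical helpers, not part of the original test application:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One reduced record: platform label and completion time in seconds.
struct Result {
    std::string platform;
    double seconds;
};

// Returns a copy of the records sorted by completion time, fastest first.
std::vector<Result> sortedByTime(std::vector<Result> results)
{
    std::sort(results.begin(), results.end(),
              [](const Result& a, const Result& b) {
                  return a.seconds < b.seconds;
              });
    return results;
}

// Sub-Test 2.1 timings ( VS98 PE, LPS: IJK ) as posted in this thread.
std::vector<Result> subTest21Group1()
{
    return {
        {"CPU AN 32-bit Windows 95", 5.08100},
        {"CPU P2 32-bit Windows 2000", 48.61000},
        {"CPU P4 32-bit Windows XP", 3.18800},
        {"CPU IB 32-bit Windows 7", 0.48400},
    };
}
```

Calling `sortedByTime(subTest21Group1())` yields the order IB, P4, AN, P2, matching the list in this post.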
SergeyKostrov
Valued Contributor II
2,402 Views
[ Command Line Options of C++ compilers ]

Command Line Options of C++ compilers used in these performance evaluations will be provided.
zalia64
New Contributor I
2,402 Views

I am sorry, but the data is too raw to comprehend.

Comparing specific items, the noise seems to overwhelm the results.

Your first test, [ MinGW C++ compiler v5.1.0 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]:

Sub-Test 1.1 - MxMultA1 - Classic 2D            -  2.750 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D            -  2.79700 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - 98.26601 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed - 98.26501 secs

Your second test, [ MinGW C++ compiler v5.1.0 - Release - 64-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]:

Sub-Test 1.1 - MxMultA1 - Classic 2D            -  8.92300 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D            -  8.98600 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed -  0.98300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed -  1.15400 secs

64-bit is 100 times faster than 32-bit? This should mean a real revolution! But wait, other routines are 4 times slower?
SergeyKostrov
Valued Contributor II
2,402 Views
>>...64-bit is 100 times faster than 32-bit? This should mean a real revolution!...

Sir, you shouldn't be too sarcastic because you're comparing:
- 32-bit SSE2 codes ( compiled with ICC v12.x ) executed on 32-bit Windows XP with an Intel Pentium 4 CPU ( more than 14-year-old technology! )
against
- 64-bit AVX codes ( compiled with ICC 13.x ) executed on 64-bit Windows 7 with an Intel Core i7 ( Ivy Bridge ) CPU ( more than 3 years old ).

Every test case has a title; look at the first couple of posts for more details.
SergeyKostrov
Valued Contributor II
2,402 Views
>>[ Command Line Options of C++ compilers ]
>>
>>Command Line Options of C++ compilers used in these performance evaluations will be provided.

Here they are...
SergeyKostrov
Valued Contributor II
2,402 Views
[ Borland C++ compiler v5.5.1 32-bit ]

-d -O2 -w -D_WIN32_BCC -DNDEBUG -5 -nRelease -eBccTestApp.exe -I"C:\WorkLib\MKL\Include" -L"C:\WorkLib\MKL\Lib\Ia32Bcc" -lS:33554432 BccTestApp.cpp HrtALLib.asm
SergeyKostrov
Valued Contributor II
2,402 Views
[ MinGW C++ compiler v3.4.2 32-bit ]

MgwTestApp.cpp -DNDEBUG -O3 -msse2 -ffast-math -fpeel-loops -fomit-frame-pointer -w -I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib" -Xlinker --stack=67108864
SergeyKostrov
Valued Contributor II
2,402 Views
[ MinGW C++ compiler v4.8.1 32-bit ]

MgwTestApp.cpp -DNDEBUG -O3 -msse2 -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -flto -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib" -Xlinker --stack=67108864
SergeyKostrov
Valued Contributor II
2,402 Views
[ MinGW C++ compiler v4.9.2 32-bit ]

MgwTestApp.cpp -DNDEBUG -O3 -msse2 -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -flto -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib" -Xlinker --stack=67108864
SergeyKostrov
Valued Contributor II
2,402 Views
[ MinGW C++ compiler v4.9.2 64-bit ]

MgwTestApp.cpp -DNDEBUG -O3 -mavx -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2013/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2013/Composer XE/Mkl/Lib/Intel64/mkl_rt.lib" -Xlinker --stack=1073741824
SergeyKostrov
Valued Contributor II
2,422 Views
[ MinGW C++ compiler v5.1.0 32-bit ]

MgwTestApp.cpp -DNDEBUG -O3 -msse2 -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -flto -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib" -Xlinker --stack=67108864
SergeyKostrov
Valued Contributor II
2,422 Views
[ MinGW C++ compiler v5.1.0 64-bit ]

MgwTestApp.cpp -DNDEBUG -O3 -mavx -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2013/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2013/Composer XE/Mkl/Lib/Intel64/mkl_rt.lib" -Xlinker --stack=1073741824
SergeyKostrov
Valued Contributor II
2,422 Views
[ MinGW C++ compiler v6.1.0 32-bit ]

MgwTestApp.cpp -DNDEBUG -O3 -msse2 -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -flto -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib" -Xlinker --stack=67108864
SergeyKostrov
Valued Contributor II
2,422 Views
[ MinGW C++ compiler v6.1.0 64-bit ]

MgwTestApp.cpp -DNDEBUG -O3 -mavx -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2013/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2013/Composer XE/Mkl/Lib/Intel64/mkl_rt.lib" -Xlinker --stack=1073741824