
Performance Evaluation of Classic Matrix Multiplication algorithms

SergeyKostrov
Valued Contributor II
15,007 Views
*** Performance Evaluation of Classic Matrix Multiplication algorithms ***

[ Abstract ]

This is one of the most detailed analyses of the performance of the Classic Matrix Multiplication algorithm on different software and hardware platforms.
0 Kudos
1 Solution
zalia64
New Contributor I
14,910 Views

You are right.

I have missed the one-letter difference in the title.

For simple readers like me, fundamental one-letter differences must be spelled out explicitly.

 


0 Kudos
146 Replies
SergeyKostrov
Valued Contributor II
2,218 Views
[ Watcom C++ compiler v2.0.0 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]

Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 9.71900 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 11.00000 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 12.04700 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 18.14100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 100.90600 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 102.56200 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 103.92200 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 105.70400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 18.79700 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 21.98500 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
[ Watcom C++ compiler v2.0.0 64-bit ]

Just a reminder: even though the compiler and linker have been ported to 64-bit platforms, the generated binary code is still 32-bit!
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
[ Watcom C++ compiler v2.0.0 - Release - 32-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]

Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 8.92300 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 8.34600 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 9.03300 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 10.12400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 0.96700 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 3.85300 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 0.99800 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 3.88400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 8.40900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 8.36200 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
[ Watcom C++ compiler v2.0.0 - Release - 32-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ]

Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 2.34000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 2.76200 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 3.13500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 4.57100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 8.68900 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 8.58000 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 9.40700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 9.68800 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 2.69800 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 3.08900 secs
> Test1099 End <
Tests: Completed
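
For readers unfamiliar with the LPS labels, here is a minimal sketch of what the IJK and IKJ loop orders usually mean for a classic 2D multiplication. The thread does not show the source of MxMultA1, so the function names, the `Matrix` alias, the `float` element type and the square-matrix restriction are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical square row-major matrix type (not taken from the test application).
using Matrix = std::vector<std::vector<float>>;

// IJK order: the innermost loop runs over k, so B is read down a column
// (one element from each row of B per iteration).
void multiplyIJK(const Matrix& a, const Matrix& b, Matrix& c) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i][k] * b[k][j];
            }
            c[i][j] = sum;
        }
    }
}

// IKJ order: the innermost loop runs over j, so both B and C are read
// and written along a row (contiguous elements).
void multiplyIKJ(const Matrix& a, const Matrix& b, Matrix& c) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        std::fill(c[i].begin(), c[i].end(), 0.0f);  // row i of C is accumulated in place
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i][k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i][j] += aik * b[k][j];
            }
        }
    }
}
```

Both functions compute the same product; only the order in which memory is touched differs, which is one plausible reason the Classic 2D timings above change so much between the IJK and IKJ runs.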
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
[ A note about Consistency of Performance Evaluations ]

It was a real challenge to be as consistent as possible because many things were simply outside of my control. For example:
- It was not possible to get base performance numbers for MKL on older platforms, like Windows 95;
- Another inconsistency is that OpenMP-based multithreading is not fully available on older platforms.

This means that all completed tests are single-threaded, even where OpenMP is used. However, it is easy to scale all numbers by dividing by 2, 4 or 8 in order to get estimated performance numbers for 2-core, 4-core or 8-core hardware systems; such estimates assume ideal linear scaling. Tests on Windows 8 and Windows 10 operating systems were not done at all.
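
The divide-by-core-count estimate can be written down as a one-line helper. This is only a sketch: the function name is hypothetical, and the result is a best-case figure, since memory-bound code such as matrix multiplication often scales less than linearly.

```cpp
#include <stdexcept>

// Best-case estimate of a multi-core run time from a single-threaded
// measurement, assuming perfectly linear scaling (an idealization).
double estimateIdealSeconds(double singleThreadSeconds, unsigned cores) {
    if (cores == 0) {
        throw std::invalid_argument("cores must be at least 1");
    }
    return singleThreadSeconds / static_cast<double>(cores);
}
```

For example, a single-threaded result of 8 seconds would give an ideal estimate of 2 seconds on a 4-core system.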
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
Here is an example of how the data could be used. Let's say a performance analysis for the Microsoft C++ compiler ( VS98 PE ) needs to be done. A simple data mining procedure produces a reduced data set that looks like this:

[ Group 1 - LPS - IJK ]

Matrix A, B and C Sizes : 1024 x 1024
Loop Blocking Divider: 1
Loop Processing Schema ( LPS ): IJK

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU AN 32-bit Windows 95 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 140.56801 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 136.45601 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 145.31301 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 142.82801 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 5.08100 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 5.31400 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 5.61700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 5.94600 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 136.55101 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 136.57901 secs

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU P2 32-bit Windows 2000 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 253.86501 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 253.85501 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 256.85901 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 257.74001 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 48.61000 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 59.95600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 72.07300 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 72.43400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 258.42101 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 258.35201 secs

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 97.57800 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 97.71800 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 97.85900 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 97.89000 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 3.18800 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 4.37500 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 5.45300 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 5.76600 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 97.70400 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 97.71800 secs

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU IB 32-bit Windows 7 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 9.64100 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 9.06300 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 10.10900 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 9.32900 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 0.48400 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 1.21700 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 0.67100 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 1.18500 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 9.51600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 9.78100 secs
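
The "data mining" step on result lines like the ones above could look roughly like the sketch below. It is not the procedure actually used for this thread; the struct and function names are hypothetical, and it only assumes the `Sub-Test <id> - ... - Completed: <seconds> secs` line layout shown in this post.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// Hypothetical record for one parsed result line.
struct SubTestResult {
    std::string id;   // for example "2.1"
    double seconds;   // value of the "Completed:" field
};

// Extracts the sub-test id and the completion time from one result line.
// Returns false if the line does not look like a sub-test result.
bool parseSubTestLine(const std::string& line, SubTestResult& out) {
    const std::string idTag = "Sub-Test ";
    const std::string timeTag = "Completed: ";
    const std::size_t idPos = line.find(idTag);
    const std::size_t timePos = line.find(timeTag);
    if (idPos == std::string::npos || timePos == std::string::npos) {
        return false;
    }
    const std::size_t idStart = idPos + idTag.size();
    const std::size_t idEnd = line.find(' ', idStart);
    if (idEnd == std::string::npos) {
        return false;
    }
    try {
        // std::stod stops at the first character that is not part of the number,
        // so the trailing " secs" is ignored.
        out.seconds = std::stod(line.substr(timePos + timeTag.size()));
    } catch (const std::exception&) {
        return false;
    }
    out.id = line.substr(idStart, idEnd - idStart);
    return true;
}
```

Filtering a whole log then reduces to calling the function on each line and keeping, say, only the records whose `id` equals "2.1".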
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
[ Group 2 - LPS - IKJ ]

Matrix A, B and C Sizes : 1024 x 1024
Loop Blocking Divider: 1
Loop Processing Schema ( LPS ): IKJ

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU AN 32-bit Windows 95 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 9.87500 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 9.44900 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 9.73700 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 9.75100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 147.64801 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 147.68901 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 146.48101 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 154.74801 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 9.44800 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 9.46300 secs

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU P2 32-bit Windows 2000 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 59.51500 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 59.54500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 98.13100 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 98.14100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 254.30601 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 254.62601 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 256.21801 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 255.96901 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 59.69600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 59.68600 secs

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 4.93700 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 5.00000 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 5.73500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 5.73400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 97.76500 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 97.78100 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 99.50000 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 99.34400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 4.96900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 4.98400 secs

[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU IB 32-bit Windows 7 ]
Sub-Test 1.1 - MxMultA1 - Classic 2D - LBOT size: N/A - Completed: 1.21700 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT - LBOT size: 1024x1024 elements - Completed: 1.15500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused - LBOT size: N/A - Completed: 1.04500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT - LBOT size: 1024x1024 elements - Completed: 1.06100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 9.07900 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 9.12600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed - LBOT size: N/A - Completed: 9.51600 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT - LBOT size: 1024x1024 elements - Completed: 9.53200 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D - LBOT size: N/A - Completed: 1.13900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT - LBOT size: 1024x1024 elements - Completed: 1.15400 secs
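
In both groups the Transposed sub-tests behave the opposite way to the Classic 2D ones: fast with IJK, slow with IKJ. One plausible reading, sketched below, is that the transposed variant multiplies A by a pre-transposed copy of B, so that with the IJK order the innermost loop reads both operands along rows. The source of MxMultB1 is not shown in this thread, so the names, the `float` element type and the square-matrix restriction here are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical square row-major matrix type (not taken from the test application).
using Matrix = std::vector<std::vector<float>>;

// Returns the transpose of a square matrix.
Matrix transposeOf(const Matrix& m) {
    const std::size_t n = m.size();
    Matrix t(n, std::vector<float>(n));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            t[j][i] = m[i][j];
        }
    }
    return t;
}

// Computes C = A * B given bt, the transpose of B, using the IJK order.
// The inner loop is a dot product of row i of A and row j of bt,
// so both operands are read contiguously.
void multiplyTransposedIJK(const Matrix& a, const Matrix& bt, Matrix& c) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i][k] * bt[j][k];
            }
            c[i][j] = sum;
        }
    }
}
```

Under this reading, switching the same code to the IKJ order would make the innermost loop stride across rows of the transposed matrix again, which would be consistent with the slow Transposed timings in Group 2.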
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
Now, take a look at all Sub-Tests 2.1 of the [ Group 1 - LPS - IJK ], sorted in ascending order by completion time:

[ ... CPU IB ( Ivy Bridge ) 32-bit Windows 7 ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 0.48400 secs
...
[ ... CPU P4 ( Pentium 4 ) 32-bit Windows XP ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 3.18800 secs
...
[ ... CPU AN ( Atom N270 ) 32-bit Windows 95 ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 5.08100 secs
...
[ ... CPU P2 ( Pentium II ) 32-bit Windows 2000 ]
...
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 48.61000 secs
...

You can see how the performance numbers change from platform to platform.
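
The ordering step itself is a plain ascending sort on the completion time. A minimal sketch, with a hypothetical record type and function name:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record: one platform label and its Sub-Test 2.1 completion time.
struct PlatformResult {
    std::string platform;
    double seconds;
};

// Sorts results in ascending order by completion time (fastest first).
void sortByCompletedTime(std::vector<PlatformResult>& results) {
    std::sort(results.begin(), results.end(),
              [](const PlatformResult& a, const PlatformResult& b) {
                  return a.seconds < b.seconds;
              });
}
```

Feeding it the four Sub-Test 2.1 times from Group 1 (48.61000, 5.08100, 3.18800 and 0.48400 secs) in any order yields the IB, P4, AN, P2 sequence listed above.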
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
[ Command Line Options of C++ compilers ] Command Line Options of C++ compilers used in these performance evaluations will be provided.
0 Kudos
zalia64
New Contributor I
2,218 Views

I am sorry,  but the data is too raw to comprehend.

Comparing specific items, the noise seems to overwhelm the results.

Your first test: [ MinGW C++ compiler v5.1.0 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]

Sub-Test 1.1 - MxMultA1 - Classic 2D - 2.750 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D - 2.79700 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - 98.26601 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed - 98.26501 secs

Your second test: [ MinGW C++ compiler v5.1.0 - Release - 64-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]

Sub-Test 1.1 - MxMultA1 - Classic 2D - 8.92300 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D - 8.98600 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - 0.98300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed - 1.15400 secs

64-bit is 100 times faster than 32-bit? This should mean a real revolution! But wait, other routines are 4 times slower?

0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
>>...64-bit is 100 faster the 32 bit? this should mean a real revolution!...

Sir, you shouldn't be too sarcastic, because you're comparing:
- 32-bit SSE2 codes ( compiled with ICC v12.x ) executed on 32-bit Windows XP with an Intel Pentium 4 CPU ( more than 14-year-old technology! )
against
- 64-bit AVX codes ( compiled with ICC 13.x ) executed on 64-bit Windows 7 with an Intel Core i7 ( Ivy Bridge ) CPU ( more than 3 years old ).

Every test case has a title; look at the first couple of posts for more details.
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
>>[ Command Line Options of C++ compilers ]
>>
>>Command Line Options of C++ compilers used in these performance evaluations will be provided.

Here they are...
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
[ Borland C++ compiler v5.5.1 32-bit ] -d -O2 -w -D_WIN32_BCC -DNDEBUG -5 -nRelease -eBccTestApp.exe -I"C:\WorkLib\MKL\Include" -L"C:\WorkLib\MKL\Lib\Ia32Bcc" -lS:33554432 BccTestApp.cpp HrtALLib.asm
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
[ MinGW C++ compiler v3.4.2 32-bit ] MgwTestApp.cpp -DNDEBUG -O3 -msse2 -ffast-math -fpeel-loops -fomit-frame-pointer -w -I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib" -Xlinker --stack=67108864
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
[ MinGW C++ compiler v4.8.1 32-bit ] MgwTestApp.cpp -DNDEBUG -O3 -msse2 -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -flto -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib" -Xlinker --stack=67108864
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
[ MinGW C++ compiler v4.9.2 32-bit ] MgwTestApp.cpp -DNDEBUG -O3 -msse2 -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -flto -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib" -Xlinker --stack=67108864
0 Kudos
SergeyKostrov
Valued Contributor II
2,218 Views
[ MinGW C++ compiler v4.9.2 64-bit ] MgwTestApp.cpp -DNDEBUG -O3 -mavx -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2013/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2013/Composer XE/Mkl/Lib/Intel64/mkl_rt.lib" -Xlinker --stack=1073741824
0 Kudos
SergeyKostrov
Valued Contributor II
2,238 Views
[ MinGW C++ compiler v5.1.0 32-bit ] MgwTestApp.cpp -DNDEBUG -O3 -msse2 -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -flto -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib" -Xlinker --stack=67108864
0 Kudos
SergeyKostrov
Valued Contributor II
2,238 Views
[ MinGW C++ compiler v5.1.0 64-bit ] MgwTestApp.cpp -DNDEBUG -O3 -mavx -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2013/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2013/Composer XE/Mkl/Lib/Intel64/mkl_rt.lib" -Xlinker --stack=1073741824
0 Kudos
SergeyKostrov
Valued Contributor II
2,238 Views
[ MinGW C++ compiler v6.1.0 32-bit ] MgwTestApp.cpp -DNDEBUG -O3 -msse2 -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -flto -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2011/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2011/Composer XE/Mkl/Lib/Ia32/mkl_rt.lib" -Xlinker --stack=67108864
0 Kudos
SergeyKostrov
Valued Contributor II
2,238 Views
[ MinGW C++ compiler v6.1.0 64-bit ] MgwTestApp.cpp -DNDEBUG -O3 -mavx -mprfchw -ffast-math -fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model -fomit-frame-pointer -fwhole-program -fopenmp -w -I "C:/WorkLib/ICC2013/Composer XE/Mkl/Include" -B "../../AppsSca" "C:/WorkLib/ICC2013/Composer XE/Mkl/Lib/Intel64/mkl_rt.lib" -Xlinker --stack=1073741824
0 Kudos