
Performance Evaluation of Classic Matrix Multiplication algorithms

SergeyKostrov
Valued Contributor II
15,069 Views
*** Performance Evaluation of Classic Matrix Multiplication algorithms ***
[ Abstract ]
This is one of the most detailed analyses of the performance of the Classic Matrix Multiplication algorithm on different software and hardware platforms.
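For readers who want a concrete picture of the two layouts the logs in this thread label "Not Transposed" and "Transposed", here is a minimal sketch. The thread does not show the source of the `_MatrixMulProcessing*` functions, so the function names, the `double` element type and the row-major `std::vector` storage below are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type and function names; not taken from the benchmarked code.
using Matrix = std::vector<double>;  // row-major storage of an n x n matrix

// Classic triple loop: C[i][j] = sum over k of A[i][k] * B[k][j].
// The inner loop reads B down a column, that is, with a stride of n elements.
Matrix multiplyClassic(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Transposed variant: B is transposed once up front, so the inner loop
// reads both operands with a stride of one element.
Matrix multiplyClassicTransposed(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix bt(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            bt[j * n + i] = b[i * n + j];

    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
    }
    return c;
}
```

Both functions perform the same arithmetic and return the same product; only the memory access pattern of the inner loop differs.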
0 Kudos
1 Solution
zalia64
New Contributor I
14,972 Views

You are right.

I have missed the one-letter difference in the title.

For simple readers like me, fundamental one-letter differences must be spelled out explicitly.

 


146 Replies
SergeyKostrov
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
_MatrixMulProcessingCUnRv1A - Pass 01 - Completed: 0.26500 secs
_MatrixMulProcessingCUnRv1A - Pass 02 - Completed: 0.25000 secs
_MatrixMulProcessingCUnRv1A - Pass 03 - Completed: 0.26500 secs
_MatrixMulProcessingCUnRv1A - Pass 04 - Completed: 0.26500 secs
_MatrixMulProcessingCUnRv1A - Pass 05 - Completed: 0.25000 secs
_MatrixMulProcessingCv1B - Pass 01 - Completed: 0.45200 secs
_MatrixMulProcessingCv1B - Pass 02 - Completed: 0.45200 secs
_MatrixMulProcessingCv1B - Pass 03 - Completed: 0.45300 secs
_MatrixMulProcessingCv1B - Pass 04 - Completed: 0.45200 secs
_MatrixMulProcessingCv1B - Pass 05 - Completed: 0.46800 secs
_MatrixMulProcessingCv1C - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCv1C - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCv1C - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCv1C - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCv1C - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCv1D - Pass 01 - Completed: 0.01600 secs
_MatrixMulProcessingCv1D - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCv1D - Pass 03 - Completed: 0.01600 secs
_MatrixMulProcessingCv1D - Pass 04 - Completed: 0.03100 secs
_MatrixMulProcessingCv1D - Pass 05 - Completed: 0.01500 secs
_MatrixMulProcessingCv1E - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCv1E - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCv1E - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCv1E - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCv1E - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCUnRv2A - Pass 01 - Completed: 0.26600 secs
_MatrixMulProcessingCUnRv2A - Pass 02 - Completed: 0.26500 secs
_MatrixMulProcessingCUnRv2A - Pass 03 - Completed: 0.24900 secs
_MatrixMulProcessingCUnRv2A - Pass 04 - Completed: 0.26600 secs
_MatrixMulProcessingCUnRv2A - Pass 05 - Completed: 0.24900 secs
_MatrixMulProcessingCv2B - Pass 01 - Completed: 0.46800 secs
_MatrixMulProcessingCv2B - Pass 02 - Completed: 0.45300 secs
_MatrixMulProcessingCv2B - Pass 03 - Completed: 0.45200 secs
_MatrixMulProcessingCv2B - Pass 04 - Completed: 0.45200 secs
_MatrixMulProcessingCv2B - Pass 05 - Completed: 0.46800 secs
_MatrixMulProcessingCv2C - Pass 01 - Completed: 0.01600 secs
_MatrixMulProcessingCv2C - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCv2C - Pass 03 - Completed: 0.01600 secs
_MatrixMulProcessingCv2C - Pass 04 - Completed: 0.03100 secs
_MatrixMulProcessingCv2C - Pass 05 - Completed: 0.01600 secs
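A log of this shape can be produced by a harness that calls each function a fixed number of times and prints the elapsed time per pass. The sketch below is one hypothetical way to write it; the thread does not say which timer the real harness uses, so `std::chrono::steady_clock` and the name `timePasses` are assumptions. Many of the small values in the log (0.01600, 0.03100, 0.04700 and so on) look like multiples of roughly 15.6 ms, which would be consistent with a coarse system tick rather than a high-resolution clock, but that is only a guess from the numbers.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <vector>

// Hypothetical harness: runs `work` `passes` times, prints one line per pass
// in the same layout as the logs in this thread, and returns the timings.
std::vector<double> timePasses(const char* name, int passes,
                               const std::function<void()>& work) {
    assert(passes > 0);
    std::vector<double> seconds;
    seconds.reserve(static_cast<std::size_t>(passes));
    for (int pass = 1; pass <= passes; ++pass) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double elapsed = std::chrono::duration<double>(stop - start).count();
        std::printf("%s - Pass %02d - Completed: %.5f secs\n", name, pass, elapsed);
        seconds.push_back(elapsed);
    }
    return seconds;
}
```

A call such as `timePasses("_MatrixMulProcessingCv1B", 5, [&] { /* multiply here */ });` would emit five lines like the ones above. With a clock this fine, a real multiplication should never report exactly 0.00000 secs.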
SergeyKostrov
[ MinGW C++ compiler v5.1.0 64-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 0.12400 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 0.12500 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.03200 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 0.03200 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.03200 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.01500 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 0.12400 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 0.03200 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 0.03200 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 0.01600 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 0.03100 secs
SergeyKostrov
[ MinGW C++ compiler v5.1.0 64-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
_MatrixMulProcessingCUnRv1A - Pass 01 - Completed: 0.24900 secs
_MatrixMulProcessingCUnRv1A - Pass 02 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv1A - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv1A - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv1A - Pass 05 - Completed: 0.23400 secs
_MatrixMulProcessingCv1B - Pass 01 - Completed: 0.25000 secs
_MatrixMulProcessingCv1B - Pass 02 - Completed: 0.24900 secs
_MatrixMulProcessingCv1B - Pass 03 - Completed: 0.25000 secs
_MatrixMulProcessingCv1B - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCv1B - Pass 05 - Completed: 0.24900 secs
_MatrixMulProcessingCv1C - Pass 01 - Completed: 0.25000 secs
_MatrixMulProcessingCv1C - Pass 02 - Completed: 0.25000 secs
_MatrixMulProcessingCv1C - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCv1C - Pass 04 - Completed: 0.24900 secs
_MatrixMulProcessingCv1C - Pass 05 - Completed: 0.25000 secs
_MatrixMulProcessingCv1D - Pass 01 - Completed: 0.23400 secs
_MatrixMulProcessingCv1D - Pass 02 - Completed: 0.23400 secs
_MatrixMulProcessingCv1D - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCv1D - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCv1D - Pass 05 - Completed: 0.23400 secs
_MatrixMulProcessingCv1E - Pass 01 - Completed: 0.23400 secs
_MatrixMulProcessingCv1E - Pass 02 - Completed: 0.23400 secs
_MatrixMulProcessingCv1E - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCv1E - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCv1E - Pass 05 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv2A - Pass 01 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv2A - Pass 02 - Completed: 0.24900 secs
_MatrixMulProcessingCUnRv2A - Pass 03 - Completed: 0.25000 secs
_MatrixMulProcessingCUnRv2A - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv2A - Pass 05 - Completed: 0.25000 secs
_MatrixMulProcessingCv2B - Pass 01 - Completed: 0.23400 secs
_MatrixMulProcessingCv2B - Pass 02 - Completed: 0.24900 secs
_MatrixMulProcessingCv2B - Pass 03 - Completed: 0.25000 secs
_MatrixMulProcessingCv2B - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCv2B - Pass 05 - Completed: 0.24900 secs
_MatrixMulProcessingCv2C - Pass 01 - Completed: 0.23400 secs
_MatrixMulProcessingCv2C - Pass 02 - Completed: 0.23400 secs
_MatrixMulProcessingCv2C - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCv2C - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCv2C - Pass 05 - Completed: 0.21900 secs
SergeyKostrov
[ Microsoft C++ compiler ( VS2008 PE ) 64-bit ]
...
Data Set Size : 16777216 elements ( 4096 x 4096 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 11.31000 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 12.38700 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 13.96200 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 13.52500 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 13.71200 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 13.58800 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 14.22700 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 11.85600 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 13.33900 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 13.46200 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 12.71400 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 14.64900 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 14.53900 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 15.07000 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 14.24200 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 11.74700 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 13.04200 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 12.60500 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 11.57500 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 11.66900 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 14.35200 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 14.72700 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 14.69500 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 13.46300 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 14.66400 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 14.83600 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 14.13300 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 13.88400 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 14.35200 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 14.16500 secs
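The runs with "Number of Threads : 4" do not say how the work is split across threads. One common approach, shown here purely as an assumption, is to parallelise the outer loop over rows of the result with OpenMP. The function name and signature are hypothetical, and `bt` is assumed to already hold the transposed right-hand matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function; OpenMP is an assumption, the thread does not say
// how its four threads are created.
// a  : left matrix, row-major, n x n
// bt : right matrix, already transposed, row-major, n x n
// c  : output, overwritten with the n x n product
void multiplyTransposedParallel(const std::vector<double>& a,
                                const std::vector<double>& bt,
                                std::vector<double>& c,
                                int n, int threads) {
    assert(n > 0 && threads > 0);
    const std::size_t dim = static_cast<std::size_t>(n);
    assert(a.size() == dim * dim && bt.size() == dim * dim);
    c.assign(dim * dim, 0.0);

    // Iterations over i (rows of C) are divided among the threads.
    // Each row of C is written by exactly one thread, so no locking is needed.
    #pragma omp parallel for num_threads(threads)
    for (int i = 0; i < n; ++i) {
        const std::size_t rowA = static_cast<std::size_t>(i) * dim;
        for (int j = 0; j < n; ++j) {
            const std::size_t rowB = static_cast<std::size_t>(j) * dim;
            double sum = 0.0;
            for (std::size_t k = 0; k < dim; ++k)
                sum += a[rowA + k] * bt[rowB + k];
            c[rowA + static_cast<std::size_t>(j)] = sum;
        }
    }
}
```

The pragma only takes effect when OpenMP is enabled at compile time (for example `-fopenmp` with GCC or MinGW, `/openmp` with the Microsoft compiler); without it the loop runs on a single thread and still produces the same result.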
SergeyKostrov
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit ]
...
Data Set Size : 16777216 elements ( 4096 x 4096 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 16.39500 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 15.28800 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 14.74300 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 14.43000 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 14.38300 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 8.44000 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 8.33000 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 8.93900 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 9.03200 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 10.07800 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 6.49000 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 6.69200 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 5.33500 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 5.22600 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 5.22600 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 15.56900 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 13.79100 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 16.50500 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 13.75900 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 13.90000 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 9.37500 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 9.68800 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 7.89400 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 9.29700 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 9.23600 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 5.42800 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 5.33600 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 5.24100 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 5.32000 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 5.28800 secs
SergeyKostrov
[ MinGW C++ compiler v5.1.0 64-bit ]
...
Data Set Size : 16777216 elements ( 4096 x 4096 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 21.66900 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 20.06200 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 20.71700 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 21.38700 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 21.45100 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 10.28000 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 9.23500 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 10.35900 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 8.90700 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 8.81400 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 9.76600 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 10.10900 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 10.04600 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 9.64100 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 10.42100 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 9.39100 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 10.26500 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 9.57800 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 8.50200 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 9.71900 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 9.14200 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 10.24900 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 10.21800 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 10.10900 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 10.51400 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 21.15400 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 18.65800 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 20.17100 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 21.69900 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 20.17100 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 10.32800 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 9.59400 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 9.64000 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 9.71900 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 10.25000 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 9.76500 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 10.34300 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 10.01500 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 10.07800 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 9.68800 secs
SergeyKostrov
>>...
>>_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.00000 secs
>>_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.00000 secs
>>_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.00000 secs
>>_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.00000 secs
>>_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.00000 secs
>>...
Note: In all cases where the results are zeros, the function is not supported.
SergeyKostrov
Finally upgraded MinGW C++ compiler to 6.1.0 version ( from 5.1.0 version ) and preliminary tests show performance improvement from 10% to 20% for some algorithms.
SergeyKostrov
[ MinGW C++ compiler v6.1.0 64-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 0.04700 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 0.04700 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 0.04700 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 0.04700 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 0.06200 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.01500 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 0.03200 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 0.06200 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 0.04700 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 0.06200 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 0.06300 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 0.01500 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 0.01600 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 0.01500 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 0.01600 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 0.01600 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 0.01500 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 0.01600 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 0.01500 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 0.01600 secs
SergeyKostrov
[ MinGW C++ compiler v6.1.0 64-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
_MatrixMulProcessingCUnRv1A - Pass 01 - Completed: 0.06200 secs
_MatrixMulProcessingCUnRv1A - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv1A - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv1A - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv1A - Pass 05 - Completed: 0.09400 secs
_MatrixMulProcessingCv1B - Pass 01 - Completed: 0.06200 secs
_MatrixMulProcessingCv1B - Pass 02 - Completed: 0.06300 secs
_MatrixMulProcessingCv1B - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCv1B - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCv1B - Pass 05 - Completed: 0.09300 secs
_MatrixMulProcessingCv1C - Pass 01 - Completed: 0.07800 secs
_MatrixMulProcessingCv1C - Pass 02 - Completed: 0.09400 secs
_MatrixMulProcessingCv1C - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCv1C - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCv1C - Pass 05 - Completed: 0.09400 secs
_MatrixMulProcessingCv1D - Pass 01 - Completed: 0.09300 secs
_MatrixMulProcessingCv1D - Pass 02 - Completed: 0.09400 secs
_MatrixMulProcessingCv1D - Pass 03 - Completed: 0.06200 secs
_MatrixMulProcessingCv1D - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCv1D - Pass 05 - Completed: 0.09400 secs
_MatrixMulProcessingCv1E - Pass 01 - Completed: 0.07800 secs
_MatrixMulProcessingCv1E - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCv1E - Pass 03 - Completed: 0.09300 secs
_MatrixMulProcessingCv1E - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCv1E - Pass 05 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv2A - Pass 01 - Completed: 0.06300 secs
_MatrixMulProcessingCUnRv2A - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv2A - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv2A - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv2A - Pass 05 - Completed: 0.09300 secs
_MatrixMulProcessingCv2B - Pass 01 - Completed: 0.06300 secs
_MatrixMulProcessingCv2B - Pass 02 - Completed: 0.09300 secs
_MatrixMulProcessingCv2B - Pass 03 - Completed: 0.09400 secs
_MatrixMulProcessingCv2B - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCv2B - Pass 05 - Completed: 0.07800 secs
_MatrixMulProcessingCv2C - Pass 01 - Completed: 0.06200 secs
_MatrixMulProcessingCv2C - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCv2C - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCv2C - Pass 04 - Completed: 0.09400 secs
_MatrixMulProcessingCv2C - Pass 05 - Completed: 0.09400 secs
SergeyKostrov
[ MinGW C++ compiler v6.1.0 64-bit ]
...
Data Set Size : 1048576 elements ( 1024 x 1024 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 0.29600 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 0.28100 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 0.28100 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 0.26500 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 0.28100 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 0.09400 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 0.09300 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.06300 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.06200 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.06200 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 0.06300 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 0.06200 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 0.09400 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 0.06200 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.06300 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.10900 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.09400 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.09400 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 0.29600 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 0.26500 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 0.26500 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 0.28100 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 0.26500 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 0.06300 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 0.06200 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 0.06300 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 0.10900 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 0.09300 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 0.09400 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 0.09400 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 0.10900 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 0.09300 secs
SergeyKostrov
[ MinGW C++ compiler v6.1.0 64-bit ]
...
Data Set Size : 4194304 elements ( 2048 x 2048 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 2.09000 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 2.13800 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 2.07400 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 2.12200 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 2.13700 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 0.79600 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 0.82700 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 0.76400 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 0.76400 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 0.74900 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.74900 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.74900 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.74900 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.74900 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.78000 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 0.81100 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 0.79500 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 0.78000 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 0.79600 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 0.79600 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.78000 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.79500 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.79600 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.78000 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.79500 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 2.01300 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 1.99700 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 2.04300 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 1.99700 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 2.01200 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 0.76500 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 0.73300 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 0.74900 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 0.74900 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 0.73300 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 0.78000 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 0.78000 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 0.79500 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 0.81200 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 0.78000 secs
SergeyKostrov
[ MinGW C++ compiler v6.1.0 64-bit ]
...
Data Set Size : 16777216 elements ( 4096 x 4096 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 17.12900 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 16.77000 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 18.84500 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 16.94200 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 17.06600 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 7.11400 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 7.98700 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 7.33200 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 7.22300 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 6.92700 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 7.27000 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 7.14400 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 7.13000 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 7.30100 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 7.05100 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 6.83300 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 7.42500 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 8.61100 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 9.01700 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 8.08100 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 6.77000 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 8.47100 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 7.17600 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 7.31600 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 7.59700 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 16.72400 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 16.73900 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 16.80100 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 16.61400 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 16.87900 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 8.19000 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 7.87800 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 7.92500 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 8.37800 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 8.01800 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 7.14500 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 6.89500 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 9.98400 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 7.67500 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 7.95600 secs
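To compare timings across matrix sizes it can help to convert seconds into throughput. The classic algorithm performs about 2n^3 floating-point operations for an n x n product (n^3 multiplications and about as many additions). The helper names below are hypothetical and not part of the benchmarked code.

```cpp
#include <cassert>

// Approximate floating-point operation count of the classic algorithm
// for an n x n by n x n product: n^3 multiplications plus n^3 additions.
double classicFlops(double n) {
    return 2.0 * n * n * n;
}

// Throughput in GFLOP/s for one timed pass.
double gflopsPerSecond(double n, double seconds) {
    assert(seconds > 0.0);
    return classicFlops(n) / seconds / 1.0e9;
}
```

For example, the 7.11400 secs pass of `_MatrixMulProcessingCTv1B` at 4096 x 4096 works out to about 19.3 GFLOP/s across the four threads.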
SergeyKostrov

>>Finally upgraded MinGW C++ compiler to 6.1.0 version ( from 5.1.0 version ) and preliminary tests show performance
>>improvement from 10% to 20% for some algorithms.

                                                 MinGW v5.1.0   MinGW v6.1.0
...
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 10.28000 secs  7.11400 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed:  9.23500 secs  7.98700 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 10.35900 secs  7.33200 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed:  8.90700 secs  7.22300 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed:  8.81400 secs  6.92700 secs
...
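
As a worked check of the table above, the relative reduction in run time can be computed per pass and for the mean of the five passes. This is only a sketch: the function names `percent_reduction` and `mean` are hypothetical, and the timings are copied from the _MatrixMulProcessingCTv1B rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Relative reduction in run time, in percent, going from old_secs to new_secs.
double percent_reduction(double old_secs, double new_secs) {
    assert(old_secs > 0.0);
    return (old_secs - new_secs) / old_secs * 100.0;
}

// Arithmetic mean of a non-empty list of timings.
double mean(const std::vector<double>& v) {
    assert(!v.empty());
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / static_cast<double>(v.size());
}

// _MatrixMulProcessingCTv1B timings in seconds, Pass 01 to Pass 05,
// copied from the table above.
const std::vector<double> kMinGW510 = {10.280, 9.235, 10.359, 8.907, 8.814};
const std::vector<double> kMinGW610 = {7.114, 7.987, 7.332, 7.223, 6.927};
```

For this one routine, `percent_reduction(kMinGW510[i], kMinGW610[i])` ranges from about 13.5% ( Pass 02 ) to about 30.8% ( Pass 01 ), and `percent_reduction(mean(kMinGW510), mean(kMinGW610))` is about 23.1%.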

 

0 Kudos
SergeyKostrov
Valued Contributor II
1,814 Views
[ MinGW C++ compiler v6.1.0 32-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 0.43800 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 0.43700 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 0.43800 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 0.42200 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 0.42100 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 0.36000 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 0.35900 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 0.34400 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 0.35900 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 0.34400 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.35900 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.34400 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.36000 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.34300 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.36000 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 0.37500 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 0.34300 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 0.34400 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 0.36000 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 0.35900 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.35900 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.34400 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.35900 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.39100 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.35900 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 0.45400 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 0.43700 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 0.51600 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 0.48400 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 0.42200 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 0.34300 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 0.36000 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 0.35900 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 0.34400 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 0.35900 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 0.34400 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 0.34400 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 0.34300 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 0.36000 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 0.34400 secs
0 Kudos
SergeyKostrov
Valued Contributor II
1,814 Views
[ MinGW C++ compiler v6.1.0 32-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
_MatrixMulProcessingCUnRv1A - Pass 01 - Completed: 11.07800 secs
_MatrixMulProcessingCUnRv1A - Pass 02 - Completed: 11.07800 secs
_MatrixMulProcessingCUnRv1A - Pass 03 - Completed: 11.53100 secs
_MatrixMulProcessingCUnRv1A - Pass 04 - Completed: 11.40600 secs
_MatrixMulProcessingCUnRv1A - Pass 05 - Completed: 11.09400 secs
_MatrixMulProcessingCv1B - Pass 01 - Completed: 10.90600 secs
_MatrixMulProcessingCv1B - Pass 02 - Completed: 10.90600 secs
_MatrixMulProcessingCv1B - Pass 03 - Completed: 10.89100 secs
_MatrixMulProcessingCv1B - Pass 04 - Completed: 10.92200 secs
_MatrixMulProcessingCv1B - Pass 05 - Completed: 10.87500 secs
_MatrixMulProcessingCv1C - Pass 01 - Completed: 10.90600 secs
_MatrixMulProcessingCv1C - Pass 02 - Completed: 10.90600 secs
_MatrixMulProcessingCv1C - Pass 03 - Completed: 10.92200 secs
_MatrixMulProcessingCv1C - Pass 04 - Completed: 10.92200 secs
_MatrixMulProcessingCv1C - Pass 05 - Completed: 10.92200 secs
_MatrixMulProcessingCv1D - Pass 01 - Completed: 13.78100 secs
_MatrixMulProcessingCv1D - Pass 02 - Completed: 13.78100 secs
_MatrixMulProcessingCv1D - Pass 03 - Completed: 11.20400 secs
_MatrixMulProcessingCv1D - Pass 04 - Completed: 11.20300 secs
_MatrixMulProcessingCv1D - Pass 05 - Completed: 11.20300 secs
_MatrixMulProcessingCv1E - Pass 01 - Completed: 11.20300 secs
_MatrixMulProcessingCv1E - Pass 02 - Completed: 11.20300 secs
_MatrixMulProcessingCv1E - Pass 03 - Completed: 11.20300 secs
_MatrixMulProcessingCv1E - Pass 04 - Completed: 11.20300 secs
_MatrixMulProcessingCv1E - Pass 05 - Completed: 11.20300 secs
_MatrixMulProcessingCUnRv2A - Pass 01 - Completed: 11.06300 secs
_MatrixMulProcessingCUnRv2A - Pass 02 - Completed: 11.07800 secs
_MatrixMulProcessingCUnRv2A - Pass 03 - Completed: 11.06300 secs
_MatrixMulProcessingCUnRv2A - Pass 04 - Completed: 11.06200 secs
_MatrixMulProcessingCUnRv2A - Pass 05 - Completed: 11.04700 secs
_MatrixMulProcessingCv2B - Pass 01 - Completed: 10.85900 secs
_MatrixMulProcessingCv2B - Pass 02 - Completed: 10.86000 secs
_MatrixMulProcessingCv2B - Pass 03 - Completed: 10.87500 secs
_MatrixMulProcessingCv2B - Pass 04 - Completed: 10.87500 secs
_MatrixMulProcessingCv2B - Pass 05 - Completed: 10.85900 secs
_MatrixMulProcessingCv2C - Pass 01 - Completed: 11.18800 secs
_MatrixMulProcessingCv2C - Pass 02 - Completed: 11.18700 secs
_MatrixMulProcessingCv2C - Pass 03 - Completed: 11.18800 secs
_MatrixMulProcessingCv2C - Pass 04 - Completed: 11.18700 secs
_MatrixMulProcessingCv2C - Pass 05 - Completed: 11.18800 secs
0 Kudos
SergeyKostrov
Valued Contributor II
1,814 Views
[ Performance improvements - MinGW v5.1.0 vs. MinGW v6.1.0 ]

Preliminary tests show performance improvement from 10% to 20% for some algorithms.

                                                 MinGW v5.1.0   MinGW v6.1.0
...
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 10.28000 secs  7.11400 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed:  9.23500 secs  7.98700 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 10.35900 secs  7.33200 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed:  8.90700 secs  7.22300 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed:  8.81400 secs  6.92700 secs
...
0 Kudos
SergeyKostrov
Valued Contributor II
1,814 Views
[ MinGW C++ compiler v6.1.0 - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]

Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
  LBOT size: N/A
  Completed: 98.61001 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
  LBOT size: 1024x1024 elements
  Completed: 100.79601 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
  LBOT size: N/A
  Completed: 98.87501 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
  LBOT size: 1024x1024 elements
  Completed: 99.09300 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
  LBOT size: N/A
  Completed: 3.43800 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
  LBOT size: 1024x1024 elements
  Completed: 4.04700 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
  LBOT size: N/A
  Completed: 5.78100 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
  LBOT size: 1024x1024 elements
  Completed: 5.90600 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
  LBOT size: N/A
  Completed: 100.14200 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
  LBOT size: 1024x1024 elements
  Completed: 100.12201 secs
> Test1099 End <
Tests: Completed
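
The source of the test application is not included in the thread, so the following is only a sketch of what "Classic" and "Classic Transposed" usually mean for the IJK loop order. It assumes square row-major matrices stored in flat `std::vector<float>` buffers; the function names and the `float` element type are assumptions. In the classic form the innermost loop reads B with a stride of n elements, while in the transposed form both operands are read contiguously, which is consistent with the large gap between Sub-Tests 1.x and 2.x in the IJK results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// n x n matrix, row-major: element (r, c) is stored at index r * n + c.
using Matrix = std::vector<float>;

// Classic IJK: C[i][j] = sum over k of A[i][k] * B[k][j].
// The inner loop walks B down a column, i.e. with a stride of n elements.
void mul_ijk(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Returns the transpose of an n x n matrix.
Matrix transpose(const Matrix& m, std::size_t n) {
    assert(m.size() == n * n);
    Matrix t(n * n);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t col = 0; col < n; ++col)
            t[col * n + r] = m[r * n + col];
    return t;
}

// Transposed IJK: bt holds B transposed, so row j of bt is column j of B.
// The inner loop now reads both a and bt with unit stride.
void mul_ijk_transposed(const Matrix& a, const Matrix& bt, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && bt.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
}
```

Both functions produce the same product; only the memory access pattern of the inner loop differs. Whether the reported timings include the cost of the transposition itself is not stated in the thread.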
0 Kudos
SergeyKostrov
Valued Contributor II
1,814 Views
[ MinGW C++ compiler v6.1.0 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]

Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
  LBOT size: N/A
  Completed: 2.78100 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
  LBOT size: 1024x1024 elements
  Completed: 2.78100 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
  LBOT size: N/A
  Completed: 6.98400 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
  LBOT size: 1024x1024 elements
  Completed: 7.00000 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
  LBOT size: N/A
  Completed: 98.70300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
  LBOT size: 1024x1024 elements
  Completed: 98.71800 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
  LBOT size: N/A
  Completed: 99.62501 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
  LBOT size: 1024x1024 elements
  Completed: 99.64101 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
  LBOT size: N/A
  Completed: 2.92200 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
  LBOT size: 1024x1024 elements
  Completed: 2.90600 secs
> Test1099 End <
Tests: Completed
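
With the IKJ loop order the non-transposed variants become the fast ones. A minimal sketch of an IKJ multiplication is shown below, assuming square row-major matrices in flat `std::vector<float>` buffers; the name `mul_ikj` and the `float` element type are assumptions, since the actual test code is not shown in the thread. Here the innermost loop runs over j, so it reads a row of B and updates a row of C with unit stride, and no transposition is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// n x n matrix, row-major: element (r, c) is stored at index r * n + c.
using Matrix = std::vector<float>;

// IKJ order: for each i and k, add A[i][k] * (row k of B) to row i of C.
// The inner loop touches b and c contiguously; a[i * n + k] is loop-invariant.
void mul_ikj(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    std::fill(c.begin(), c.end(), 0.0f);  // C is accumulated into, so clear it first
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

If B were stored transposed, the same inner loop would have to read it as `bt[j * n + k]`, a stride of n elements. That would be consistent with the Transposed sub-tests being the slow ones under IKJ, although the thread does not confirm how those variants are implemented.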
0 Kudos
SergeyKostrov
Valued Contributor II
1,814 Views
[ MinGW C++ compiler v6.1.0 - Release - 64-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
  LBOT size: N/A
  Completed: 8.93900 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
  LBOT size: 1024x1024 elements
  Completed: 8.98500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
  LBOT size: N/A
  Completed: 9.09500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
  LBOT size: 1024x1024 elements
  Completed: 9.42200 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
  LBOT size: N/A
  Completed: 0.98300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
  LBOT size: 1024x1024 elements
  Completed: 1.17000 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
  LBOT size: N/A
  Completed: 1.45100 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
  LBOT size: 1024x1024 elements
  Completed: 1.21700 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
  LBOT size: N/A
  Completed: 8.67400 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
  LBOT size: 1024x1024 elements
  Completed: 8.67300 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,814 Views
[ MinGW C++ compiler v6.1.0 - Release - 64-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
  LBOT size: N/A
  Completed: 0.25000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
  LBOT size: 1024x1024 elements
  Completed: 0.26500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
  LBOT size: N/A
  Completed: 0.54600 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
  LBOT size: 1024x1024 elements
  Completed: 0.56200 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
  LBOT size: N/A
  Completed: 9.42300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
  LBOT size: 1024x1024 elements
  Completed: 9.42200 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
  LBOT size: N/A
  Completed: 9.73400 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
  LBOT size: 1024x1024 elements
  Completed: 9.70400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
  LBOT size: N/A
  Completed: 0.29600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
  LBOT size: 1024x1024 elements
  Completed: 0.34300 secs
> Test1099 End <
Tests: Completed
0 Kudos