Performance Evaluation of Classic Matrix Multiplication algorithms

SergeyKostrov
Valued Contributor II
*** Performance Evaluation of Classic Matrix Multiplication algorithms ***

[ Abstract ]

This is one of the most detailed analyses of the performance of the Classic Matrix Multiplication algorithm on different software and hardware platforms.
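For readers unfamiliar with the terms used in the reports that follow, here is a minimal sketch of the classic three-loop algorithm in its "Not Transposed" and "Transposed" forms. It is not the code that was benchmarked: the names `multiplyClassic` and `multiplyClassicTransposed`, the row-major `std::vector<float>` storage, and the restriction to square matrices are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Square n x n matrix stored row-major in a flat vector (an assumption of this sketch).
using Matrix = std::vector<float>;

// Classic multiplication, "not transposed": the inner loop walks B down a column,
// so consecutive reads of B are n elements apart in memory.
Matrix multiplyClassic(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    return c;
}

// Classic multiplication, "transposed": B is transposed once up front so that
// the inner loop reads both operands sequentially.
Matrix multiplyClassicTransposed(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix bt(n * n);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t col = 0; col < n; ++col)
            bt[col * n + r] = b[r * n + col];

    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
    return c;
}
```

Both functions perform the same number of multiply-adds; they differ only in how memory is traversed, which is the usual reason a transposed variant runs faster on large matrices.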
1 Solution
zalia64

You are right.

I missed the one-letter difference in the title.

For simple readers like me, fundamental one-letter differences must be spelled out explicitly.

146 Replies
SergeyKostrov
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
_MatrixMulProcessingCUnRv1A - Pass 01 - Completed: 0.26500 secs
_MatrixMulProcessingCUnRv1A - Pass 02 - Completed: 0.25000 secs
_MatrixMulProcessingCUnRv1A - Pass 03 - Completed: 0.26500 secs
_MatrixMulProcessingCUnRv1A - Pass 04 - Completed: 0.26500 secs
_MatrixMulProcessingCUnRv1A - Pass 05 - Completed: 0.25000 secs
_MatrixMulProcessingCv1B - Pass 01 - Completed: 0.45200 secs
_MatrixMulProcessingCv1B - Pass 02 - Completed: 0.45200 secs
_MatrixMulProcessingCv1B - Pass 03 - Completed: 0.45300 secs
_MatrixMulProcessingCv1B - Pass 04 - Completed: 0.45200 secs
_MatrixMulProcessingCv1B - Pass 05 - Completed: 0.46800 secs
_MatrixMulProcessingCv1C - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCv1C - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCv1C - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCv1C - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCv1C - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCv1D - Pass 01 - Completed: 0.01600 secs
_MatrixMulProcessingCv1D - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCv1D - Pass 03 - Completed: 0.01600 secs
_MatrixMulProcessingCv1D - Pass 04 - Completed: 0.03100 secs
_MatrixMulProcessingCv1D - Pass 05 - Completed: 0.01500 secs
_MatrixMulProcessingCv1E - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCv1E - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCv1E - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCv1E - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCv1E - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCUnRv2A - Pass 01 - Completed: 0.26600 secs
_MatrixMulProcessingCUnRv2A - Pass 02 - Completed: 0.26500 secs
_MatrixMulProcessingCUnRv2A - Pass 03 - Completed: 0.24900 secs
_MatrixMulProcessingCUnRv2A - Pass 04 - Completed: 0.26600 secs
_MatrixMulProcessingCUnRv2A - Pass 05 - Completed: 0.24900 secs
_MatrixMulProcessingCv2B - Pass 01 - Completed: 0.46800 secs
_MatrixMulProcessingCv2B - Pass 02 - Completed: 0.45300 secs
_MatrixMulProcessingCv2B - Pass 03 - Completed: 0.45200 secs
_MatrixMulProcessingCv2B - Pass 04 - Completed: 0.45200 secs
_MatrixMulProcessingCv2B - Pass 05 - Completed: 0.46800 secs
_MatrixMulProcessingCv2C - Pass 01 - Completed: 0.01600 secs
_MatrixMulProcessingCv2C - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCv2C - Pass 03 - Completed: 0.01600 secs
_MatrixMulProcessingCv2C - Pass 04 - Completed: 0.03100 secs
_MatrixMulProcessingCv2C - Pass 05 - Completed: 0.01600 secs
SergeyKostrov
[ MinGW C++ compiler v5.1.0 64-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 0.12400 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 0.12500 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.03200 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 0.03200 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.03200 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.01500 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 0.12500 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 0.12400 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 0.03200 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 0.03200 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 0.01600 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 0.03100 secs
SergeyKostrov
[ MinGW C++ compiler v5.1.0 64-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
_MatrixMulProcessingCUnRv1A - Pass 01 - Completed: 0.24900 secs
_MatrixMulProcessingCUnRv1A - Pass 02 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv1A - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv1A - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv1A - Pass 05 - Completed: 0.23400 secs
_MatrixMulProcessingCv1B - Pass 01 - Completed: 0.25000 secs
_MatrixMulProcessingCv1B - Pass 02 - Completed: 0.24900 secs
_MatrixMulProcessingCv1B - Pass 03 - Completed: 0.25000 secs
_MatrixMulProcessingCv1B - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCv1B - Pass 05 - Completed: 0.24900 secs
_MatrixMulProcessingCv1C - Pass 01 - Completed: 0.25000 secs
_MatrixMulProcessingCv1C - Pass 02 - Completed: 0.25000 secs
_MatrixMulProcessingCv1C - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCv1C - Pass 04 - Completed: 0.24900 secs
_MatrixMulProcessingCv1C - Pass 05 - Completed: 0.25000 secs
_MatrixMulProcessingCv1D - Pass 01 - Completed: 0.23400 secs
_MatrixMulProcessingCv1D - Pass 02 - Completed: 0.23400 secs
_MatrixMulProcessingCv1D - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCv1D - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCv1D - Pass 05 - Completed: 0.23400 secs
_MatrixMulProcessingCv1E - Pass 01 - Completed: 0.23400 secs
_MatrixMulProcessingCv1E - Pass 02 - Completed: 0.23400 secs
_MatrixMulProcessingCv1E - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCv1E - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCv1E - Pass 05 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv2A - Pass 01 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv2A - Pass 02 - Completed: 0.24900 secs
_MatrixMulProcessingCUnRv2A - Pass 03 - Completed: 0.25000 secs
_MatrixMulProcessingCUnRv2A - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv2A - Pass 05 - Completed: 0.25000 secs
_MatrixMulProcessingCv2B - Pass 01 - Completed: 0.23400 secs
_MatrixMulProcessingCv2B - Pass 02 - Completed: 0.24900 secs
_MatrixMulProcessingCv2B - Pass 03 - Completed: 0.25000 secs
_MatrixMulProcessingCv2B - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCv2B - Pass 05 - Completed: 0.24900 secs
_MatrixMulProcessingCv2C - Pass 01 - Completed: 0.23400 secs
_MatrixMulProcessingCv2C - Pass 02 - Completed: 0.23400 secs
_MatrixMulProcessingCv2C - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCv2C - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCv2C - Pass 05 - Completed: 0.21900 secs
SergeyKostrov
[ Microsoft C++ compiler ( VS2008 PE ) 64-bit ]
...
Data Set Size : 16777216 elements ( 4096 x 4096 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 11.31000 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 12.38700 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 13.96200 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 13.52500 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 13.71200 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 13.58800 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 14.22700 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 11.85600 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 13.33900 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 13.46200 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 12.71400 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 14.64900 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 14.53900 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 15.07000 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 14.24200 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 11.74700 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 13.04200 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 12.60500 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 11.57500 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 11.66900 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 14.35200 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 14.72700 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 14.69500 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 13.46300 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 14.66400 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 14.83600 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 14.13300 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 13.88400 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 14.35200 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 14.16500 secs
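The report above was run with "Number of Threads : 4". The thread does not show how the work is divided, so the following is only a sketch of one common approach: giving each thread a contiguous band of output rows. It uses `std::thread` for portability (the benchmarked code may well use something else, such as OpenMP), and `multiplyTransposedParallel` is a hypothetical name. It assumes square row-major matrices and expects the second operand to be transposed already.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// C = A * B for square n x n row-major matrices, where `bt` holds B already transposed.
// Rows of C are divided into contiguous bands, one band per thread.
std::vector<float> multiplyTransposedParallel(const std::vector<float>& a,
                                              const std::vector<float>& bt,
                                              std::size_t n, unsigned threads) {
    assert(threads > 0 && a.size() == n * n && bt.size() == n * n);
    std::vector<float> c(n * n, 0.0f);
    const std::size_t band = (n + threads - 1) / threads;  // rows per thread, rounded up

    auto worker = [&](std::size_t rowBegin, std::size_t rowEnd) {
        for (std::size_t i = rowBegin; i < rowEnd; ++i)
            for (std::size_t j = 0; j < n; ++j) {
                float sum = 0.0f;
                for (std::size_t k = 0; k < n; ++k)
                    sum += a[i * n + k] * bt[j * n + k];
                c[i * n + j] = sum;  // each thread writes only its own rows
            }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = std::min(n, t * band);
        const std::size_t end = std::min(n, begin + band);
        if (begin < end) pool.emplace_back(worker, begin, end);
    }
    for (auto& th : pool) th.join();
    return c;
}
```

Because every thread writes a disjoint set of rows, no locking is needed. Run-to-run variation of the kind seen in the passes above is normal when several threads share caches and memory bandwidth.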
SergeyKostrov
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit ]
...
Data Set Size : 16777216 elements ( 4096 x 4096 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 16.39500 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 15.28800 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 14.74300 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 14.43000 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 14.38300 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 8.44000 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 8.33000 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 8.93900 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 9.03200 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 10.07800 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 6.49000 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 6.69200 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 5.33500 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 5.22600 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 5.22600 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 15.56900 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 13.79100 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 16.50500 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 13.75900 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 13.90000 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 9.37500 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 9.68800 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 7.89400 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 9.29700 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 9.23600 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 5.42800 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 5.33600 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 5.24100 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 5.32000 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 5.28800 secs
SergeyKostrov
[ MinGW C++ compiler v5.1.0 64-bit ]
...
Data Set Size : 16777216 elements ( 4096 x 4096 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 21.66900 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 20.06200 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 20.71700 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 21.38700 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 21.45100 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 10.28000 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 9.23500 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 10.35900 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 8.90700 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 8.81400 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 9.76600 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 10.10900 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 10.04600 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 9.64100 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 10.42100 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 9.39100 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 10.26500 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 9.57800 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 8.50200 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 9.71900 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 9.14200 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 10.24900 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 10.21800 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 10.10900 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 10.51400 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 21.15400 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 18.65800 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 20.17100 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 21.69900 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 20.17100 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 10.32800 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 9.59400 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 9.64000 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 9.71900 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 10.25000 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 9.76500 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 10.34300 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 10.01500 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 10.07800 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 9.68800 secs
SergeyKostrov
>>...
>>_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.00000 secs
>>_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.00000 secs
>>_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.00000 secs
>>_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.00000 secs
>>_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.00000 secs
>>...

Note: Wherever a function's results are all zeros, the function is not supported.
SergeyKostrov
I finally upgraded the MinGW C++ compiler from version 5.1.0 to version 6.1.0, and preliminary tests show a performance improvement of 10% to 20% for some algorithms.
SergeyKostrov
[ MinGW C++ compiler v6.1.0 64-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 0.04700 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 0.04700 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 0.04700 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 0.04700 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 0.06200 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.01500 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 0.03200 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 0.06200 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 0.04700 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 0.06200 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 0.06300 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 0.01500 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 0.01600 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 0.01500 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 0.01600 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 0.01600 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 0.01500 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 0.01600 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 0.01500 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 0.01600 secs
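The timings in this report take only a handful of values (0.00000, 0.01500, 0.01600, 0.03200 and so on). That suggests, though the post does not say so, that the clock behind them advances in steps of roughly 15 to 16 ms, so runs this short cannot be compared reliably. As a sketch of one way to time short passes with a clock that is usually much finer on current toolchains, here is a hypothetical harness built on `std::chrono::steady_clock`. `timePasses` and its parameters are names invented for this example and are not part of the benchmarked test suite.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` `passes` times and returns the elapsed wall-clock seconds of each pass.
// `timePasses` is a hypothetical helper for this sketch.
template <typename Work>
std::vector<double> timePasses(int passes, Work work) {
    assert(passes > 0);
    std::vector<double> seconds;
    seconds.reserve(static_cast<std::size_t>(passes));
    for (int p = 0; p < passes; ++p) {
        const auto start = std::chrono::steady_clock::now();
        work();  // the code under test, e.g. one matrix multiplication
        const auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}
```

Passing 5 as the first argument would mirror the "Number of Tests : 5" setting. The result of the timed work should be used afterwards (printed or checksummed) so that an optimizing compiler cannot discard the computation.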
SergeyKostrov
[ MinGW C++ compiler v6.1.0 64-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
_MatrixMulProcessingCUnRv1A - Pass 01 - Completed: 0.06200 secs
_MatrixMulProcessingCUnRv1A - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv1A - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv1A - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv1A - Pass 05 - Completed: 0.09400 secs
_MatrixMulProcessingCv1B - Pass 01 - Completed: 0.06200 secs
_MatrixMulProcessingCv1B - Pass 02 - Completed: 0.06300 secs
_MatrixMulProcessingCv1B - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCv1B - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCv1B - Pass 05 - Completed: 0.09300 secs
_MatrixMulProcessingCv1C - Pass 01 - Completed: 0.07800 secs
_MatrixMulProcessingCv1C - Pass 02 - Completed: 0.09400 secs
_MatrixMulProcessingCv1C - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCv1C - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCv1C - Pass 05 - Completed: 0.09400 secs
_MatrixMulProcessingCv1D - Pass 01 - Completed: 0.09300 secs
_MatrixMulProcessingCv1D - Pass 02 - Completed: 0.09400 secs
_MatrixMulProcessingCv1D - Pass 03 - Completed: 0.06200 secs
_MatrixMulProcessingCv1D - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCv1D - Pass 05 - Completed: 0.09400 secs
_MatrixMulProcessingCv1E - Pass 01 - Completed: 0.07800 secs
_MatrixMulProcessingCv1E - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCv1E - Pass 03 - Completed: 0.09300 secs
_MatrixMulProcessingCv1E - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCv1E - Pass 05 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv2A - Pass 01 - Completed: 0.06300 secs
_MatrixMulProcessingCUnRv2A - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv2A - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv2A - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCUnRv2A - Pass 05 - Completed: 0.09300 secs
_MatrixMulProcessingCv2B - Pass 01 - Completed: 0.06300 secs
_MatrixMulProcessingCv2B - Pass 02 - Completed: 0.09300 secs
_MatrixMulProcessingCv2B - Pass 03 - Completed: 0.09400 secs
_MatrixMulProcessingCv2B - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCv2B - Pass 05 - Completed: 0.07800 secs
_MatrixMulProcessingCv2C - Pass 01 - Completed: 0.06200 secs
_MatrixMulProcessingCv2C - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCv2C - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCv2C - Pass 04 - Completed: 0.09400 secs
_MatrixMulProcessingCv2C - Pass 05 - Completed: 0.09400 secs
SergeyKostrov
[ MinGW C++ compiler v6.1.0 64-bit ]
...
Data Set Size : 1048576 elements ( 1024 x 1024 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 0.29600 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 0.28100 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 0.28100 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 0.26500 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 0.28100 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 0.09400 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 0.09300 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.06300 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.06200 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.06200 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 0.06300 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 0.06200 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 0.09400 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 0.06200 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.06300 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.10900 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.09400 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.09400 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 0.29600 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 0.26500 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 0.26500 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 0.28100 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 0.26500 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 0.06300 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 0.06200 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 0.06300 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 0.10900 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 0.09300 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 0.09400 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 0.09400 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 0.10900 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 0.09300 secs
SergeyKostrov
[ MinGW C++ compiler v6.1.0 64-bit ]
...
Data Set Size : 4194304 elements ( 2048 x 2048 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 2.09000 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 2.13800 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 2.07400 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 2.12200 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 2.13700 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 0.79600 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 0.82700 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 0.76400 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 0.76400 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 0.74900 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.74900 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.74900 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.74900 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.74900 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.78000 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 0.81100 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 0.79500 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 0.78000 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 0.79600 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 0.79600 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.78000 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.79500 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.79600 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.78000 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.79500 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 2.01300 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 1.99700 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 2.04300 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 1.99700 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 2.01200 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 0.76500 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 0.73300 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 0.74900 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 0.74900 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 0.73300 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 0.78000 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 0.78000 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 0.79500 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 0.81200 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 0.78000 secs
SergeyKostrov
[ MinGW C++ compiler v6.1.0 64-bit ]
...
Data Set Size : 16777216 elements ( 4096 x 4096 )
Number of Tests : 5
Number of Threads : 4
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 17.12900 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 16.77000 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 18.84500 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 16.94200 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 17.06600 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 7.11400 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 7.98700 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 7.33200 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 7.22300 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 6.92700 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 7.27000 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 7.14400 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 7.13000 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 7.30100 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 7.05100 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 6.83300 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 7.42500 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 8.61100 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 9.01700 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 8.08100 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 6.77000 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 8.47100 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 7.17600 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 7.31600 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 7.59700 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 16.72400 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 16.73900 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 16.80100 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 16.61400 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 16.87900 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 8.19000 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 7.87800 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 7.92500 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 8.37800 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 8.01800 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 7.14500 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 6.89500 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 9.98400 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 7.67500 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 7.95600 secs
SergeyKostrov

>>Finally upgraded MinGW C++ compiler to 6.1.0 version ( from 5.1.0 version ) and preliminary tests show performance improvement from 10% to 20% for some algorithms.

                                                 MinGW v5.1.0   MinGW v6.1.0
...
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 10.28000 secs  7.11400 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed:  9.23500 secs  7.98700 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 10.35900 secs  7.33200 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed:  8.90700 secs  7.22300 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed:  8.81400 secs  6.92700 secs
...
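The per-pass improvement can be computed directly from the two columns. The sketch below is only an illustration (the function names are mine, not from the test application); it defines improvement as the relative reduction in run time. For the five passes of _MatrixMulProcessingCTv1B listed above this gives about 13.5% to 30.8%.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Relative reduction in run time, in percent, when a pass goes from
// `before` seconds to `after` seconds.
double improvementPercent(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

// Per-pass improvements for two equally sized series of timings.
std::vector<double> improvements(const std::vector<double>& before,
                                 const std::vector<double>& after)
{
    assert(before.size() == after.size());
    std::vector<double> out;
    out.reserve(before.size());
    for (std::size_t i = 0; i < before.size(); ++i)
        out.push_back(improvementPercent(before[i], after[i]));
    return out;
}
```

Calling `improvements` with the MinGW v5.1.0 column as `before` and the MinGW v6.1.0 column as `after` yields one percentage per pass.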

 

0 Kudos
SergeyKostrov
Valued Contributor II
995 Views
[ MinGW C++ compiler v6.1.0 32-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 0.43800 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 0.43700 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 0.43800 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 0.42200 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 0.42100 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 0.36000 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 0.35900 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 0.34400 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 0.35900 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 0.34400 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.35900 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.34400 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.36000 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.34300 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.36000 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 0.37500 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 0.34300 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 0.34400 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 0.36000 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 0.35900 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.35900 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.34400 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.35900 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.39100 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.35900 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 0.45400 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 0.43700 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 0.51600 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 0.48400 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 0.42200 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 0.34300 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 0.36000 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 0.35900 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 0.34400 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 0.35900 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 0.34400 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 0.34400 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 0.34300 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 0.36000 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 0.34400 secs
0 Kudos
SergeyKostrov
Valued Contributor II
995 Views
[ MinGW C++ compiler v6.1.0 32-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
_MatrixMulProcessingCUnRv1A - Pass 01 - Completed: 11.07800 secs
_MatrixMulProcessingCUnRv1A - Pass 02 - Completed: 11.07800 secs
_MatrixMulProcessingCUnRv1A - Pass 03 - Completed: 11.53100 secs
_MatrixMulProcessingCUnRv1A - Pass 04 - Completed: 11.40600 secs
_MatrixMulProcessingCUnRv1A - Pass 05 - Completed: 11.09400 secs
_MatrixMulProcessingCv1B - Pass 01 - Completed: 10.90600 secs
_MatrixMulProcessingCv1B - Pass 02 - Completed: 10.90600 secs
_MatrixMulProcessingCv1B - Pass 03 - Completed: 10.89100 secs
_MatrixMulProcessingCv1B - Pass 04 - Completed: 10.92200 secs
_MatrixMulProcessingCv1B - Pass 05 - Completed: 10.87500 secs
_MatrixMulProcessingCv1C - Pass 01 - Completed: 10.90600 secs
_MatrixMulProcessingCv1C - Pass 02 - Completed: 10.90600 secs
_MatrixMulProcessingCv1C - Pass 03 - Completed: 10.92200 secs
_MatrixMulProcessingCv1C - Pass 04 - Completed: 10.92200 secs
_MatrixMulProcessingCv1C - Pass 05 - Completed: 10.92200 secs
_MatrixMulProcessingCv1D - Pass 01 - Completed: 13.78100 secs
_MatrixMulProcessingCv1D - Pass 02 - Completed: 13.78100 secs
_MatrixMulProcessingCv1D - Pass 03 - Completed: 11.20400 secs
_MatrixMulProcessingCv1D - Pass 04 - Completed: 11.20300 secs
_MatrixMulProcessingCv1D - Pass 05 - Completed: 11.20300 secs
_MatrixMulProcessingCv1E - Pass 01 - Completed: 11.20300 secs
_MatrixMulProcessingCv1E - Pass 02 - Completed: 11.20300 secs
_MatrixMulProcessingCv1E - Pass 03 - Completed: 11.20300 secs
_MatrixMulProcessingCv1E - Pass 04 - Completed: 11.20300 secs
_MatrixMulProcessingCv1E - Pass 05 - Completed: 11.20300 secs
_MatrixMulProcessingCUnRv2A - Pass 01 - Completed: 11.06300 secs
_MatrixMulProcessingCUnRv2A - Pass 02 - Completed: 11.07800 secs
_MatrixMulProcessingCUnRv2A - Pass 03 - Completed: 11.06300 secs
_MatrixMulProcessingCUnRv2A - Pass 04 - Completed: 11.06200 secs
_MatrixMulProcessingCUnRv2A - Pass 05 - Completed: 11.04700 secs
_MatrixMulProcessingCv2B - Pass 01 - Completed: 10.85900 secs
_MatrixMulProcessingCv2B - Pass 02 - Completed: 10.86000 secs
_MatrixMulProcessingCv2B - Pass 03 - Completed: 10.87500 secs
_MatrixMulProcessingCv2B - Pass 04 - Completed: 10.87500 secs
_MatrixMulProcessingCv2B - Pass 05 - Completed: 10.85900 secs
_MatrixMulProcessingCv2C - Pass 01 - Completed: 11.18800 secs
_MatrixMulProcessingCv2C - Pass 02 - Completed: 11.18700 secs
_MatrixMulProcessingCv2C - Pass 03 - Completed: 11.18800 secs
_MatrixMulProcessingCv2C - Pass 04 - Completed: 11.18700 secs
_MatrixMulProcessingCv2C - Pass 05 - Completed: 11.18800 secs
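The gap between the "Transposed" and "Not Transposed" runs is presumably a memory-access effect. The sketch below shows the general idea only; it is not the code behind the _MatrixMulProcessing* functions (which is not shown in this thread). It assumes square row-major matrices of `float` in a flat `std::vector`, and the names `mulNotTransposed` and `mulTransposed` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: square n x n matrices of float, row-major, flat storage.
using Matrix = std::vector<float>;

// Not transposed: the inner loop reads b[k * n + j], so consecutive
// iterations touch B with a stride of n elements.
Matrix mulNotTransposed(const Matrix& a, const Matrix& b, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    return c;
}

// Transposed: B is transposed once up front, after which the inner
// loop reads both operands contiguously.
Matrix mulTransposed(const Matrix& a, const Matrix& b, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n);
    Matrix bt(n * n);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t col = 0; col < n; ++col)
            bt[col * n + r] = b[r * n + col];

    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
    return c;
}
```

Both functions compute the same product; the transpose costs an extra O(n^2) pass and a temporary matrix, which is small next to the O(n^3) multiplication.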
0 Kudos
SergeyKostrov
Valued Contributor II
995 Views
[ Performance improvements - MinGW v5.1.0 vs. MinGW v6.1.0 ]
Preliminary tests show a performance improvement of 10% to 20% for some algorithms.

                                                 MinGW v5.1.0   MinGW v6.1.0
...
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 10.28000 secs  7.11400 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed:  9.23500 secs  7.98700 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 10.35900 secs  7.33200 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed:  8.90700 secs  7.22300 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed:  8.81400 secs  6.92700 secs
...
0 Kudos
SergeyKostrov
Valued Contributor II
995 Views
[ MinGW C++ compiler v6.1.0 - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]
Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 98.61001 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 100.79601 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 98.87501 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 99.09300 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 3.43800 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 4.04700 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 5.78100 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 5.90600 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 100.14200 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 100.12201 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
995 Views
[ MinGW C++ compiler v6.1.0 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]
Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 2.78100 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 2.78100 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 6.98400 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 7.00000 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 98.70300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 98.71800 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 99.62501 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 99.64101 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 2.92200 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 2.90600 secs
> Test1099 End <
Tests: Completed
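For reference, the two Loop Processing Schemas compared on the P4 system (IJK and IKJ) differ only in the order of the three loops. The following is a minimal sketch of that difference, not the MxMult* code itself (which is not shown here). It assumes square row-major `float` matrices in flat storage; `mulIJK` and `mulIKJ` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: square n x n matrices of float, row-major, flat storage.
using Matrix = std::vector<float>;

// IJK: the inner loop runs over k and walks down a column of B
// (stride n), which is cache-unfriendly for large n.
Matrix mulIJK(const Matrix& a, const Matrix& b, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// IKJ: the inner loop runs over j, so a row of B and a row of C are
// both accessed contiguously and a[i][k] stays fixed in the inner loop.
Matrix mulIKJ(const Matrix& a, const Matrix& b, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    return c;
}
```

This would be consistent with the P4 numbers: Classic 2D drops from 98.61001 secs (IJK) to 2.78100 secs (IKJ) without any transpose. It would also explain why the Transposed sub-tests become slow under IKJ: with B stored transposed, an inner loop over j strides through memory again.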
0 Kudos
SergeyKostrov
Valued Contributor II
995 Views
[ MinGW C++ compiler v6.1.0 - Release - 64-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]
Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 8.93900 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 8.98500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 9.09500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 9.42200 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.98300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 1.17000 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 1.45100 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 1.21700 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 8.67400 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 8.67300 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
995 Views
[ MinGW C++ compiler v6.1.0 - Release - 64-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ]
Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.25000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 0.26500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.54600 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 0.56200 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 9.42300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 9.42200 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 9.73400 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 9.70400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.29600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 0.34300 secs
> Test1099 End <
Tests: Completed
0 Kudos