
Performance Evaluation of Classic Matrix Multiplication algorithms

SergeyKostrov
Valued Contributor II
5,069 Views
[ Abstract ] This is one of the most detailed analyses of the performance of the Classic Matrix Multiplication algorithm on different software and hardware platforms.
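The logs in this thread compare two Loop Processing Schemas ( LPS ), IJK and IKJ. The test sources are not posted here, so the following is only a minimal sketch of what those two loop orders usually look like for the classic algorithm. The `Matrix` alias, the `double` element type and the function names `multiply_ijk` and `multiply_ikj` are assumptions made for illustration, not the author's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: square matrices of double, stored as a vector of rows.
using Matrix = std::vector<std::vector<double>>;

// IJK order: k is the innermost loop, so b[k][j] is read down a column
// and every step of the inner loop touches a different row of b.
Matrix multiply_ijk(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    Matrix c(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i][k] * b[k][j];
            }
            c[i][j] = sum;
        }
    }
    return c;
}

// IKJ order: j is the innermost loop, so both b[k][j] and c[i][j]
// are walked along a row, one element after the other.
Matrix multiply_ikj(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    Matrix c(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i][k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i][j] += aik * b[k][j];
            }
        }
    }
    return c;
}
```

Both functions compute the same product; only the order in which memory is touched differs, which is the variable the IJK and IKJ runs isolate.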
0 Kudos
1 Solution
zalia64
New Contributor I
4,972 Views

You are right.

I have missed the one-letter difference in the title.

For simple readers like me, fundamental one-letter differences must be spelled out explicitly.

 


0 Kudos
146 Replies
SergeyKostrov
Valued Contributor II
668 Views
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 4.93700 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 5.00000 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 5.73500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 5.73400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 97.76500 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 97.78100 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 99.50000 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 99.34400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 4.96900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 4.98400 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU IB 32-bit Windows 7 ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 9.64100 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 9.06300 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 10.10900 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 9.32900 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 0.48400 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 1.21700 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 0.67100 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 1.18500 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 9.51600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 9.78100 secs
> Test1099 End <
Tests: Completed
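In this IJK run the Transposed sub-tests finish far sooner than the plain Classic 2D ones (0.48400 secs against 9.64100 secs). One plausible reading, sketched below, is that B is transposed first so that the innermost k loop reads both operands along a row. This is an assumption about what MxMultB1 does, since its source is not shown in the thread; the names `transpose` and `multiply_ijk_transposed` and the `double` element type are hypothetical, and it is not known whether the reported times include the transpose step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: square matrices of double, stored as a vector of rows.
using Matrix = std::vector<std::vector<double>>;

// Builds bt so that bt[j][k] == b[k][j].
Matrix transpose(const Matrix& b) {
    const std::size_t n = b.size();
    Matrix bt(n, std::vector<double>(n, 0.0));
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t j = 0; j < n; ++j) {
            bt[j][k] = b[k][j];
        }
    }
    return bt;
}

// IJK order on a pre-transposed B: the inner k loop now walks
// a[i][k] and bt[j][k] along rows instead of striding down a column of b.
Matrix multiply_ijk_transposed(const Matrix& a, const Matrix& bt) {
    const std::size_t n = a.size();
    assert(bt.size() == n);
    Matrix c(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i][k] * bt[j][k];
            }
            c[i][j] = sum;
        }
    }
    return c;
}
```

The result equals the ordinary product A x B; only the storage order of the second operand changes.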
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU IB 32-bit Windows 7 ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 1.21700 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 1.15500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 1.04500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 1.06100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 9.07900 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 9.12600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 9.51600 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 9.53200 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 1.13900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 1.15400 secs
> Test1099 End <
Tests: Completed
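Sub-Tests 5.1 and 5.2 are labelled Classic 1D. Assuming that label means each matrix is kept in one flat row-major array and indexed as `i * n + j`, an IKJ version might look like the sketch below. The function name `multiply_1d_ikj` and the `double` element type are hypothetical; the actual MxMultD1 source is not part of this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: an n x n matrix of double in one flat row-major array,
// so element (i, j) lives at index i * n + j.
std::vector<double> multiply_1d_ikj(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    std::size_t n) {
    assert(a.size() == n * n);
    assert(b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            // Row k of b and row i of c are both contiguous here.
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
    return c;
}
```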
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Microsoft C++ compiler ( VS2005 PE ) - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 98.85900 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 104.04700 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 98.98500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 99.54700 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 3.34400 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 7.39100 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 5.75000 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 7.40600 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 21.48400 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 21.28100 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 21.51600 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 21.73400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 105.59301 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 105.75001 secs
> Test1099 End <
Tests: Completed
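Every log in this thread reports a Loop Blocking Divider of 1 and, for the LBOT sub-tests, a block of 1024x1024 elements, which is the whole matrix. Assuming loop blocking here means conventional tiling, a smaller block would look roughly like the sketch below. The function name `multiply_blocked`, the flat row-major storage, the `double` element type and the `block` parameter are all hypothetical; how the test application derives its block size from the divider is not stated in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tiled multiply of two n x n row-major matrices of double.
// `block` is the tile edge in elements; block == n degenerates to
// a single tile, i.e. the plain IKJ loop nest.
std::vector<double> multiply_blocked(const std::vector<double>& a,
                                     const std::vector<double>& b,
                                     std::size_t n,
                                     std::size_t block) {
    assert(block > 0);
    assert(a.size() == n * n);
    assert(b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += block) {
        const std::size_t i_end = std::min(ii + block, n);
        for (std::size_t kk = 0; kk < n; kk += block) {
            const std::size_t k_end = std::min(kk + block, n);
            for (std::size_t jj = 0; jj < n; jj += block) {
                const std::size_t j_end = std::min(jj + block, n);
                // Multiply one tile of a by one tile of b, IKJ order inside.
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
    return c;
}
```

The `std::min` bounds let the tile edge be any positive value, including one that does not divide n evenly.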
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Microsoft C++ compiler ( VS2005 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 4.06200 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 3.93800 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 5.76500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 5.76600 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 100.07800 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 99.23400 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 100.84300 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 99.98400 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 21.48500 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 21.28100 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 21.51500 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 21.73400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 3.85900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 3.86000 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Microsoft C++ compiler ( VS2008 PE ) - Release - 32-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 24.78900 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 23.74300 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 25.53800 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 24.61700 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 1.20100 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 4.39900 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 1.98100 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 4.57100 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 1.81000 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 2.26200 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 1.82500 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 2.68300 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 23.68000 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 23.66600 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Microsoft C++ compiler ( VS2008 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 2.05900 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 2.02800 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 2.46500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 2.44900 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 24.22700 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 24.19600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 24.82000 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 24.80400 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 1.80900 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 2.29400 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 1.82500 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 2.69900 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 2.02800 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 2.04400 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Microsoft C++ compiler ( VS2008 PE ) - Release - 64-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]
Application - ScaLibTestApp - WIN64_MSC ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 24.96000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 23.88300 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 24.66400 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 24.72600 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 1.21700 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 4.38400 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 1.95000 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 4.50900 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 1.79400 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 2.26200 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 1.80900 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 2.44900 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 23.88400 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 23.85300 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Microsoft C++ compiler ( VS2008 PE ) - Release - 64-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ]
Application - ScaLibTestApp - WIN64_MSC ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 2.02800 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 2.02800 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 2.43400 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 2.43400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 24.38300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 24.38300 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 25.02200 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 24.99100 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 1.79400 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 2.26200 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 1.81000 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 2.44900 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 2.02800 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 2.02800 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Microsoft C++ compiler ( VS2008 EE ) - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 98.45300 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 103.62500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 98.64100 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 99.03100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 3.20300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 7.35900 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 5.46800 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 7.50000 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 12.81200 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 12.96800 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 12.85900 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 13.25000 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 105.09400 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 105.12500 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Microsoft C++ compiler ( VS2008 EE ) - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 4.03100 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 3.92200 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 5.61000 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 5.60900 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 98.39100 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 98.36000 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 99.29700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 99.29700 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 12.81300 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 12.96900 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 12.86000 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 13.25000 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 3.84300 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 3.84400 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Borland C++ compiler v5.5.1 - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]
Application - BccTestApp - WIN32_BCC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 99.11000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 117.65600 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 99.14100 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 99.90600 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 5.01500 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 19.68700 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 5.89000 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 19.54700 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 109.17200 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 102.87500 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Borland C++ compiler v5.5.1 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]
Application - BccTestApp - WIN32_BCC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 5.56200 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 6.14100 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 6.06300 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 6.54600 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 100.85900 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 101.43700 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 100.96900 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 101.46900 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 14.54700 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 14.95300 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Borland C++ compiler v5.5.1 - Release - 32-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]
Application - BccTestApp - WIN32_BCC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 10.20300 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 9.36000 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 9.95300 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 9.60900 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 0.98300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 3.82200 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 0.99800 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 4.18100 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 9.14100 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 8.93900 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Borland C++ compiler v5.5.1 - Release - 32-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ]
Application - BccTestApp - WIN32_BCC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 1.01400 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 1.20100 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 1.09200 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 1.26400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 9.12600 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 9.32900 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 9.40700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 10.14000 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 2.34000 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 2.37100 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Intel C++ compiler v12.1.7 ( u371 ) - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 1.64000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 0.67200 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 3.14000 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 2.82900 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 3.00000 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 1.43700 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 5.96800 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 3.51500 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 10.98400 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 11.42200 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 11.21800 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 12.09400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 0.67200 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 25.11000 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Intel C++ compiler v12.1.7 ( u371 ) - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 0.86000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 0.67200 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 2.76600 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 2.89000 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 1.50000 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 1.51500 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 3.56300 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 3.51600 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 11.06300 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 11.40600 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 11.29700 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 12.26500 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 0.67200 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 2.57800 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Intel C++ compiler v13.1.0 ( u149 ) - Release - 32-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 0.42100 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 0.04700 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 1.02900 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 0.99800 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 1.37300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 0.42100 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 1.62200 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 0.64000 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 2.93200 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 2.93300 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 2.96400 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 4.43100 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 0.03100 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 25.33500 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
668 Views
[ Intel C++ compiler v13.1.0 ( u149 ) - Release - 32-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 0.03100 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 0.12500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 1.09200 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 0.98300 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 0.42100 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 0.43700 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 0.64000 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 0.64000 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 2.93300 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 2.93300 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 2.94900 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 4.41500 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 0.03100 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 0.74900 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
666 Views
[ Intel C++ compiler v13.1.0 ( u149 ) - Release - 64-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]
Application - IccTestApp - WIN64_ICC ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 0.28100 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 0.03100 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 1.10800 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 0.98300 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 1.37300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 0.32700 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 1.60700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 0.54600 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 2.91700 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 2.93300 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 2.93300 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 4.38300 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 0.04700 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 24.72600 secs
> Test1099 End <
Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
666 Views
[ Intel C++ compiler v13.1.0 ( u149 ) - Release - 64-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ]
Application - IccTestApp - WIN64_ICC ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D; LBOT size: N/A; Completed: 0.18700 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT; LBOT size: 1024x1024 elements; Completed: 0.14000 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused; LBOT size: N/A; Completed: 1.07600 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT; LBOT size: 1024x1024 elements; Completed: 0.99900 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed; LBOT size: N/A; Completed: 0.31200 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 0.31200 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed; LBOT size: N/A; Completed: 0.54600 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT; LBOT size: 1024x1024 elements; Completed: 0.54600 secs
Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1; LBOT size: N/A; Completed: 2.91700 secs
Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT; LBOT size: 1024x1024 elements; Completed: 2.91700 secs
Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2; LBOT size: N/A; Completed: 2.93300 secs
Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT; LBOT size: 1024x1024 elements; Completed: 4.39900 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D; LBOT size: N/A; Completed: 0.03100 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT; LBOT size: 1024x1024 elements; Completed: 0.60900 secs
> Test1099 End <
Tests: Completed
0 Kudos