Software Archive
Read-only legacy content
Announcements
FPGA community forums and blogs on community.intel.com are migrating to the new Altera Community and are read-only. For urgent support needs during this transition, please visit the FPGA Design Resources page or contact an Altera Authorized Distributor.
17060 Discussions

Performance Evaluation of Classic Matrix Multiplication algorithms

SergeyKostrov
Valued Contributor II
15,063 Views
*** Performance Evaluation of Classic Matrix Multiplication algorithms *** [ Abstract ] This is one of the most detailed analysis of performance of Classic Matrix Multiplication algorithm on different Software and Hardware platforms.
0 Kudos
1 Solution
zalia64
New Contributor I
14,966 Views

You are. right.

I have missed the one-letter difference in the title.

For simple readers like me, fundamental one-letter differences must be spelled out explicitly.

 

View solution in original post

0 Kudos
146 Replies
SergeyKostrov
Valued Contributor II
1,975 Views
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ] Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 4.93700 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 5.00000 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 5.73500 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 5.73400 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 97.76500 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 97.78100 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 99.50000 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 99.34400 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 4.96900 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 4.98400 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU IB 32-bit Windows 7 ] Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IJK Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 9.64100 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 9.06300 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 10.10900 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 9.32900 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.48400 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 1.21700 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.67100 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 1.18500 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 9.51600 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 9.78100 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU IB 32-bit Windows 7 ] Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 1.21700 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 1.15500 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 1.04500 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 1.06100 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 9.07900 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 9.12600 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 9.51600 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 9.53200 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 1.13900 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 1.15400 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Microsoft C++ compiler ( VS2005 PE ) - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ] Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IJK Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 98.85900 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 104.04700 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 98.98500 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 99.54700 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 3.34400 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 7.39100 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 5.75000 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 7.40600 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 21.48400 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 21.28100 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 21.51600 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 21.73400 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 105.59301 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 105.75001 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Microsoft C++ compiler ( VS2005 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ] Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 4.06200 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 3.93800 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 5.76500 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 5.76600 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 100.07800 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 99.23400 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 100.84300 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 99.98400 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 21.48500 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 21.28100 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 21.51500 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 21.73400 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 3.85900 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 3.86000 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Microsoft C++ compiler ( VS2008 PE ) - Release - 32-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ] Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IJK Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 24.78900 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 23.74300 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 25.53800 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 24.61700 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 1.20100 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 4.39900 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 1.98100 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 4.57100 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 1.81000 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 2.26200 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 1.82500 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 2.68300 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 23.68000 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 23.66600 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Microsoft C++ compiler ( VS2008 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ] Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 2.05900 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 2.02800 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 2.46500 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 2.44900 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 24.22700 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 24.19600 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 24.82000 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 24.80400 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 1.80900 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 2.29400 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 1.82500 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 2.69900 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 2.02800 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 2.04400 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Microsoft C++ compiler ( VS2008 PE ) - Release - 64-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ] Application - ScaLibTestApp - WIN64_MSC ( 64-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IJK Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 24.96000 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 23.88300 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 24.66400 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 24.72600 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 1.21700 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 4.38400 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 1.95000 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 4.50900 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 1.79400 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 2.26200 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 1.80900 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 2.44900 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 23.88400 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 23.85300 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Microsoft C++ compiler ( VS2008 PE ) - Release - 64-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ] Application - ScaLibTestApp - WIN64_MSC ( 64-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 2.02800 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 2.02800 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 2.43400 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 2.43400 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 24.38300 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 24.38300 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 25.02200 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 24.99100 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 1.79400 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 2.26200 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 1.81000 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 2.44900 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 2.02800 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 2.02800 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Microsoft C++ compiler ( VS2008 EE ) - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ] Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IJK Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 98.45300 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 103.62500 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 98.64100 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 99.03100 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 3.20300 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 7.35900 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 5.46800 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 7.50000 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 12.81200 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 12.96800 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 12.85900 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 13.25000 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 105.09400 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 105.12500 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Microsoft C++ compiler ( VS2008 EE ) - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ] Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 4.03100 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 3.92200 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 5.61000 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 5.60900 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 98.39100 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 98.36000 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 99.29700 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 99.29700 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 12.81300 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 12.96900 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 12.86000 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 13.25000 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 3.84300 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 3.84400 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Borland C++ compiler v5.5.1 - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ] Application - BccTestApp - WIN32_BCC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IJK Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 99.11000 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 117.65600 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 99.14100 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 99.90600 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 5.01500 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 19.68700 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 5.89000 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 19.54700 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 109.17200 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 102.87500 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Borland C++ compiler v5.5.1 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ] Application - BccTestApp - WIN32_BCC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 5.56200 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 6.14100 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 6.06300 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 6.54600 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 100.85900 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 101.43700 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 100.96900 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 101.46900 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 14.54700 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 14.95300 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Borland C++ compiler v5.5.1 - Release - 32-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ] Application - BccTestApp - WIN32_BCC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IJK Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 10.20300 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 9.36000 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 9.95300 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 9.60900 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.98300 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 3.82200 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.99800 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 4.18100 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 9.14100 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 8.93900 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Borland C++ compiler v5.5.1 - Release - 32-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ] Application - BccTestApp - WIN32_BCC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 1.01400 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 1.20100 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 1.09200 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 1.26400 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 9.12600 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 9.32900 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 9.40700 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 10.14000 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 2.34000 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 2.37100 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Intel C++ compiler v12.1.7 ( u371 ) - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IJK Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 1.64000 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 0.67200 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 3.14000 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 2.82900 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 3.00000 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 1.43700 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 5.96800 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 3.51500 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 10.98400 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 11.42200 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 11.21800 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 12.09400 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.67200 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 25.11000 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Intel C++ compiler v12.1.7 ( u371 ) - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.86000 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 0.67200 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 2.76600 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 2.89000 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 1.50000 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 1.51500 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 3.56300 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 3.51600 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 11.06300 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 11.40600 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 11.29700 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 12.26500 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.67200 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 2.57800 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Intel C++ compiler v13.1.0 ( u149 ) - Release - 32-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IJK Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.42100 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 0.04700 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 1.02900 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 0.99800 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 1.37300 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 0.42100 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 1.62200 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 0.64000 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 2.93200 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 2.93300 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 2.96400 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 4.43100 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.03100 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 25.33500 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,975 Views
[ Intel C++ compiler v13.1.0 ( u149 ) - Release - 32-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.03100 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 0.12500 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 1.09200 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 0.98300 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.42100 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 0.43700 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.64000 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 0.64000 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 2.93300 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 2.93300 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 2.94900 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 4.41500 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.03100 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 0.74900 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,973 Views
[ Intel C++ compiler v13.1.0 ( u149 ) - Release - 64-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ] Application - IccTestApp - WIN64_ICC ( 64-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IJK Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.28100 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 0.03100 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 1.10800 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 0.98300 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 1.37300 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 0.32700 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 1.60700 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 0.54600 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 2.91700 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 2.93300 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 2.93300 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 4.38300 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.04700 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 24.72600 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,973 Views
[ Intel C++ compiler v13.1.0 ( u149 ) - Release - 64-bit ( LPS: IKJ ) - CPU IB 64-bit Windows 7 ] Application - IccTestApp - WIN64_ICC ( 64-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 1024 x 1024 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.18700 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 1024x1024 elements Completed: 0.14000 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 1.07600 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 1024x1024 elements Completed: 0.99900 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.31200 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 1024x1024 elements Completed: 0.31200 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.54600 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 1024x1024 elements Completed: 0.54600 secs Sub-Test 3.1 - MxMultC1 - Classic 2D SSE2 Transposed v1 LBOT size: N/A Completed: 2.91700 secs Sub-Test 3.2 - MxMultC2 - Classic 2D SSE2 Transposed v1 LBOT LBOT size: 1024x1024 elements Completed: 2.91700 secs Sub-Test 4.1 - MatrixMulEx1 - Classic 2D SSE2 Transposed v2 LBOT size: N/A Completed: 2.93300 secs Sub-Test 4.2 - MatrixMulEx2 - Classic 2D SSE2 Transposed v2 LBOT LBOT size: 1024x1024 elements Completed: 4.39900 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.03100 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 1024x1024 elements Completed: 0.60900 secs > Test1099 End < Tests: Completed
0 Kudos
Reply