*** Performance Evaluation of Classic Matrix Multiplication algorithms ***
[ Abstract ]
This is one of the most detailed analyses of the performance of the Classic Matrix Multiplication algorithm on different software and hardware platforms.
1 Solution
You are right.
I have missed the one-letter difference in the title.
For simple readers like me, fundamental one-letter differences must be spelled out explicitly.
This is one of the most detailed analyses of the performance of the Classic Matrix Multiplication algorithm. The following versions of the algorithm are evaluated:
Classic 2D
Classic 2D LBOT
Classic 2D Fused
Classic 2D Fused LBOT
Classic 2D Transposed
Classic 2D Transposed LBOT
Classic 2D Fused Transposed
Classic 2D Fused Transposed LBOT
Classic 2D SSE2 Transposed v1
Classic 2D SSE2 Transposed v1 LBOT
Classic 2D SSE2 Transposed v2
Classic 2D SSE2 Transposed v2 LBOT
Classic 1D
Classic 1D LBOT
Two sub-versions of each version of the algorithm are evaluated, one for each loop order:
- Loop Processing Schema IJK
- Loop Processing Schema IKJ ( aka Loop Interchange technique )
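The two loop orders listed above can be sketched as follows. This is only a minimal illustration, not the code that was benchmarked: the `Matrix` alias and the function names `multiply_ijk` and `multiply_ikj` are hypothetical, and square single-precision matrices are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // hypothetical n x n matrix type

// IJK order: the innermost loop (k) walks down a column of B,
// so consecutive iterations touch different rows of B.
void multiply_ijk(const Matrix& a, const Matrix& b, Matrix& c) {
    const std::size_t n = a.size();
    assert(b.size() == n && c.size() == n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

// IKJ order (loop interchange): the innermost loop (j) walks along
// one row of B and one row of C, so memory is read sequentially.
void multiply_ikj(const Matrix& a, const Matrix& b, Matrix& c) {
    const std::size_t n = a.size();
    assert(b.size() == n && c.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j)
            c[i][j] = 0.0f;  // clear the output row before accumulating
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i][k];
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += aik * b[k][j];
        }
    }
}
```

Both functions compute the same product; only the order in which the elements of B and C are visited differs.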
Performance evaluations are done:
(1) On four computer systems:
Dell Precision Mobile M4700
Dell Dimension 4400
Dell Latitude CPi D300XT
Acer Aspire One ( netbook )
(2) On four Operating Systems:
Windows 95 Pan European 32-bit
Windows 2000 Professional 32-bit SP4
Windows XP Professional 32-bit SP3
Windows 7 Professional 64-bit SP1
(3) With four IDEs:
Visual Studio 98 Professional Edition
Visual Studio 2005 Professional Edition
Visual Studio 2008 Professional Edition
Visual Studio 2008 Express Edition
(4) With twenty-two C++ compilers:
Borland C++ compiler v5.5.1 32-bit
MinGW C++ compiler v3.4.2 32-bit
MinGW C++ compiler v4.8.1 32-bit
MinGW C++ compiler v4.9.2 32-bit
MinGW C++ compiler v4.9.2 64-bit
MinGW C++ compiler v5.1.0 32-bit
MinGW C++ compiler v5.1.0 64-bit
MinGW C++ compiler v6.1.0 32-bit
MinGW C++ compiler v6.1.0 64-bit
Microsoft C++ compiler ( VS98 PE ) 32-bit
Microsoft C++ compiler ( VS2005 PE ) 32-bit
Microsoft C++ compiler ( VS2008 PE ) 32-bit
Microsoft C++ compiler ( VS2008 PE ) 64-bit
Microsoft C++ compiler ( VS2008 EE ) 32-bit
Intel C++ compiler v7.1.0 ( u029 ) 32-bit
Intel C++ compiler v8.1.0 ( u038 ) 32-bit
Intel C++ compiler v12.1.7 ( u371 ) 32-bit
Intel C++ compiler v13.1.0 ( u149 ) 32-bit
Intel C++ compiler v13.1.0 ( u149 ) 64-bit
Watcom C++ compiler v1.9.0 32-bit
Watcom C++ compiler v2.0.0 32-bit
Watcom C++ compiler v2.0.0 64-bit
[ Watcom C++ compiler v2.0.0 64-bit ]
Even though the compiler and linker have been ported to 64-bit platforms, the generated binaries are still 32-bit!
[ List of Abbreviations ]
MM - Matrix Multiplication
C - Classic
LPS - Loop Processing Schema
1D - One Dimensional Input Matrices
2D - Two Dimensional Input Matrices
LB - Loop Blocking ( OT )
LBOT - Loop Blocking Optimization Technique
F - Fused ( OT )
T - Transposed ( OT )
SSE2 - Streaming SIMD Extensions v2
OT - Optimization Technique
PE - Professional Edition ( of Visual Studio )
EE - Express Edition ( of Visual Studio )
P2 - Intel Pentium II
P4 - Intel Pentium 4
IB - Intel Ivy Bridge
AN - Intel Atom N270
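Two of the abbreviations above, 1D and T, can be illustrated with a short sketch. It assumes that "1D" means a matrix stored row-major in one flat array and that "Transposed" means multiplying A by a pre-transposed copy of B. The thread does not show the benchmarked source code, so the function names `transpose` and `multiply_transposed` are hypothetical, and the Fused and SSE2 variants are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major flat storage: element (i, j) of an n x n matrix is m[i * n + j].

// Returns the transpose of an n x n matrix.
std::vector<float> transpose(const std::vector<float>& m, std::size_t n) {
    assert(m.size() == n * n);
    std::vector<float> t(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            t[j * n + i] = m[i * n + j];
    return t;
}

// Computes C = A * B using a transposed copy of B, so that the innermost
// loop reads a row of A and a row of B^T, both sequentially in memory.
std::vector<float> multiply_transposed(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    const std::vector<float> bt = transpose(b, n);
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
    return c;
}
```

The transpose costs one extra pass over B and a temporary n x n buffer, which is small compared with the n^3 work of the multiplication itself.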
[ Computer Systems used for performance evaluations ]
** Dell Precision Mobile M4700 **
Intel Core i7-3840QM ( 2.80 GHz )
Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/products/70846
32GB RAM
320GB HDD
NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory )
Windows 7 Professional 64-bit SP1
Size of L3 Cache = 8MB ( shared between all cores for data & instructions )
Size of L2 Cache = 1MB ( 256KB per core / shared for data & instructions )
Size of L1 Cache = 256KB ( 32KB per core for data & 32KB per core for instructions )
Display resolution: 1366 x 768
** Dell Dimension 4400 **
Intel Pentium 4 ( 1.60 GHz / 1 core )
1GB RAM
Seagate 20GB HDD ( * )
Seagate 3TB HDD ( ** )
EVGA GeForce 6200 512MB DDR2 AGP 8x Video Card
Windows XP Professional 32-bit SP3
Size of L2 Cache = 256KB
Size of L1 Cache = 8KB
Display resolution: 1440 x 990
( * ) Seagate Barracuda 20GB IDE Hard Disk Drive
ST320011A
3.5" 7200 Rpm 2MB Cache IDE Ultra ATA100 / ATA-iV/6
Average Rotational Latency : 4.17 ms
Average Seek Times Read : 9.0ms
Average Seek Times Write : 10.0ms
Maximum Internal Transfer Rate : 69.4MB/sec
Average External Transfer Rate : 100MB/sec ( Read and Write )
Maximum External Transfer Rate : 150MB/sec ( Read )
Note: Barracuda ATA IV Family
( ** ) Seagate Barracuda 3TB SATA Hard Disk Drive
ST3000DM001
3.5" 7200 RPM 64MB Cache SATA III ( 6Gb/sec )
Average Rotational Latency : 4.16 ms
Average Seek Times Read : 8.5ms
Average Seek Times Write : 9.5ms
Maximum Internal Transfer Rate : 268MB/sec
Average External Transfer Rate : 156MB/sec ( Read and Write )
Maximum External Transfer Rate : 210MB/sec ( Read )
** Dell Latitude CPi D300XT **
Intel Pentium II ( 300 MHz / 1 core )
128MB RAM ( 2x64MB / MT8LDT864HG-6X 144-pin EDO SODIMM 60ns )
6GB HDD
Windows 2000 Professional 32-bit SP4
Size of L2 Cache = 512KB
Size of L1 Cache = 16KB
Display resolution: 1024 x 768
** Acer Aspire One **
Intel Atom N270 ( 1.60 GHz / 1 core / 2 logical CPUs )
1.5GB RAM
CF to ZIF 1.8" HDD SSD IDE Adapter
2GB Compact Flash ( CF ) Card
Windows 95 Pan European 32-bit
Size of L2 Cache = 512KB
Size of L1 Cache = 24KB
Display resolution: 800 x 600
// Memory Settings in System.ini
...
[386Enh]
;
; MaxPhysPage value ; Amount of physical RAM Windows 95 can access
;
MaxPhysPage=32768 ; 823336 KB = 804 MB ( Currently Used )
...
[ OSs used for performance evaluations ]
Windows 95 Pan European 32-bit
Windows 2000 Professional 32-bit SP4
Windows XP Professional 32-bit SP3
Windows 7 Professional 64-bit SP1
[ IDEs used for performance evaluations ]
Visual Studio 98 Professional Edition
Visual Studio 2005 Professional Edition
Visual Studio 2008 Professional Edition
Visual Studio 2008 Express Edition
[ C++ compilers used for performance evaluations ]
Borland C++ compiler v5.5.1 32-bit
MinGW C++ compiler v3.4.2 32-bit
MinGW C++ compiler v4.8.1 32-bit
MinGW C++ compiler v4.9.2 32-bit
MinGW C++ compiler v4.9.2 64-bit
MinGW C++ compiler v5.1.0 32-bit
MinGW C++ compiler v5.1.0 64-bit
MinGW C++ compiler v6.1.0 32-bit
MinGW C++ compiler v6.1.0 64-bit
Microsoft C++ compiler ( VS98 PE ) 32-bit
Microsoft C++ compiler ( VS2005 PE ) 32-bit
Microsoft C++ compiler ( VS2008 PE ) 32-bit
Microsoft C++ compiler ( VS2008 PE ) 64-bit
Microsoft C++ compiler ( VS2008 EE ) 32-bit
Intel C++ compiler v7.1.0 ( u029 ) 32-bit
Intel C++ compiler v8.1.0 ( u038 ) 32-bit
Intel C++ compiler v12.1.7 ( u371 ) 32-bit
Intel C++ compiler v13.1.0 ( u149 ) 32-bit
Intel C++ compiler v13.1.0 ( u149 ) 64-bit
Watcom C++ compiler v1.9.0 32-bit
Watcom C++ compiler v2.0.0 32-bit
Watcom C++ compiler v2.0.0 64-bit
[ Base Performance Evaluations with MKL SGEMM function - CPU AN 32-bit Windows 95 ]
Not completed, because an MKL installation for this platform is no longer available.
[ Base Performance Evaluations with MKL SGEMM function - CPU P2 32-bit Windows 2000 ]
Not completed, because an MKL installation for this platform is no longer available.
[ Base Performance Evaluations with MKL SGEMM function - CPU P4 32-bit Windows XP ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1153 Start <
Sub-Test 1.1 - Runtime Binding of MKL functions
Dynamic Library mkl_rt.dll Loaded
Initialization Done
Sub-Test 3.2 - MKL Matrix Multiplication
Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
Allocating Memory for Matrices ( 16-byte alignment )
Intializing Matrix Data - Started
Intializing Matrix Data - Completed
Cblas xGEMM
Matrix Size : 1024 x 1024
Matrix Size Threshold : N/A
Matrix Partitions : N/A
Degree of Recursion : N/A
Result Sets Reflection: N/A
Calculating...
Cblas SGEMM - Pass 01 - Completed: 0.53100 secs
Cblas SGEMM - Pass 02 - Completed: 0.51600 secs
Cblas SGEMM - Pass 03 - Completed: 0.51600 secs
Cblas SGEMM - Pass 04 - Completed: 0.51600 secs
Cblas SGEMM - Pass 05 - Completed: 0.51500 secs
Cblas SGEMM - Passed
Deallocating Memory
Dynamic Library mkl_rt.dll Unloaded
> Test1153 End <
Tests: Completed
Application - BccTestApp - WIN32_BCC ( 32-bit ) - Release
Tests: Start
> Test1153 Start <
Sub-Test 1.1 - Runtime Binding of MKL functions
Dynamic Library mkl_rt.dll Loaded
Initialization Done
Sub-Test 3.2 - MKL Matrix Multiplication
Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
Allocating Memory for Matrices ( 16-byte alignment )
Intializing Matrix Data - Started
Intializing Matrix Data - Completed
Cblas xGEMM
Matrix Size : 1024 x 1024
Matrix Size Threshold : N/A
Matrix Partitions : N/A
Degree of Recursion : N/A
Result Sets Reflection: N/A
Calculating...
Cblas SGEMM - Pass 01 - Completed: 0.51500 secs
Cblas SGEMM - Pass 02 - Completed: 0.51500 secs
Cblas SGEMM - Pass 03 - Completed: 0.51600 secs
Cblas SGEMM - Pass 04 - Completed: 0.51600 secs
Cblas SGEMM - Pass 05 - Completed: 0.51600 secs
Cblas SGEMM - Passed
Deallocating Memory
Dynamic Library mkl_rt.dll Unloaded
> Test1153 End <
Tests: Completed
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1153 Start <
Sub-Test 1.1 - Runtime Binding of MKL functions
Dynamic Library mkl_rt.dll Loaded
Initialization Done
Sub-Test 3.2 - MKL Matrix Multiplication
Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
Allocating Memory for Matrices ( 16-byte alignment )
Intializing Matrix Data - Started
Intializing Matrix Data - Completed
Cblas xGEMM
Matrix Size : 1024 x 1024
Matrix Size Threshold : N/A
Matrix Partitions : N/A
Degree of Recursion : N/A
Result Sets Reflection: N/A
Calculating...
Cblas SGEMM - Pass 01 - Completed: 0.53200 secs
Cblas SGEMM - Pass 02 - Completed: 0.51500 secs
Cblas SGEMM - Pass 03 - Completed: 0.51600 secs
Cblas SGEMM - Pass 04 - Completed: 0.51600 secs
Cblas SGEMM - Pass 05 - Completed: 0.51500 secs
Cblas SGEMM - Passed
Deallocating Memory
Dynamic Library mkl_rt.dll Unloaded
> Test1153 End <
Tests: Completed
Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
Tests: Start
> Test1153 Start <
Sub-Test 1.1 - Runtime Binding of MKL functions
Dynamic Library mkl_rt.dll Loaded
Initialization Done
Sub-Test 3.2 - MKL Matrix Multiplication
Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
Allocating Memory for Matrices ( 16-byte alignment )
Intializing Matrix Data - Started
Intializing Matrix Data - Completed
Cblas xGEMM
Matrix Size : 1024 x 1024
Matrix Size Threshold : N/A
Matrix Partitions : N/A
Degree of Recursion : N/A
Result Sets Reflection: N/A
Calculating...
Cblas SGEMM - Pass 01 - Completed: 0.54700 secs
Cblas SGEMM - Pass 02 - Completed: 0.51500 secs
Cblas SGEMM - Pass 03 - Completed: 0.51600 secs
Cblas SGEMM - Pass 04 - Completed: 0.51500 secs
Cblas SGEMM - Pass 05 - Completed: 0.51500 secs
Cblas SGEMM - Passed
Deallocating Memory
Dynamic Library mkl_rt.dll Unloaded
> Test1153 End <
Tests: Completed
Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
Tests: Start
> Test1153 Start <
Sub-Test 1.1 - Runtime Binding of MKL functions
Dynamic Library mkl_rt.dll Loaded
Initialization Done
Sub-Test 3.2 - MKL Matrix Multiplication
Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
Allocating Memory for Matrices ( 16-byte alignment )
Intializing Matrix Data - Started
Intializing Matrix Data - Completed
Cblas xGEMM
Matrix Size : 1024 x 1024
Matrix Size Threshold : N/A
Matrix Partitions : N/A
Degree of Recursion : N/A
Result Sets Reflection: N/A
Calculating...
Cblas SGEMM - Pass 01 - Completed: 0.54900 secs
Cblas SGEMM - Pass 02 - Completed: 0.51600 secs
Cblas SGEMM - Pass 03 - Completed: 0.51500 secs
Cblas SGEMM - Pass 04 - Completed: 0.51500 secs
Cblas SGEMM - Pass 05 - Completed: 0.51600 secs
Cblas SGEMM - Passed
Deallocating Memory
Dynamic Library mkl_rt.dll Unloaded
> Test1153 End <
Tests: Completed
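Each log above reports five timed passes of the same SGEMM call. A multi-pass measurement of this kind could be sketched as shown below. The helper names `time_passes` and `best_time` are hypothetical, and `std::chrono::steady_clock` is an assumption: the thread does not say which timer the test applications use.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` `passes` times and returns the wall-clock duration
// of every pass in seconds.
template <typename Work>
std::vector<double> time_passes(Work work, std::size_t passes) {
    assert(passes > 0);
    std::vector<double> seconds;
    seconds.reserve(passes);
    for (std::size_t p = 0; p < passes; ++p) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}

// Returns the shortest pass, which is the one least disturbed by
// one-off costs such as cold caches or other system activity.
double best_time(const std::vector<double>& seconds) {
    assert(!seconds.empty());
    return *std::min_element(seconds.begin(), seconds.end());
}
```

Reporting every pass, as the logs do, shows how much the measurements vary from pass to pass; the best or median time can then be used for comparisons.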
[ Base Performance Evaluations with MKL SGEMM function - CPU IB 64-bit Windows 7 ]
Application - ScaLibTestApp - WIN64_MSC ( 64-bit ) - Release
Tests: Start
> Test1153 Start <
Sub-Test 1.1 - Runtime Binding of MKL functions
Dynamic Library mkl_rt.dll Loaded
Initialization Done
Sub-Test 3.2 - MKL Matrix Multiplication
Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
Allocating Memory for Matrices ( 16-byte alignment )
Intializing Matrix Data - Started
Intializing Matrix Data - Completed
Cblas xGEMM
Matrix Size : 1024 x 1024
Matrix Size Threshold : N/A
Matrix Partitions : N/A
Degree of Recursion : N/A
Result Sets Reflection: N/A
Calculating...
Cblas SGEMM - Pass 01 - Completed: 0.06100 secs
Cblas SGEMM - Pass 02 - Completed: 0.06600 secs
Cblas SGEMM - Pass 03 - Completed: 0.06600 secs
Cblas SGEMM - Pass 04 - Completed: 0.06600 secs
Cblas SGEMM - Pass 05 - Completed: 0.06500 secs
Cblas SGEMM - Passed
Deallocating Memory
Dynamic Library mkl_rt.dll Unloaded
> Test1153 End <
Tests: Completed
Application - BccTestApp - WIN32_BCC ( 32-bit ) - Release
Tests: Start
> Test1153 Start <
Sub-Test 1.1 - Runtime Binding of MKL functions
Dynamic Library mkl_rt.dll Loaded
Initialization Done
Sub-Test 3.2 - MKL Matrix Multiplication
Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
Allocating Memory for Matrices ( 16-byte alignment )
Intializing Matrix Data - Started
Intializing Matrix Data - Completed
Cblas xGEMM
Matrix Size : 1024 x 1024
Matrix Size Threshold : N/A
Matrix Partitions : N/A
Degree of Recursion : N/A
Result Sets Reflection: N/A
Calculating...
Cblas SGEMM - Pass 01 - Completed: 0.06500 secs
Cblas SGEMM - Pass 02 - Completed: 0.06500 secs
Cblas SGEMM - Pass 03 - Completed: 0.06600 secs
Cblas SGEMM - Pass 04 - Completed: 0.06600 secs
Cblas SGEMM - Pass 05 - Completed: 0.06600 secs
Cblas SGEMM - Passed
Deallocating Memory
Dynamic Library mkl_rt.dll Unloaded
> Test1153 End <
Tests: Completed
Application - IccTestApp - WIN64_ICC ( 64-bit ) - Release
Tests: Start
> Test1153 Start <
Sub-Test 1.1 - Runtime Binding of MKL functions
Dynamic Library mkl_rt.dll Loaded
Initialization Done
Sub-Test 3.2 - MKL Matrix Multiplication
Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
Allocating Memory for Matrices ( 16-byte alignment )
Intializing Matrix Data - Started
Intializing Matrix Data - Completed
Cblas xGEMM
Matrix Size : 1024 x 1024
Matrix Size Threshold : N/A
Matrix Partitions : N/A
Degree of Recursion : N/A
Result Sets Reflection: N/A
Calculating...
Cblas SGEMM - Pass 01 - Completed: 0.06200 secs
Cblas SGEMM - Pass 02 - Completed: 0.06500 secs
Cblas SGEMM - Pass 03 - Completed: 0.06600 secs
Cblas SGEMM - Pass 04 - Completed: 0.06600 secs
Cblas SGEMM - Pass 05 - Completed: 0.06500 secs
Cblas SGEMM - Passed
Deallocating Memory
Dynamic Library mkl_rt.dll Unloaded
> Test1153 End <
Tests: Completed
Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1153 Start <
Sub-Test 1.1 - Runtime Binding of MKL functions
Dynamic Library mkl_rt.dll Loaded
Initialization Done
Sub-Test 3.2 - MKL Matrix Multiplication
Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
Allocating Memory for Matrices ( 16-byte alignment )
Intializing Matrix Data - Started
Intializing Matrix Data - Completed
Cblas xGEMM
Matrix Size : 1024 x 1024
Matrix Size Threshold : N/A
Matrix Partitions : N/A
Degree of Recursion : N/A
Result Sets Reflection: N/A
Calculating...
Cblas SGEMM - Pass 01 - Completed: 0.06700 secs
Cblas SGEMM - Pass 02 - Completed: 0.06500 secs
Cblas SGEMM - Pass 03 - Completed: 0.06600 secs
Cblas SGEMM - Pass 04 - Completed: 0.06500 secs
Cblas SGEMM - Pass 05 - Completed: 0.06500 secs
Cblas SGEMM - Passed
Deallocating Memory
Dynamic Library mkl_rt.dll Unloaded
> Test1153 End <
Tests: Completed
Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
Tests: Start
> Test1153 Start <
Sub-Test 1.1 - Runtime Binding of MKL functions
Dynamic Library mkl_rt.dll Loaded
Initialization Done
Sub-Test 3.2 - MKL Matrix Multiplication
Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
Allocating Memory for Matrices ( 16-byte alignment )
Intializing Matrix Data - Started
Intializing Matrix Data - Completed
Cblas xGEMM
Matrix Size : 1024 x 1024
Matrix Size Threshold : N/A
Matrix Partitions : N/A
Degree of Recursion : N/A
Result Sets Reflection: N/A
Calculating...
Cblas SGEMM - Pass 01 - Completed: 0.06900 secs
Cblas SGEMM - Pass 02 - Completed: 0.06600 secs
Cblas SGEMM - Pass 03 - Completed: 0.06500 secs
Cblas SGEMM - Pass 04 - Completed: 0.06500 secs
Cblas SGEMM - Pass 05 - Completed: 0.06600 secs
Cblas SGEMM - Passed
Deallocating Memory
Dynamic Library mkl_rt.dll Unloaded
> Test1153 End <
Tests: Completed
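The SGEMM times above can be converted into a throughput figure. The sketch below assumes the conventional count of 2*n^3 floating-point operations for an n x n by n x n multiplication (n^3 multiplications plus n^3 additions); the function names `gemm_flops` and `gflops` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Conventional operation count for C = A * B with n x n matrices:
// n^3 multiplications plus n^3 additions.
double gemm_flops(std::size_t n) {
    const double d = static_cast<double>(n);
    return 2.0 * d * d * d;
}

// Throughput in GFLOPS for one multiplication that took `seconds`.
double gflops(std::size_t n, double seconds) {
    assert(seconds > 0.0);
    return gemm_flops(n) / seconds / 1.0e9;
}
```

Under that assumption, a 1024 x 1024 pass of about 0.516 secs on the P4 system corresponds to roughly 4.16 GFLOPS, and a pass of about 0.066 secs on the IB system to roughly 32.5 GFLOPS. The logs do not state how many threads MKL used, so these figures describe the whole call rather than a single core.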
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU AN 32-bit Windows 95 ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 140.56801 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 136.45601 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 145.31301 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 142.82801 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 5.08100 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 5.31400 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 5.61700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 5.94600 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 136.55101 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 136.57901 secs
> Test1099 End <
Tests: Completed
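The log above reports "Loop Blocking Divider : 1" and "LBOT size: 1024x1024 elements". A plausible reading, and it is only an assumption, is that the block edge equals the matrix size divided by the divider. The sketch below implements loop blocking under that assumption, with a hypothetical function name `multiply_blocked` and flat row-major storage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B for row-major n x n matrices, processed in square tiles.
// Assumption: tile edge = n / divider ("Loop Blocking Divider").
std::vector<float> multiply_blocked(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    std::size_t n, std::size_t divider) {
    assert(a.size() == n * n && b.size() == n * n);
    assert(divider > 0 && divider <= n);
    const std::size_t block = n / divider;
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t ii = 0; ii < n; ii += block)
        for (std::size_t jj = 0; jj < n; jj += block)
            for (std::size_t kk = 0; kk < n; kk += block) {
                // Clamp tile bounds so sizes not divisible by `block` still work.
                const std::size_t i_end = std::min(ii + block, n);
                const std::size_t j_end = std::min(jj + block, n);
                const std::size_t k_end = std::min(kk + block, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t j = jj; j < j_end; ++j) {
                        float sum = c[i * n + j];  // partial sum from earlier k tiles
                        for (std::size_t k = kk; k < k_end; ++k)
                            sum += a[i * n + k] * b[k * n + j];
                        c[i * n + j] = sum;
                    }
            }
    return c;
}
```

With a divider of 1 the single tile covers the whole matrix and the blocked loops reduce to the plain IJK loops. If the benchmarked code behaves the same way, that would be consistent with the LBOT and non-LBOT timings in this log being close to each other.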
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU AN 32-bit Windows 95 ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 9.87500 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 9.44900 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 9.73700 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 9.75100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 147.64801 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 147.68901 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 146.48101 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 154.74801 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 9.44800 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 9.46300 secs
> Test1099 End <
Tests: Completed
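In the two VS98 logs for the Atom system, the Transposed versions are fast with IJK (about 5 secs against about 140 secs for Classic 2D) and slow with IKJ (about 148 secs against about 10 secs). One possible explanation is the memory stride of the innermost loop when it reads B. The sketch below computes that stride for each combination, assuming row-major storage and assuming that the Transposed versions read a pre-transposed copy of B; `LoopOrder` and `inner_stride_of_b` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

enum class LoopOrder { IJK, IKJ };

// Distance, in elements, between the B elements read by two consecutive
// iterations of the innermost loop of an n x n multiplication.
std::size_t inner_stride_of_b(LoopOrder order, bool transposed, std::size_t n) {
    assert(n > 1);
    // Offset of the element that plays the role of B(k, j):
    // b[k * n + j] normally, bt[j * n + k] when B is stored transposed.
    auto offset = [transposed, n](std::size_t k, std::size_t j) {
        return transposed ? j * n + k : k * n + j;
    };
    // IJK: the innermost loop advances k. IKJ: the innermost loop advances j.
    return order == LoopOrder::IJK ? offset(1, 0) - offset(0, 0)
                                   : offset(0, 1) - offset(0, 0);
}
```

For n = 1024 the stride is 1024 elements for IJK without transposition and for IKJ with transposition, and 1 element for the other two combinations. Those are the same combinations that are slow and fast in the logs, which suggests, without proving it, that sequential access in the innermost loop dominates the timings.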
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU P2 32-bit Windows 2000 ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 253.86501 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 253.85501 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 256.85901 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 257.74001 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 48.61000 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 59.95600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 72.07300 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 72.43400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 258.42101 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 258.35201 secs
> Test1099 End <
Tests: Completed
[ Intel C++ compiler v7.1.0 ( u029 ) - Release - 32-bit ( LPS: IJK ) - CPU P2 32-bit Windows 2000 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 254.23501 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 281.93501 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 254.79601 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 255.33701 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 47.97900 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 60.25600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 72.31400 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 72.74500 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 272.31201 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 273.65301 secs
> Test1099 End <
Tests: Completed
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU P2 32-bit Windows 2000 ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 59.51500 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 59.54500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 98.13100 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 98.14100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 254.30601 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 254.62601 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 256.21801 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 255.96901 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 59.69600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 59.68600 secs
> Test1099 End <
Tests: Completed
[ Intel C++ compiler v7.1.0 ( u029 ) - Release - 32-bit ( LPS: IKJ ) - CPU P2 32-bit Windows 2000 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 60.21600 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 59.84600 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 72.53500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 72.52500 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 254.90701 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 254.93701 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 256.24901 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 256.48901 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 59.45600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 59.48500 secs
> Test1099 End <
Tests: Completed
[ Intel C++ compiler v8.1.0 ( u038 ) - Release - 32-bit ( LPS: IJK ) - CPU P2 32-bit Windows 2000 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 253.37400 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 253.12400 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 254.65600 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 255.29700 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 47.44800 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 48.89000 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 72.33400 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 72.35400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 249.90900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 249.90900 secs
> Test1099 End <
Tests: Completed
[ Intel C++ compiler v8.1.0 ( u038 ) - Release - 32-bit ( LPS: IKJ ) - CPU P2 32-bit Windows 2000 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 60.24600 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 59.78600 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 79.53400 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 79.54400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 253.84500 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 254.01600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 255.91800 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 255.87800 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 59.30500 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 59.29500 secs
> Test1099 End <
Tests: Completed
[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]
Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IJK
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 97.57800 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 97.71800 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 97.85900 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 97.89000 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 3.18800 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 4.37500 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 5.45300 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 5.76600 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 97.70400 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 97.71800 secs
> Test1099 End <
Tests: Completed