						*** Performance Evaluation of Classic Matrix Multiplication algorithms ***
[ Abstract ]
This is one of the most detailed analyses of the performance of the Classic Matrix Multiplication algorithm on different software and hardware platforms.
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
		1 Solution
	
		
You are right.
I have missed the one-letter difference in the title.
For simple readers like me, fundamental one-letter differences must be spelled out explicitly.
		146 Replies
	
		
		
			
			
			
					
	
This is one of the most detailed analyses of the performance of the Classic Matrix Multiplication algorithm. The versions of the algorithm are as follows:
			Classic 2D
			Classic 2D LBOT
			Classic 2D Fused
			Classic 2D Fused LBOT
			Classic 2D Transposed
			Classic 2D Transposed LBOT
			Classic 2D Fused Transposed
			Classic 2D Fused Transposed LBOT
			Classic 2D SSE2 Transposed v1
			Classic 2D SSE2 Transposed v1 LBOT
			Classic 2D SSE2 Transposed v2
			Classic 2D SSE2 Transposed v2 LBOT
			Classic 1D
			Classic 1D LBOT
Two sub-versions of each version of the algorithm are evaluated:
			- Loop Processing Schema IJK
			- Loop Processing Schema IKJ ( aka Loop Interchange technique )
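The two loop processing schemas differ only in the nesting order of the three loops. A minimal sketch, assuming square matrices stored as rows of floats; the names `Matrix`, `multiply_ijk` and `multiply_ikj` are illustrative and are not taken from the test applications:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // square matrix, one vector per row

// LPS IJK: the innermost loop walks down a column of B.
Matrix multiply_ijk(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    Matrix c(n, std::vector<float>(n, 0.0f));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    return c;
}

// LPS IKJ ( loop interchange ): the innermost loop walks along rows of B and C.
Matrix multiply_ikj(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    Matrix c(n, std::vector<float>(n, 0.0f));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i][k];
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += aik * b[k][j];
        }
    return c;
}
```

With row-wise storage, IKJ touches `b[k][j]` and `c[i][j]` sequentially in the innermost loop, whereas IJK jumps from row to row of B on every iteration.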
Performance evaluations are done:
		(1) On four computer systems:
			Dell Precision Mobile M4700
			Dell Dimension 4400
			Dell Latitude CPi D300XT
			Acer Aspire One ( netbook )
		(2) On four Operating Systems:
			Windows 95 Pan European 32-bit
			Windows 2000 Professional 32-bit SP4
			Windows XP Professional 32-bit SP3
			Windows 7 Professional 64-bit SP1
		(3) With four IDEs:
			Visual Studio 98 Professional Edition
			Visual Studio 2005 Professional Edition
			Visual Studio 2008 Professional Edition
			Visual Studio 2008 Express Edition
		(4) With twenty-two C++ compilers:
			Borland C++ compiler v5.5.1 32-bit
			MinGW C++ compiler v3.4.2 32-bit
			MinGW C++ compiler v4.8.1 32-bit
			MinGW C++ compiler v4.9.2 32-bit
			MinGW C++ compiler v4.9.2 64-bit
			MinGW C++ compiler v5.1.0 32-bit
			MinGW C++ compiler v5.1.0 64-bit
			MinGW C++ compiler v6.1.0 32-bit
			MinGW C++ compiler v6.1.0 64-bit
			Microsoft C++ compiler ( VS98 PE   ) 32-bit
			Microsoft C++ compiler ( VS2005 PE ) 32-bit
			Microsoft C++ compiler ( VS2008 PE ) 32-bit
			Microsoft C++ compiler ( VS2008 PE ) 64-bit
			Microsoft C++ compiler ( VS2008 EE ) 32-bit
			Intel C++ compiler v7.1.0 ( u029 ) 32-bit
			Intel C++ compiler v8.1.0 ( u038 ) 32-bit
			Intel C++ compiler v12.1.7 ( u371 ) 32-bit
			Intel C++ compiler v13.1.0 ( u149 ) 32-bit
			Intel C++ compiler v13.1.0 ( u149 ) 64-bit
			Watcom C++ compiler v1.9.0 32-bit
			Watcom C++ compiler v2.0.0 32-bit
			Watcom C++ compiler v2.0.0 64-bit
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Watcom C++ compiler v2.0.0 64-bit ]
Although the compiler and linker have been ported to 64-bit platforms, the generated binaries are still 32-bit!
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ List of Abbreviations ]
		MM   - Matrix Multiplication
		C    - Classic
		LPS  - Loop Processing Schema
		1D   - One Dimensional Input Matrices
		2D   - Two Dimensional Input Matrices
		LB   - Loop Blocking			( OT )
		LBOT - Loop Blocking Optimization Technique
		F    - Fused				( OT )
		T    - Transposed			( OT )
		SSE2 - Streaming SIMD Extensions v2
		OT   - Optimization Technique
		PE   - Professional Edition			( of Visual Studio )
		EE   - Express Edition			( of Visual Studio )
		P2   - Intel Pentium II
		P4   - Intel Pentium 4
		IB   - Intel Ivy Bridge
		AN   - Intel Atom N270
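Loop Blocking ( LB ) splits the iteration space into tiles so that the working set of the inner loops fits in cache. A minimal sketch, assuming row-major n x n matrices in flat vectors and a square tile edge `block`; the function name and signature are hypothetical and are not taken from the test applications:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Blocked classic multiplication: C = A * B for row-major n x n matrices.
// `block` is the tile edge; a value >= n degenerates to the unblocked loop nest.
std::vector<float> multiply_blocked(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    std::size_t n, std::size_t block) {
    assert(block > 0 && a.size() == n * n && b.size() == n * n);
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t ii = 0; ii < n; ii += block)
        for (std::size_t kk = 0; kk < n; kk += block)
            for (std::size_t jj = 0; jj < n; jj += block) {
                // Clamp the tile so sizes that are not a multiple of `block` work.
                const std::size_t i_end = std::min(ii + block, n);
                const std::size_t k_end = std::min(kk + block, n);
                const std::size_t j_end = std::min(jj + block, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const float aik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
    return c;
}
```

The tile edge is a tuning parameter; a reasonable starting point is a value for which three tiles of floats fit in the L1 or L2 data cache of the target CPU.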
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Computer Systems used for performance evaluations ]
** Dell Precision Mobile M4700 **
			Intel Core i7-3840QM ( 2.80 GHz )
			Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/products/70846
			32GB RAM
			320GB HDD
			NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory )
			Windows 7 Professional 64-bit SP1
			Size of L3 Cache =   8MB ( shared between all cores for data & instructions )
			Size of L2 Cache =   1MB ( 256KB per core / shared for data & instructions )
			Size of L1 Cache = 256KB ( 32KB per core for data & 32KB per core for instructions )
			Display resolution: 1366 x 768
** Dell Dimension 4400 **
			Intel Pentium 4 ( 1.60 GHz / 1 core )
			1GB RAM
			Seagate 20GB HDD						( *  )
			Seagate  3TB HDD						( ** )
			EVGA GeForce 6200 512MB DDR2 AGP 8x Video Card
			Windows XP Professional 32-bit SP3
			Size of L2 Cache = 256KB
			Size of L1 Cache =   8KB
			Display resolution: 1440 x 990
			( *  )	Seagate Barracuda 20GB IDE Hard Disk Drive
			ST320011A
			3.5" 7200 Rpm  2MB Cache IDE Ultra ATA100 / ATA-iV/6
			Average Rotational Latency	: 4.17 ms
			Average Seek Times Read		: 9.0ms
			Average Seek Times Write	: 10.0ms
			Maximum Internal Transfer Rate	: 69.4MB/sec
			Average External Transfer Rate	: 100MB/sec ( Read and Write )
			Maximum External Transfer Rate	: 150MB/sec ( Read           )
			Note: Barracuda ATA IV Family
			( ** )	Seagate Barracuda  3TB SATA Hard Disk Drive
			ST3000DM001
			3.5" 7200 Rpm 64MB Cache SATA III ( 6Gb/sec )
			Average Rotational Latency	: 4.16 ms
			Average Seek Times Read		: 8.5ms
			Average Seek Times Write	: 9.5ms
			Maximum Internal Transfer Rate	: 268MB/sec
			Average External Transfer Rate	: 156MB/sec ( Read and Write )
			Maximum External Transfer Rate	: 210MB/sec ( Read           )
** Dell Latitude CPi D300XT **
			Intel Pentium II ( 300 MHz / 1 core )
			128MB RAM ( 2x64MB / MT8LDT864HG-6X 144-pin EDO SODIMM 60ns )
			6GB HDD
			Windows 2000 Professional 32-bit SP4
			Size of L2 Cache = 512KB
			Size of L1 Cache =  16KB
			Display resolution: 1024 x 768
** Acer Aspire One **
			Intel Atom N270 ( 1.60 GHz / 1 core / 2 logical CPUs )
			1.5GB RAM
			CF to ZIF 1.8" HDD SSD IDE Adapter
			2GB Compact Flash ( CF ) Card
			Windows 95 Pan European 32-bit
			Size of L2 Cache = 512KB
			Size of L1 Cache =  24KB
			Display resolution: 800 x 600
			// Memory Settings in System.ini
			...
			[386Enh]
			;
			; MaxPhysPage value	; Amount of physical RAM Windows 95 can access
			;
			MaxPhysPage=32768	; 823336 KB	= 804 MB ( Currently Used )
			...
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ OSs used for performance evaluations ]
		Windows 95 Pan European 32-bit
		Windows 2000 Professional 32-bit SP4
		Windows XP Professional 32-bit SP3
		Windows 7 Professional 64-bit SP1
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ IDEs used for performance evaluations ]
		Visual Studio 98 Professional Edition
		Visual Studio 2005 Professional Edition
		Visual Studio 2008 Professional Edition
		Visual Studio 2008 Express Edition
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ C++ compilers used for performance evaluations ]
		Borland C++ compiler v5.5.1 32-bit
		MinGW C++ compiler v3.4.2 32-bit
		MinGW C++ compiler v4.8.1 32-bit
		MinGW C++ compiler v4.9.2 32-bit
		MinGW C++ compiler v4.9.2 64-bit
		MinGW C++ compiler v5.1.0 32-bit
		MinGW C++ compiler v5.1.0 64-bit
		MinGW C++ compiler v6.1.0 32-bit
		MinGW C++ compiler v6.1.0 64-bit
		Microsoft C++ compiler ( VS98 PE   ) 32-bit
		Microsoft C++ compiler ( VS2005 PE ) 32-bit
		Microsoft C++ compiler ( VS2008 PE ) 32-bit
		Microsoft C++ compiler ( VS2008 PE ) 64-bit
		Microsoft C++ compiler ( VS2008 EE ) 32-bit
		Intel C++ compiler v7.1.0 ( u029 ) 32-bit
		Intel C++ compiler v8.1.0 ( u038 ) 32-bit
		Intel C++ compiler v12.1.7 ( u371 ) 32-bit
		Intel C++ compiler v13.1.0 ( u149 ) 32-bit
		Intel C++ compiler v13.1.0 ( u149 ) 64-bit
		Watcom C++ compiler v1.9.0 32-bit
		Watcom C++ compiler v2.0.0 32-bit
		Watcom C++ compiler v2.0.0 64-bit
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Base Performance Evaluations with MKL SGEMM function - CPU AN 32-bit Windows 95 ]
Not completed because an MKL library installation for this platform is no longer available.
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Base Performance Evaluations with MKL SGEMM function - CPU P2 32-bit Windows 2000 ]
Not completed because an MKL library installation for this platform is no longer available.
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Base Performance Evaluations with MKL SGEMM function - CPU P4 32-bit Windows XP ]
		Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
		Tests: Start
		> Test1153 Start <
		Sub-Test 1.1 - Runtime Binding of MKL functions
		Dynamic Library mkl_rt.dll Loaded
		Initialization Done
		Sub-Test 3.2 - MKL Matrix Multiplication
		Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
		Allocating Memory for Matrices ( 16-byte alignment )
		Intializing Matrix Data - Started
		Intializing Matrix Data - Completed
		Cblas xGEMM
		Matrix Size           :  1024 x  1024
		Matrix Size Threshold : N/A
		Matrix Partitions     : N/A
		Degree of Recursion   : N/A
		Result Sets Reflection: N/A
		Calculating...
		Cblas SGEMM  - Pass 01 - Completed:     0.53100 secs
		Cblas SGEMM  - Pass 02 - Completed:     0.51600 secs
		Cblas SGEMM  - Pass 03 - Completed:     0.51600 secs
		Cblas SGEMM  - Pass 04 - Completed:     0.51600 secs
		Cblas SGEMM  - Pass 05 - Completed:     0.51500 secs
		Cblas SGEMM - Passed
		Deallocating Memory
		Dynamic Library mkl_rt.dll Unloaded
		> Test1153 End <
		Tests: Completed
		Application - BccTestApp - WIN32_BCC ( 32-bit ) - Release
		Tests: Start
		> Test1153 Start <
		Sub-Test 1.1 - Runtime Binding of MKL functions
		Dynamic Library mkl_rt.dll Loaded
		Initialization Done
		Sub-Test 3.2 - MKL Matrix Multiplication
		Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
		Allocating Memory for Matrices ( 16-byte alignment )
		Intializing Matrix Data - Started
		Intializing Matrix Data - Completed
		Cblas xGEMM
		Matrix Size           :  1024 x  1024
		Matrix Size Threshold : N/A
		Matrix Partitions     : N/A
		Degree of Recursion   : N/A
		Result Sets Reflection: N/A
		Calculating...
		Cblas SGEMM  - Pass 01 - Completed:     0.51500 secs
		Cblas SGEMM  - Pass 02 - Completed:     0.51500 secs
		Cblas SGEMM  - Pass 03 - Completed:     0.51600 secs
		Cblas SGEMM  - Pass 04 - Completed:     0.51600 secs
		Cblas SGEMM  - Pass 05 - Completed:     0.51600 secs
		Cblas SGEMM - Passed
		Deallocating Memory
		Dynamic Library mkl_rt.dll Unloaded
		> Test1153 End <
		Tests: Completed
		Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
		Tests: Start
		> Test1153 Start <
		Sub-Test 1.1 - Runtime Binding of MKL functions
		Dynamic Library mkl_rt.dll Loaded
		Initialization Done
		Sub-Test 3.2 - MKL Matrix Multiplication
		Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
		Allocating Memory for Matrices ( 16-byte alignment )
		Intializing Matrix Data - Started
		Intializing Matrix Data - Completed
		Cblas xGEMM
		Matrix Size           :  1024 x  1024
		Matrix Size Threshold : N/A
		Matrix Partitions     : N/A
		Degree of Recursion   : N/A
		Result Sets Reflection: N/A
		Calculating...
		Cblas SGEMM  - Pass 01 - Completed:     0.53200 secs
		Cblas SGEMM  - Pass 02 - Completed:     0.51500 secs
		Cblas SGEMM  - Pass 03 - Completed:     0.51600 secs
		Cblas SGEMM  - Pass 04 - Completed:     0.51600 secs
		Cblas SGEMM  - Pass 05 - Completed:     0.51500 secs
		Cblas SGEMM - Passed
		Deallocating Memory
		Dynamic Library mkl_rt.dll Unloaded
		> Test1153 End <
		Tests: Completed
		Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
		Tests: Start
		> Test1153 Start <
		Sub-Test 1.1 - Runtime Binding of MKL functions
		Dynamic Library mkl_rt.dll Loaded
		Initialization Done
		Sub-Test 3.2 - MKL Matrix Multiplication
		Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
		Allocating Memory for Matrices ( 16-byte alignment )
		Intializing Matrix Data - Started
		Intializing Matrix Data - Completed
		Cblas xGEMM
		Matrix Size           :  1024 x  1024
		Matrix Size Threshold : N/A
		Matrix Partitions     : N/A
		Degree of Recursion   : N/A
		Result Sets Reflection: N/A
		Calculating...
		Cblas SGEMM  - Pass 01 - Completed:     0.54700 secs
		Cblas SGEMM  - Pass 02 - Completed:     0.51500 secs
		Cblas SGEMM  - Pass 03 - Completed:     0.51600 secs
		Cblas SGEMM  - Pass 04 - Completed:     0.51500 secs
		Cblas SGEMM  - Pass 05 - Completed:     0.51500 secs
		Cblas SGEMM - Passed
		Deallocating Memory
		Dynamic Library mkl_rt.dll Unloaded
		> Test1153 End <
		Tests: Completed
		Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
		Tests: Start
		> Test1153 Start <
		Sub-Test 1.1 - Runtime Binding of MKL functions
		Dynamic Library mkl_rt.dll Loaded
		Initialization Done
		Sub-Test 3.2 - MKL Matrix Multiplication
		Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
		Allocating Memory for Matrices ( 16-byte alignment )
		Intializing Matrix Data - Started
		Intializing Matrix Data - Completed
		Cblas xGEMM
		Matrix Size           :  1024 x  1024
		Matrix Size Threshold : N/A
		Matrix Partitions     : N/A
		Degree of Recursion   : N/A
		Result Sets Reflection: N/A
		Calculating...
		Cblas SGEMM  - Pass 01 - Completed:     0.54900 secs
		Cblas SGEMM  - Pass 02 - Completed:     0.51600 secs
		Cblas SGEMM  - Pass 03 - Completed:     0.51500 secs
		Cblas SGEMM  - Pass 04 - Completed:     0.51500 secs
		Cblas SGEMM  - Pass 05 - Completed:     0.51600 secs
		Cblas SGEMM - Passed
		Deallocating Memory
		Dynamic Library mkl_rt.dll Unloaded
		> Test1153 End <
		Tests: Completed
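The pass times above can be turned into a throughput figure. A minimal sketch, assuming the usual count of 2 * n^3 floating-point operations ( n^3 multiplications and n^3 additions ) for classic multiplication of n x n matrices; `gemm_gflops` is a hypothetical helper and is not part of MKL or the test applications:

```cpp
#include <cassert>

// Throughput in GFLOPS for an n x n by n x n multiplication that took `seconds`.
double gemm_gflops(double n, double seconds) {
    assert(n > 0.0 && seconds > 0.0);
    const double flops = 2.0 * n * n * n;  // n^3 multiplications + n^3 additions
    return flops / seconds / 1.0e9;
}
```

For the steady-state pass time of about 0.516 secs at n = 1024, this gives about 4.16 GFLOPS.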
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Base Performance Evaluations with MKL SGEMM function - CPU IB 64-bit Windows 7 ]
		Application - ScaLibTestApp - WIN64_MSC ( 64-bit ) - Release
		Tests: Start
		> Test1153 Start <
		Sub-Test 1.1 - Runtime Binding of MKL functions
		Dynamic Library mkl_rt.dll Loaded
		Initialization Done
		Sub-Test 3.2 - MKL Matrix Multiplication
		Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
		Allocating Memory for Matrices ( 16-byte alignment )
		Intializing Matrix Data - Started
		Intializing Matrix Data - Completed
		Cblas xGEMM
		Matrix Size           :  1024 x  1024
		Matrix Size Threshold : N/A
		Matrix Partitions     : N/A
		Degree of Recursion   : N/A
		Result Sets Reflection: N/A
		Calculating...
		Cblas SGEMM  - Pass 01 - Completed:     0.06100 secs
		Cblas SGEMM  - Pass 02 - Completed:     0.06600 secs
		Cblas SGEMM  - Pass 03 - Completed:     0.06600 secs
		Cblas SGEMM  - Pass 04 - Completed:     0.06600 secs
		Cblas SGEMM  - Pass 05 - Completed:     0.06500 secs
		Cblas SGEMM - Passed
		Deallocating Memory
		Dynamic Library mkl_rt.dll Unloaded
		> Test1153 End <
		Tests: Completed
		Application - BccTestApp - WIN32_BCC ( 32-bit ) - Release
		Tests: Start
		> Test1153 Start <
		Sub-Test 1.1 - Runtime Binding of MKL functions
		Dynamic Library mkl_rt.dll Loaded
		Initialization Done
		Sub-Test 3.2 - MKL Matrix Multiplication
		Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
		Allocating Memory for Matrices ( 16-byte alignment )
		Intializing Matrix Data - Started
		Intializing Matrix Data - Completed
		Cblas xGEMM
		Matrix Size           :  1024 x  1024
		Matrix Size Threshold : N/A
		Matrix Partitions     : N/A
		Degree of Recursion   : N/A
		Result Sets Reflection: N/A
		Calculating...
		Cblas SGEMM  - Pass 01 - Completed:     0.06500 secs
		Cblas SGEMM  - Pass 02 - Completed:     0.06500 secs
		Cblas SGEMM  - Pass 03 - Completed:     0.06600 secs
		Cblas SGEMM  - Pass 04 - Completed:     0.06600 secs
		Cblas SGEMM  - Pass 05 - Completed:     0.06600 secs
		Cblas SGEMM - Passed
		Deallocating Memory
		Dynamic Library mkl_rt.dll Unloaded
		> Test1153 End <
		Tests: Completed
		Application - IccTestApp - WIN64_ICC ( 64-bit ) - Release
		Tests: Start
		> Test1153 Start <
		Sub-Test 1.1 - Runtime Binding of MKL functions
		Dynamic Library mkl_rt.dll Loaded
		Initialization Done
		Sub-Test 3.2 - MKL Matrix Multiplication
		Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
		Allocating Memory for Matrices ( 16-byte alignment )
		Intializing Matrix Data - Started
		Intializing Matrix Data - Completed
		Cblas xGEMM
		Matrix Size           :  1024 x  1024
		Matrix Size Threshold : N/A
		Matrix Partitions     : N/A
		Degree of Recursion   : N/A
		Result Sets Reflection: N/A
		Calculating...
		Cblas SGEMM  - Pass 01 - Completed:     0.06200 secs
		Cblas SGEMM  - Pass 02 - Completed:     0.06500 secs
		Cblas SGEMM  - Pass 03 - Completed:     0.06600 secs
		Cblas SGEMM  - Pass 04 - Completed:     0.06600 secs
		Cblas SGEMM  - Pass 05 - Completed:     0.06500 secs
		Cblas SGEMM - Passed
		Deallocating Memory
		Dynamic Library mkl_rt.dll Unloaded
		> Test1153 End <
		Tests: Completed
		Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
		Tests: Start
		> Test1153 Start <
		Sub-Test 1.1 - Runtime Binding of MKL functions
		Dynamic Library mkl_rt.dll Loaded
		Initialization Done
		Sub-Test 3.2 - MKL Matrix Multiplication
		Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
		Allocating Memory for Matrices ( 16-byte alignment )
		Intializing Matrix Data - Started
		Intializing Matrix Data - Completed
		Cblas xGEMM
		Matrix Size           :  1024 x  1024
		Matrix Size Threshold : N/A
		Matrix Partitions     : N/A
		Degree of Recursion   : N/A
		Result Sets Reflection: N/A
		Calculating...
		Cblas SGEMM  - Pass 01 - Completed:     0.06700 secs
		Cblas SGEMM  - Pass 02 - Completed:     0.06500 secs
		Cblas SGEMM  - Pass 03 - Completed:     0.06600 secs
		Cblas SGEMM  - Pass 04 - Completed:     0.06500 secs
		Cblas SGEMM  - Pass 05 - Completed:     0.06500 secs
		Cblas SGEMM - Passed
		Deallocating Memory
		Dynamic Library mkl_rt.dll Unloaded
		> Test1153 End <
		Tests: Completed
		Application - WccTestApp - WIN32_WCC ( 32-bit ) - Release
		Tests: Start
		> Test1153 Start <
		Sub-Test 1.1 - Runtime Binding of MKL functions
		Dynamic Library mkl_rt.dll Loaded
		Initialization Done
		Sub-Test 3.2 - MKL Matrix Multiplication
		Matrix Multiplication C[ 1024x1024 ] = A[ 1024x1024 ] * B[ 1024x1024 ]
		Allocating Memory for Matrices ( 16-byte alignment )
		Intializing Matrix Data - Started
		Intializing Matrix Data - Completed
		Cblas xGEMM
		Matrix Size           :  1024 x  1024
		Matrix Size Threshold : N/A
		Matrix Partitions     : N/A
		Degree of Recursion   : N/A
		Result Sets Reflection: N/A
		Calculating...
		Cblas SGEMM  - Pass 01 - Completed:     0.06900 secs
		Cblas SGEMM  - Pass 02 - Completed:     0.06600 secs
		Cblas SGEMM  - Pass 03 - Completed:     0.06500 secs
		Cblas SGEMM  - Pass 04 - Completed:     0.06500 secs
		Cblas SGEMM  - Pass 05 - Completed:     0.06600 secs
		Cblas SGEMM - Passed
		Deallocating Memory
		Dynamic Library mkl_rt.dll Unloaded
		> Test1153 End <
		Tests: Completed
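Each application above reports five timed passes. A minimal sketch of such a multi-pass timing loop, assuming wall-clock timing with `std::chrono::steady_clock`; how the test applications actually measure time is not shown in this thread, and `time_passes` is a hypothetical name:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs `work` `passes` times and returns the wall-clock seconds of each pass.
std::vector<double> time_passes(const std::function<void()>& work, int passes) {
    assert(passes > 0);
    std::vector<double> seconds;
    seconds.reserve(static_cast<std::size_t>(passes));
    for (int p = 0; p < passes; ++p) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}
```

Reporting every pass, as the logs do, makes a slower first pass visible instead of hiding it in an average.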
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU AN 32-bit Windows 95 ]
		Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
		Tests: Start
		> Test1099 Start <
		Matrix A, B and C Sizes       :  1024 x  1024
		Loop Processing Schema ( LPS ): IJK
		Loop Blocking Divider         : 1
		Sub-Test 1.1 - MxMultA1 - Classic 2D
			LBOT size: N/A
			Completed:   140.56801 secs
		Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
			LBOT size: 1024x1024 elements
			Completed:   136.45601 secs
		Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
			LBOT size: N/A
			Completed:   145.31301 secs
		Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
			LBOT size: 1024x1024 elements
			Completed:   142.82801 secs
		Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
			LBOT size: N/A
			Completed:     5.08100 secs
		Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
			LBOT size: 1024x1024 elements
			Completed:     5.31400 secs
		Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
			LBOT size: N/A
			Completed:     5.61700 secs
		Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
			LBOT size: 1024x1024 elements
			Completed:     5.94600 secs
		Sub-Test 5.1 - MxMultD1 - Classic 1D
			LBOT size: N/A
			Completed:   136.55101 secs
		Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
			LBOT size: 1024x1024 elements
			Completed:   136.57901 secs
		> Test1099 End <
		Tests: Completed
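In the run above the Transposed versions finish in about 5 secs against roughly 140 secs for the plain Classic 2D version. A minimal sketch of the idea, assuming B is transposed once up front so that the innermost loop of the IJK schema reads both operands along rows; whether the timings above include the transposition is not stated, and the names are illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // square matrix, one vector per row

// Returns the transpose of a square matrix.
Matrix transpose(const Matrix& m) {
    const std::size_t n = m.size();
    Matrix t(n, std::vector<float>(n, 0.0f));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            t[j][i] = m[i][j];
    return t;
}

// IJK schema on a transposed B: a[i][k] and bt[j][k] are both read sequentially.
Matrix multiply_transposed_ijk(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    assert(b.size() == n);
    const Matrix bt = transpose(b);
    Matrix c(n, std::vector<float>(n, 0.0f));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i][k] * bt[j][k];
            c[i][j] = sum;
        }
    return c;
}
```

The transposition costs n^2 element moves, which is small next to the n^3 multiply-add steps at n = 1024.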
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU AN 32-bit Windows 95 ]
		Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
		Tests: Start
		> Test1099 Start <
		Matrix A, B and C Sizes       :  1024 x  1024
		Loop Processing Schema ( LPS ): IKJ
		Loop Blocking Divider         : 1
		Sub-Test 1.1 - MxMultA1 - Classic 2D
			LBOT size: N/A
			Completed:     9.87500 secs
		Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
			LBOT size: 1024x1024 elements
			Completed:     9.44900 secs
		Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
			LBOT size: N/A
			Completed:     9.73700 secs
		Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
			LBOT size: 1024x1024 elements
			Completed:     9.75100 secs
		Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
			LBOT size: N/A
			Completed:   147.64801 secs
		Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
			LBOT size: 1024x1024 elements
			Completed:   147.68901 secs
		Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
			LBOT size: N/A
			Completed:   146.48101 secs
		Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
			LBOT size: 1024x1024 elements
			Completed:   154.74801 secs
		Sub-Test 5.1 - MxMultD1 - Classic 1D
			LBOT size: N/A
			Completed:     9.44800 secs
		Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
			LBOT size: 1024x1024 elements
			Completed:     9.46300 secs
		> Test1099 End <
		Tests: Completed
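The Classic 1D versions take one-dimensional input matrices. A minimal sketch, assuming row-major flat storage where element ( i, j ) lives at index i * n + j, combined here with the IKJ schema; `multiply_1d_ikj` is an illustrative name and is not taken from the test applications:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Classic multiplication on flat row-major arrays using the IKJ loop order.
std::vector<float> multiply_1d_ikj(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        float* c_row = c.data() + i * n;             // row i of C
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            const float* b_row = b.data() + k * n;   // row k of B
            for (std::size_t j = 0; j < n; ++j)
                c_row[j] += aik * b_row[j];
        }
    }
    return c;
}
```

A single contiguous allocation avoids the per-row pointer indirection of an array of rows, although the run above shows only a small difference between the 1D and 2D versions.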
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU P2 32-bit Windows 2000 ]
		Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
		Tests: Start
		> Test1099 Start <
		Matrix A, B and C Sizes       :  1024 x  1024
		Loop Processing Schema ( LPS ): IJK
		Loop Blocking Divider         : 1
		Sub-Test 1.1 - MxMultA1 - Classic 2D
		        LBOT size: N/A
        		Completed:   253.86501 secs
		Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
        		LBOT size: 1024x1024 elements
		        Completed:   253.85501 secs
		Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
        		LBOT size: N/A
		        Completed:   256.85901 secs
		Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
        		LBOT size: 1024x1024 elements
        		Completed:   257.74001 secs
		Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
        		LBOT size: N/A
        		Completed:    48.61000 secs
		Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
        		LBOT size: 1024x1024 elements
        		Completed:    59.95600 secs
		Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
        		LBOT size: N/A
        		Completed:    72.07300 secs
		Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
        		LBOT size: 1024x1024 elements
        		Completed:    72.43400 secs
		Sub-Test 5.1 - MxMultD1 - Classic 1D
        		LBOT size: N/A
        		Completed:   258.42101 secs
		Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
        		LBOT size: 1024x1024 elements
        		Completed:   258.35201 secs
		> Test1099 End <
		Tests: Completed
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Intel C++ compiler v7.1.0 ( u029 ) - Release - 32-bit ( LPS: IJK ) - CPU P2 32-bit Windows 2000 ]
		Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
		Tests: Start
		> Test1099 Start <
		Matrix A, B and C Sizes       :  1024 x  1024
		Loop Processing Schema ( LPS ): IJK
		Loop Blocking Divider         : 1
		Sub-Test 1.1 - MxMultA1 - Classic 2D
			LBOT size: N/A
			Completed:   254.23501 secs
		Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
			LBOT size: 1024x1024 elements
			Completed:   281.93501 secs
		Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
			LBOT size: N/A
			Completed:   254.79601 secs
		Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
			LBOT size: 1024x1024 elements
			Completed:   255.33701 secs
		Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
			LBOT size: N/A
			Completed:    47.97900 secs
		Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
			LBOT size: 1024x1024 elements
			Completed:    60.25600 secs
		Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
			LBOT size: N/A
			Completed:    72.31400 secs
		Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
			LBOT size: 1024x1024 elements
			Completed:    72.74500 secs
		Sub-Test 5.1 - MxMultD1 - Classic 1D
			LBOT size: N/A
			Completed:   272.31201 secs
		Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
			LBOT size: 1024x1024 elements
			Completed:   273.65301 secs
		> Test1099 End <
		Tests: Completed
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IKJ ) - CPU P2 32-bit Windows 2000 ]
		Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
		Tests: Start
		> Test1099 Start <
		Matrix A, B and C Sizes       :  1024 x  1024
		Loop Processing Schema ( LPS ): IKJ
		Loop Blocking Divider         : 1
		Sub-Test 1.1 - MxMultA1 - Classic 2D
        		LBOT size: N/A
        		Completed:    59.51500 secs
		Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
		        LBOT size: 1024x1024 elements
        		Completed:    59.54500 secs
		Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
		        LBOT size: N/A
        		Completed:    98.13100 secs
		Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
        		LBOT size: 1024x1024 elements
		        Completed:    98.14100 secs
		Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
        		LBOT size: N/A
		        Completed:   254.30601 secs
		Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
        		LBOT size: 1024x1024 elements
		        Completed:   254.62601 secs
		Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
        		LBOT size: N/A
		        Completed:   256.21801 secs
		Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
        		LBOT size: 1024x1024 elements
		        Completed:   255.96901 secs
		Sub-Test 5.1 - MxMultD1 - Classic 1D
        		LBOT size: N/A
		        Completed:    59.69600 secs
		Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
        		LBOT size: 1024x1024 elements
		        Completed:    59.68600 secs
		> Test1099 End <
		Tests: Completed
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Intel C++ compiler v7.1.0 ( u029 ) - Release - 32-bit ( LPS: IKJ ) - CPU P2 32-bit Windows 2000 ]
		Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
		Tests: Start
		> Test1099 Start <
		Matrix A, B and C Sizes       :  1024 x  1024
		Loop Processing Schema ( LPS ): IKJ
		Loop Blocking Divider         : 1
		Sub-Test 1.1 - MxMultA1 - Classic 2D
			LBOT size: N/A
			Completed:    60.21600 secs
		Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
			LBOT size: 1024x1024 elements
			Completed:    59.84600 secs
		Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
			LBOT size: N/A
			Completed:    72.53500 secs
		Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
			LBOT size: 1024x1024 elements
			Completed:    72.52500 secs
		Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
			LBOT size: N/A
			Completed:   254.90701 secs
		Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
			LBOT size: 1024x1024 elements
			Completed:   254.93701 secs
		Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
			LBOT size: N/A
			Completed:   256.24901 secs
		Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
			LBOT size: 1024x1024 elements
			Completed:   256.48901 secs
		Sub-Test 5.1 - MxMultD1 - Classic 1D
			LBOT size: N/A
			Completed:    59.45600 secs
		Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
			LBOT size: 1024x1024 elements
			Completed:    59.48500 secs
		> Test1099 End <
		Tests: Completed
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Intel C++ compiler v8.1.0 ( u038 ) - Release - 32-bit ( LPS: IJK ) - CPU P2 32-bit Windows 2000 ]
		Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
		Tests: Start
		> Test1099 Start <
		Matrix A, B and C Sizes       :  1024 x  1024
		Loop Processing Schema ( LPS ): IJK
		Loop Blocking Divider         : 1
		Sub-Test 1.1 - MxMultA1 - Classic 2D
			LBOT size: N/A
			Completed:   253.37400 secs
		Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
			LBOT size: 1024x1024 elements
			Completed:   253.12400 secs
		Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
			LBOT size: N/A
			Completed:   254.65600 secs
		Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
			LBOT size: 1024x1024 elements
			Completed:   255.29700 secs
		Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
			LBOT size: N/A
			Completed:    47.44800 secs
		Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
			LBOT size: 1024x1024 elements
			Completed:    48.89000 secs
		Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
			LBOT size: N/A
			Completed:    72.33400 secs
		Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
			LBOT size: 1024x1024 elements
			Completed:    72.35400 secs
		Sub-Test 5.1 - MxMultD1 - Classic 1D
			LBOT size: N/A
			Completed:   249.90900 secs
		Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
			LBOT size: 1024x1024 elements
			Completed:   249.90900 secs
		> Test1099 End <
		Tests: Completed
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Intel C++ compiler v8.1.0 ( u038 ) - Release - 32-bit ( LPS: IKJ ) - CPU P2 32-bit Windows 2000 ]
		Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
		Tests: Start
		> Test1099 Start <
		Matrix A, B and C Sizes       :  1024 x  1024
		Loop Processing Schema ( LPS ): IKJ
		Loop Blocking Divider         : 1
		Sub-Test 1.1 - MxMultA1 - Classic 2D
		        LBOT size: N/A
		        Completed:    60.24600 secs
		Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
		        LBOT size: 1024x1024 elements
		        Completed:    59.78600 secs
		Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
		        LBOT size: N/A
		        Completed:    79.53400 secs
		Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
		        LBOT size: 1024x1024 elements
		        Completed:    79.54400 secs
		Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
		        LBOT size: N/A
		        Completed:   253.84500 secs
		Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
		        LBOT size: 1024x1024 elements
		        Completed:   254.01600 secs
		Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
		        LBOT size: N/A
		        Completed:   255.91800 secs
		Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
		        LBOT size: 1024x1024 elements
		        Completed:   255.87800 secs
		Sub-Test 5.1 - MxMultD1 - Classic 1D
		        LBOT size: N/A
		        Completed:    59.30500 secs
		Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
		        LBOT size: 1024x1024 elements
		        Completed:    59.29500 secs
		> Test1099 End <
		Tests: Completed
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
						[ Microsoft C++ compiler ( VS98 PE ) - Release - 32-bit ( LPS: IJK ) - CPU P4 32-bit Windows XP ]
		Application - ScaLibTestApp - WIN32_MSC ( 32-bit ) - Release
		Tests: Start
		> Test1099 Start <
		Matrix A, B and C Sizes       :  1024 x  1024
		Loop Processing Schema ( LPS ): IJK
		Loop Blocking Divider         : 1
		Sub-Test 1.1 - MxMultA1 - Classic 2D
		        LBOT size: N/A
		        Completed:    97.57800 secs
		Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
		        LBOT size: 1024x1024 elements
		        Completed:    97.71800 secs
		Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
		        LBOT size: N/A
		        Completed:    97.85900 secs
		Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
		        LBOT size: 1024x1024 elements
		        Completed:    97.89000 secs
		Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
		        LBOT size: N/A
		        Completed:     3.18800 secs
		Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
		        LBOT size: 1024x1024 elements
		        Completed:     4.37500 secs
		Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
		        LBOT size: N/A
		        Completed:     5.45300 secs
		Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
		        LBOT size: 1024x1024 elements
		        Completed:     5.76600 secs
		Sub-Test 5.1 - MxMultD1 - Classic 1D
		        LBOT size: N/A
		        Completed:    97.70400 secs
		Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
		        LBOT size: 1024x1024 elements
		        Completed:    97.71800 secs
		> Test1099 End <
		Tests: Completed
					
				
			
			
				
			
			
			
			
			
			
			
		
		
		
	
	
	
 
					
				
				
			
		
					