
Performance Evaluation of Classic Matrix Multiplication algorithms

SergeyKostrov
Valued Contributor II
*** Performance Evaluation of Classic Matrix Multiplication algorithms ***

[ Abstract ]
This is one of the most detailed analyses of the performance of the Classic Matrix Multiplication algorithm on different Software and Hardware platforms.

SergeyKostrov
Important Notes, due to some development constraints:
1. All 32-bit versions of the MinGW C++ compiler are installed on a computer with a Pentium 4 CPU and 32-bit Windows XP. All 32-bit tests are run on that computer as well!
2. All 64-bit versions of the MinGW C++ compiler are installed on a computer with an Ivy Bridge CPU and 64-bit Windows 7. All 64-bit tests are run on that computer as well!
3. Every new version of the MinGW C++ compiler is faster than the previous version.
4. As soon as tests for MinGW C++ compiler v6.1.0 are done I will post the results.

zalia64
New Contributor I

Dear Sergey,

I have reread your posts, and they appear to show that the P4 is much, much faster than a modern IB CPU:

Your very first post: 04/08/2016, 10:26 [ MinGW C++ compiler v5.1.0 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]

> Application - MgwTestApp - WIN32_MGW ( 32-bit ) - Release
> Tests: Start
> Test1099 Start <
> Matrix A, B and C Sizes : 1024 x 1024  
> Loop Processing Schema ( LPS ): IKJ
> Loop Blocking Divider : 1
> Sub-Test 1.1 - MxMultA1 - Classic 2D
> LBOT size: N/A
> Completed: 2.75000 secs
> Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
> LBOT size: 1024x1024 elements
> Completed: 2.79700 secs

Your second post: 04/08/2016, 10:27 [ MinGW C++ compiler v5.1.0 - Release - 64-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]

> Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
> Tests: Start
> Test1099 Start <
> Matrix A, B and C Sizes : 1024 x 1024
> Loop Processing Schema ( LPS ): IJK
> Loop Blocking Divider : 1
> Sub-Test 1.1 - MxMultA1 - Classic 2D
> LBOT size: N/A
> Completed: 8.92300 secs
> Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
> LBOT size: 1024x1024 elements
> Completed: 8.98600 secs
 

The way I read it, the P4 calculated it in 2.75 seconds, while the i-7 calculated it in 8.92 seconds, using the same compiler and the same algorithm.

 

SergeyKostrov
>>...
>>Your very first post: 04/08/2016, 10:26 MinGW C++ compiler v5.1.0 - Release - 32-bit ( LPS: IKJ ) - CPU P4 32-bit Windows XP ]
>>...
>>Your second post: 04/08/2016 10:27 MinGW C++ compiler v5.1.0 - Release - 64-bit ( LPS: IJK ) - CPU IB 64-bit Windows 7 ]
>>....
>>The way I read it, the P4 calculated it in 2.75 seconds, while the i-7 calculated it in 8.92 seconds, using the same
>>compiler and the same algorithm.

You are still trying to compare different versions of the algorithm, that is, LPS IJK against LPS IKJ. A matrix multiplication algorithm with Loop Processing Schema IJK is Not the same algorithm as one with Loop Processing Schema IKJ. I have shown this several times, and you still think that these are the same algorithm. Even if the Algorithm Name is the same, the Loop Processing Schema can be IJK or IKJ. This is very important to understand when comparing the numbers from these tests.

Once again, algorithm "A" built with C++ compiler version A.A.A 32-bit with LPS IJK must be compared against algorithm "A" ( the same name! ) built with C++ compiler version A.A.A 64-bit with LPS IJK ( the same LPS! ). Consistency of performance results, and I talked about it at the beginning of the thread, is possible only when algorithms with the same LPS are compared.

The Intel Pentium 4 CPU is Not faster than the Intel Core i7 3rd Generation CPU ( Ivy Bridge ) when algorithms with the same LPS are compared. When you compare an algorithm with LPS IKJ on CPU A ( older ) against the same algorithm with LPS IJK on CPU B ( newer ), an inconsistency is created because, as I have said several times already, the processing is different ( with LPS IJK there are more cache misses! ), and that is why there is a false impression that the older CPU A is faster than the newer CPU B.

Also, did you see two posts ( numbers 10 and 11 ) at the beginning of the thread? The titles are as follows:
[ Base Performance Evaluations with MKL SGEMM function - CPU P4 32-bit Windows XP ]
and
[ Base Performance Evaluations with MKL SGEMM function - CPU IB 64-bit Windows 7 ]
The Ivy Bridge CPU is at least 8 times faster than the Pentium 4 CPU.
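
To make the difference between the two Loop Processing Schemas concrete, here is a minimal sketch of both loop orders. It assumes square, row-major matrices of float stored in a std::vector; the names Matrix, multiply_ijk and multiply_ikj are hypothetical and are not taken from the test application discussed in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major n x n matrix in one contiguous buffer (hypothetical helper type).
using Matrix = std::vector<float>;

// LPS IJK: the innermost loop (k) reads B down a column, that is, with stride n.
void multiply_ijk(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// LPS IKJ: the innermost loop (j) reads B and writes C along a row, that is, with stride 1.
void multiply_ikj(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    std::fill(c.begin(), c.end(), 0.0f);  // C is accumulated into, so clear it first
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

Both functions compute the same product, but the memory access pattern of the innermost loop differs, so timings taken with different LPSs measure different things and should not be compared across CPUs.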

zalia64

You are right.

I have missed the one-letter difference in the title.

For simple readers like me, fundamental one-letter differences must be spelled out explicitly.

 

SergeyKostrov
Here is an example of the most consistent Performance results, using the 32-bit Microsoft C++ compiler from Visual Studio 98 Professional Edition ( VS98 PE ):

[ Microsoft C++ compiler ( VS98 PE ) - Release - Test App 32-bit - LPS IJK ]

Important Note: Loop Processing Schema ( LPS ) is IJK (!)

......[ Test App 32-bit - CPU IB ( Ivy Bridge ) - Windows 7 64-bit ]
......Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 0.48400 secs
......[ Test App 32-bit - CPU P4 ( Pentium 4 ) - Windows XP 32-bit ]
......Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 3.18800 secs
......[ Test App 32-bit - CPU AN ( Atom N270 ) - Windows 95 32-bit ]
......Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 5.08100 secs
......[ Test App 32-bit - CPU P2 ( Pentium II ) - Windows 2000 32-bit ]
......Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed - LBOT size: N/A - Completed: 48.61000 secs

Note: Performance results are sorted in ascending order of the Completed time value.

These Performance results are the most consistent because:
- Supported Instruction Set Architectures ( ISAs ) for the Intel CPUs used in the performance evaluation are as follows:
  Pentium II - IA-32, MMX
  Pentium 4 - IA-32, MMX, SSE, SSE2
  Atom N270 - IA-32, MMX, SSE, SSE2, SSE3
  Core i7 ( Ivy Bridge ) - IA-32, MMX, SSE, SSE2, SSE3, SSE4.x, HPI, AES, AVX
- The intersection of all these ISAs is IA-32 and MMX
- The 32-bit test application was built on a computer with an Intel Pentium II CPU using the 32-bit Microsoft C++ compiler ( VS98 PE ) with support for the IA-32 and MMX ISAs only
- The 32-bit test application was executed on computers with different generations of Intel CPUs, that is, Pentium II, Pentium 4, Atom N270 and Core i7 ( Ivy Bridge )
- All tests were executed Single threaded on a Single Core, even though Atom N270 supports 2 hardware threads ( 1 core with Hyper-Threading ) and Ivy Bridge has 4 cores ( supports 4 hardware threads )

That is why these are the most consistent Performance results.

It is clearly shown that an application built for the two base ISAs ( IA-32 and MMX ) executes faster on each next generation of Intel CPU, with one exception:
- Pentium 4 is faster than Pentium II
- Atom N270 is faster than Pentium II, but Not faster than Pentium 4 ( this is the exception, because Pentium 4 is on a Desktop class computer and Atom N270 is on a Netbook class computer )
- Ivy Bridge is faster than Pentium 4, Atom N270 and Pentium II

Notes:
MMX - Multi Media Extensions
SSE - Streaming SIMD Extensions
SSE2 - Streaming SIMD Extensions 2
SSE3 - Streaming SIMD Extensions 3
SSE4 - Streaming SIMD Extensions 4
HPI - Horizontally Packed Intrinsics
AES - Advanced Encryption Standard
AVX - Advanced Vector Extensions
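
The relative speeds implied by the Completed times above can be checked with a few divisions. This is only a sketch: the function name speedup and the constant names are hypothetical, and the values are copied from the VS98 PE results in this post.

```cpp
#include <cassert>

// How many times faster a run of `secs` is than a baseline run of `baseline_secs`.
double speedup(double baseline_secs, double secs) {
    assert(baseline_secs > 0.0 && secs > 0.0);
    return baseline_secs / secs;
}

// Completed times in secs, Sub-Test 2.1 ( Classic 2D Transposed ), VS98 PE build.
const double kIvyBridge = 0.48400;
const double kPentium4  = 3.18800;
const double kAtomN270  = 5.08100;
const double kPentiumII = 48.61000;
```

For example, speedup(kPentium4, kIvyBridge) gives the Ivy Bridge gain over the Pentium 4 for this IA-32 and MMX only binary, and speedup(kPentium4, kAtomN270) comes out below 1, which is the Atom N270 exception noted in the post.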

SergeyKostrov
>>...For simple readers like me, fundamental one-letter differences must be spelled out explicitly.

I really appreciate your time and effort to understand how the results need to be interpreted! I'll update the post related to LPSs, at the beginning of the thread, with as many details as possible on how the analysis needs to be done.

SergeyKostrov
[ From Post #108 ]
>>...
>>The Ivy Bridge CPU is at least 8 times faster than Pentium 4 CPU.

By the way, here is a statement from a blog post by a former Intel employee:
...
https://software.intel.com/en-us/blogs/2013/avx-512-instructions
...
The evolution to Intel AVX-512 contributes to our goal to grow peak FLOP/sec by 8X over 4 generations: 2X with AVX1.0 with the Sandy Bridge architecture over the prior SSE4.2, extended by Ivy Bridge architecture with 16-bit float and random number support, 2X with AVX2.0 and its fused multiply-add (FMA) in the Haswell architecture and then 2X more with Intel AVX-512.
...

So, my results are consistent with the statement, and the accuracy is about +/- 18.75%, that is, a ~6.5x improvement when comparing SSE vs AVX. Take into account that all binary codes were optimized by compilers. In the case of the set of base tests with MKL ( posts #10 and #11 ), there is a ~8x improvement when comparing SSE vs AVX ( manually optimized source codes! ).
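
The arithmetic behind these figures can be written out as follows. This assumes that the ~6.5x value comes from the VS98 PE results posted earlier in the thread ( Pentium 4: 3.18800 secs, Ivy Bridge: 0.48400 secs ); the post itself does not say which pair of results it is based on.

```latex
\[
\frac{t_{\mathrm{P4}}}{t_{\mathrm{IB}}} = \frac{3.188}{0.484} \approx 6.59,
\qquad
1 - \frac{6.5}{8} = 0.1875 = 18.75\%
\]
```

In other words, the 18.75% is the relative shortfall of the measured ~6.5x against the 8x figure.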

SergeyKostrov
[ Performance Evaluation of Core Processing Inline Functions for ]
[ ALGORITHM_MULTIPLYCLASSIC of ScaLib for BDP project ]

[ Abstract ]
Core Processing Inline Functions ( CPIF ) of an algorithm are a fundamental feature of the ScaLib for BDP project. Performance evaluations are completed for two Matrix Multiplication algorithms, the Transposed and Non-Transposed versions of ALGORITHM_MULTIPLYCLASSIC.
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
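
As a rough illustration of what is being measured, here is a minimal sketch of a Transposed classic multiplication built around a small inline core function and timed over several passes. It is not the ScaLib code: the names dot_row, transpose, multiply_transposed and time_passes are hypothetical, and it assumes that the transposition of B happens outside the timed region, which the post does not state.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Row-major n x n matrix in one contiguous buffer (hypothetical helper type).
using Matrix = std::vector<float>;

// Hypothetical core processing inline function: dot product of two contiguous rows.
inline float dot_row(const float* x, const float* y, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t k = 0; k < n; ++k) {
        sum += x[k] * y[k];
    }
    return sum;
}

// Returns the transpose of the n x n matrix b.
Matrix transpose(const Matrix& b, std::size_t n) {
    assert(b.size() == n * n);
    Matrix bt(n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            bt[j * n + i] = b[i * n + j];
        }
    }
    return bt;
}

// Transposed version: C[i][j] is the dot product of row i of A and row j of B^T,
// so both operands of the core function are read with stride 1.
void multiply_transposed(const Matrix& a, const Matrix& bt, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && bt.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            c[i * n + j] = dot_row(&a[i * n], &bt[j * n], n);
        }
    }
}

// Runs the multiplication `passes` times and returns the elapsed seconds of each pass.
std::vector<double> time_passes(const Matrix& a, const Matrix& bt, Matrix& c,
                                std::size_t n, int passes) {
    std::vector<double> secs;
    for (int p = 0; p < passes; ++p) {
        const auto start = std::chrono::steady_clock::now();
        multiply_transposed(a, bt, c, n);
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        secs.push_back(elapsed.count());
    }
    return secs;
}
```

With n = 512 and passes = 5 this mirrors the Data Set Size and Number of Tests listed above; a Not Transposed version would call the core function on a column of B instead, which is read with stride n.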

SergeyKostrov
[ Borland C++ compiler v5.5.1 32-bit ]
ALGORITHM_MULTIPLYCLASSIC - Transposed
Function - Completed ( secs ): Pass 01 / Pass 02 / Pass 03 / Pass 04 / Pass 05
_MatrixMulProcessingCTUnRv1A - 0.59400 / 0.59400 / 0.59400 / 0.59400 / 0.57800
_MatrixMulProcessingCTv1B - 2.37500 / 2.36000 / 2.37500 / 2.35900 / 2.37500
_MatrixMulProcessingCTv1C - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCTv1D - 2.42200 / 2.40600 / 2.40700 / 2.40600 / 2.40600
_MatrixMulProcessingCTv1E - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCTUnRv2A - 0.57800 / 0.57800 / 0.57800 / 0.57800 / 0.59400
_MatrixMulProcessingCTv2B - 2.35900 / 2.37500 / 2.35900 / 2.37500 / 2.36000
_MatrixMulProcessingCTv2C - 2.40600 / 2.40600 / 2.40600 / 2.40700 / 2.40600

SergeyKostrov
[ Borland C++ compiler v5.5.1 32-bit ]
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
Function - Completed ( secs ): Pass 01 / Pass 02 / Pass 03 / Pass 04 / Pass 05
_MatrixMulProcessingCUnRv1A - 11.39000 / 11.10900 / 11.11000 / 11.09300 / 11.11000
_MatrixMulProcessingCv1B - 10.98400 / 10.98400 / 10.96900 / 10.96900 / 10.98400
_MatrixMulProcessingCv1C - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCv1D - 10.82800 / 10.84300 / 10.82800 / 10.82800 / 10.81300
_MatrixMulProcessingCv1E - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCUnRv2A - 11.11000 / 11.10900 / 11.09400 / 11.10900 / 11.11000
_MatrixMulProcessingCv2B - 10.98400 / 10.98400 / 10.96900 / 10.96900 / 10.95300
_MatrixMulProcessingCv2C - 10.79700 / 10.81300 / 10.81200 / 10.81300 / 10.81200

SergeyKostrov
[ MinGW C++ compiler v5.1.0 32-bit ]
ALGORITHM_MULTIPLYCLASSIC - Transposed
Function - Completed ( secs ): Pass 01 / Pass 02 / Pass 03 / Pass 04 / Pass 05
_MatrixMulProcessingCTUnRv1A - 0.57800 / 0.57800 / 0.57800 / 0.57800 / 0.57800
_MatrixMulProcessingCTv1B - 2.40700 / 2.31200 / 2.28100 / 2.29700 / 2.28100
_MatrixMulProcessingCTv1C - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCTv1D - 2.17200 / 2.17200 / 2.17200 / 2.17200 / 2.17200
_MatrixMulProcessingCTv1E - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCTUnRv2A - 0.57800 / 0.56200 / 0.57900 / 0.57800 / 0.57800
_MatrixMulProcessingCTv2B - 2.26500 / 2.28200 / 2.28100 / 2.26600 / 2.28100
_MatrixMulProcessingCTv2C - 2.20300 / 2.18700 / 2.18800 / 2.21900 / 2.20300

SergeyKostrov
[ MinGW C++ compiler v5.1.0 32-bit ]
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
Function - Completed ( secs ): Pass 01 / Pass 02 / Pass 03 / Pass 04 / Pass 05
_MatrixMulProcessingCUnRv1A - 10.90600 / 10.90600 / 10.90700 / 10.90600 / 10.92200
_MatrixMulProcessingCv1B - 10.82800 / 10.81200 / 10.82800 / 10.82900 / 10.82800
_MatrixMulProcessingCv1C - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCv1D - 10.56200 / 10.54700 / 10.56300 / 10.56200 / 10.54700
_MatrixMulProcessingCv1E - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCUnRv2A - 10.92200 / 10.90600 / 10.92200 / 10.92200 / 10.92200
_MatrixMulProcessingCv2B - 10.79700 / 10.79600 / 10.78200 / 10.79700 / 10.79600
_MatrixMulProcessingCv2C - 10.61000 / 10.60900 / 10.64100 / 10.62500 / 10.60900

SergeyKostrov
[ Microsoft C++ compiler ( VS2005 PE ) 32-bit ]
ALGORITHM_MULTIPLYCLASSIC - Transposed
Function - Completed ( secs ): Pass 01 / Pass 02 / Pass 03 / Pass 04 / Pass 05
_MatrixMulProcessingCTUnRv1A - 0.40700 / 0.39000 / 0.39100 / 0.40600 / 0.39100
_MatrixMulProcessingCTv1B - 0.56200 / 0.56300 / 0.54700 / 0.57800 / 0.56200
_MatrixMulProcessingCTv1C - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCTv1D - 0.56300 / 0.57800 / 0.56200 / 0.57900 / 0.56200
_MatrixMulProcessingCTv1E - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCTUnRv2A - 0.40600 / 0.39100 / 0.40600 / 0.39100 / 0.39000
_MatrixMulProcessingCTv2B - 0.56300 / 0.56200 / 0.56300 / 0.54700 / 0.56200
_MatrixMulProcessingCTv2C - 0.57800 / 0.56300 / 0.57800 / 0.56300 / 0.56300

SergeyKostrov
[ Microsoft C++ compiler ( VS2005 PE ) 32-bit ]
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
Function - Completed ( secs ): Pass 01 / Pass 02 / Pass 03 / Pass 04 / Pass 05
_MatrixMulProcessingCUnRv1A - 11.12500 / 11.12500 / 11.12500 / 11.12500 / 11.12500
_MatrixMulProcessingCv1B - 11.03100 / 11.04700 / 11.04700 / 11.04700 / 11.03100
_MatrixMulProcessingCv1C - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCv1D - 11.15600 / 11.15600 / 11.15700 / 11.14000 / 11.15600
_MatrixMulProcessingCv1E - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCUnRv2A - 11.12500 / 11.12500 / 11.12500 / 11.12500 / 11.11000
_MatrixMulProcessingCv2B - 11.04700 / 11.06200 / 11.04700 / 11.04700 / 11.04700
_MatrixMulProcessingCv2C - 11.15600 / 11.15600 / 11.14100 / 11.15600 / 11.15600

SergeyKostrov
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit ]
ALGORITHM_MULTIPLYCLASSIC - Transposed
Function - Completed ( secs ): Pass 01 / Pass 02 / Pass 03 / Pass 04 / Pass 05
_MatrixMulProcessingCTUnRv1A - 0.48400 / 0.48500 / 0.46800 / 0.46900 / 0.46900
_MatrixMulProcessingCTv1B - 0.31200 / 0.31300 / 0.32800 / 0.31200 / 0.31300
_MatrixMulProcessingCTv1C - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCTv1D - 0.31200 / 0.31300 / 0.29700 / 0.31200 / 0.29700
_MatrixMulProcessingCTv1E - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCTUnRv2A - 0.46900 / 0.46800 / 0.46900 / 0.46900 / 0.46900
_MatrixMulProcessingCTv2B - 0.31200 / 0.31300 / 0.31200 / 0.31300 / 0.32800
_MatrixMulProcessingCTv2C - 0.29700 / 0.31200 / 0.31300 / 0.31200 / 0.29700

SergeyKostrov
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit ]
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
Function - Completed ( secs ): Pass 01 / Pass 02 / Pass 03 / Pass 04 / Pass 05
_MatrixMulProcessingCUnRv1A - 2.67200 / 2.67200 / 2.67200 / 2.67100 / 2.68800
_MatrixMulProcessingCv1B - 12.12500 / 12.12500 / 12.12500 / 12.12500 / 12.10900
_MatrixMulProcessingCv1C - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCv1D - 0.48400 / 0.48500 / 0.48400 / 0.50000 / 0.48400
_MatrixMulProcessingCv1E - 0.00000 / 0.00000 / 0.00000 / 0.00000 / 0.00000
_MatrixMulProcessingCUnRv2A - 2.65700 / 2.67200 / 2.65600 / 2.67200 / 2.65600
_MatrixMulProcessingCv2B - 12.12500 / 12.12500 / 12.12500 / 12.10900 / 12.10900
_MatrixMulProcessingCv2C - 0.48500 / 0.50000 / 0.48400 / 0.48400 / 0.48500

SergeyKostrov
[ Watcom C++ compiler v2.0.0 32-bit ]
ALGORITHM_MULTIPLYCLASSIC - Transposed
Function - Completed ( secs ): Pass 01 / Pass 02 / Pass 03 / Pass 04 / Pass 05
_MatrixMulProcessingCTUnRv1A - 0.43700 / 0.43800 / 0.43700 / 0.42200 / 0.42200
_MatrixMulProcessingCTv1B - 0.35900 / 0.36000 / 0.35900 / 0.35900 / 0.36000
_MatrixMulProcessingCTv1C - 0.35900 / 0.36000 / 0.34300 / 0.36000 / 0.35900
_MatrixMulProcessingCTv1D - 0.35900 / 0.34400 / 0.36000 / 0.35900 / 0.35900
_MatrixMulProcessingCTv1E - 0.34400 / 0.35900 / 0.36000 / 0.34400 / 0.35900
_MatrixMulProcessingCTUnRv2A - 0.42200 / 0.42200 / 0.43700 / 0.42200 / 0.42200
_MatrixMulProcessingCTv2B - 0.35900 / 0.34400 / 0.35900 / 0.36000 / 0.35900
_MatrixMulProcessingCTv2C - 0.36000 / 0.34300 / 0.36000 / 0.35900 / 0.35900
0 Kudos
SergeyKostrov
Valued Contributor II
436 Views
[ Watcom C++ compiler v2.0.0 32-bit ]
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
_MatrixMulProcessingCUnRv1A - Pass 01 - Completed: 11.07900 secs
_MatrixMulProcessingCUnRv1A - Pass 02 - Completed: 11.06200 secs
_MatrixMulProcessingCUnRv1A - Pass 03 - Completed: 11.07800 secs
_MatrixMulProcessingCUnRv1A - Pass 04 - Completed: 11.07800 secs
_MatrixMulProcessingCUnRv1A - Pass 05 - Completed: 11.06300 secs
_MatrixMulProcessingCv1B - Pass 01 - Completed: 10.87500 secs
_MatrixMulProcessingCv1B - Pass 02 - Completed: 10.87500 secs
_MatrixMulProcessingCv1B - Pass 03 - Completed: 10.87500 secs
_MatrixMulProcessingCv1B - Pass 04 - Completed: 10.87500 secs
_MatrixMulProcessingCv1B - Pass 05 - Completed: 10.87500 secs
_MatrixMulProcessingCv1C - Pass 01 - Completed: 10.87500 secs
_MatrixMulProcessingCv1C - Pass 02 - Completed: 10.89000 secs
_MatrixMulProcessingCv1C - Pass 03 - Completed: 10.87500 secs
_MatrixMulProcessingCv1C - Pass 04 - Completed: 10.87500 secs
_MatrixMulProcessingCv1C - Pass 05 - Completed: 10.89100 secs
_MatrixMulProcessingCv1D - Pass 01 - Completed: 11.23400 secs
_MatrixMulProcessingCv1D - Pass 02 - Completed: 11.21900 secs
_MatrixMulProcessingCv1D - Pass 03 - Completed: 11.21900 secs
_MatrixMulProcessingCv1D - Pass 04 - Completed: 11.21900 secs
_MatrixMulProcessingCv1D - Pass 05 - Completed: 11.23400 secs
_MatrixMulProcessingCv1E - Pass 01 - Completed: 11.21900 secs
_MatrixMulProcessingCv1E - Pass 02 - Completed: 11.21900 secs
_MatrixMulProcessingCv1E - Pass 03 - Completed: 11.25000 secs
_MatrixMulProcessingCv1E - Pass 04 - Completed: 11.23400 secs
_MatrixMulProcessingCv1E - Pass 05 - Completed: 11.21900 secs
_MatrixMulProcessingCUnRv2A - Pass 01 - Completed: 11.07800 secs
_MatrixMulProcessingCUnRv2A - Pass 02 - Completed: 11.06200 secs
_MatrixMulProcessingCUnRv2A - Pass 03 - Completed: 11.07800 secs
_MatrixMulProcessingCUnRv2A - Pass 04 - Completed: 11.06300 secs
_MatrixMulProcessingCUnRv2A - Pass 05 - Completed: 11.06200 secs
_MatrixMulProcessingCv2B - Pass 01 - Completed: 10.87500 secs
_MatrixMulProcessingCv2B - Pass 02 - Completed: 10.87500 secs
_MatrixMulProcessingCv2B - Pass 03 - Completed: 10.87500 secs
_MatrixMulProcessingCv2B - Pass 04 - Completed: 10.87500 secs
_MatrixMulProcessingCv2B - Pass 05 - Completed: 10.87500 secs
_MatrixMulProcessingCv2C - Pass 01 - Completed: 11.21900 secs
_MatrixMulProcessingCv2C - Pass 02 - Completed: 11.23400 secs
_MatrixMulProcessingCv2C - Pass 03 - Completed: 11.21900 secs
_MatrixMulProcessingCv2C - Pass 04 - Completed: 11.21900 secs
_MatrixMulProcessingCv2C - Pass 05 - Completed: 11.21800 secs
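For readers comparing the "Transposed" and "Not Transposed" runs above, here is a minimal sketch of what the two variants of the classic algorithm usually look like. The names `Matrix`, `multiplyClassic`, `transpose` and `multiplyClassicTransposed` are hypothetical and are not the `_MatrixMulProcessing*` functions from the logs, whose source is not shown in this thread. The sketch assumes square, row-major matrices of `float`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types and names for illustration only.
using Matrix = std::vector<float>;  // n x n, row-major

// Classic triple loop, B not transposed: the inner loop reads B with
// stride n (down a column), which tends to be cache-unfriendly.
Matrix multiplyClassic(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Returns the transpose of an n x n row-major matrix.
Matrix transpose(const Matrix& m, std::size_t n) {
    assert(m.size() == n * n);
    Matrix t(n * n);
    for (std::size_t r = 0; r < n; ++r) {
        for (std::size_t col = 0; col < n; ++col) {
            t[col * n + r] = m[r * n + col];
        }
    }
    return t;
}

// Same algorithm, but B is transposed first so that the inner loop
// reads both operands with stride 1 (along rows).
Matrix multiplyClassicTransposed(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    const Matrix bt = transpose(b, n);
    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * bt[j * n + k];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}
```

Both functions perform the same multiply-adds in the same order, so they return identical results; only the memory access pattern on the second operand differs.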
0 Kudos
SergeyKostrov
Valued Contributor II
436 Views
[ Microsoft C++ compiler ( VS2008 PE ) 64-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 0.06200 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 0.06200 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 0.06300 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 0.04700 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 0.06200 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 0.06200 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 0.04700 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 0.06200 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 0.06300 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 0.06200 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 0.07800 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 0.07800 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 0.07800 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 0.07800 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 0.07800 secs
0 Kudos
SergeyKostrov
Valued Contributor II
436 Views
[ Microsoft C++ compiler ( VS2008 PE ) 64-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
ALGORITHM_MULTIPLYCLASSIC - Not Transposed
_MatrixMulProcessingCUnRv1A - Pass 01 - Completed: 0.25000 secs
_MatrixMulProcessingCUnRv1A - Pass 02 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv1A - Pass 03 - Completed: 0.24900 secs
_MatrixMulProcessingCUnRv1A - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv1A - Pass 05 - Completed: 0.25000 secs
_MatrixMulProcessingCv1B - Pass 01 - Completed: 0.23400 secs
_MatrixMulProcessingCv1B - Pass 02 - Completed: 0.24900 secs
_MatrixMulProcessingCv1B - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCv1B - Pass 04 - Completed: 0.25000 secs
_MatrixMulProcessingCv1B - Pass 05 - Completed: 0.23400 secs
_MatrixMulProcessingCv1C - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCv1C - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCv1C - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCv1C - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCv1C - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCv1D - Pass 01 - Completed: 0.23400 secs
_MatrixMulProcessingCv1D - Pass 02 - Completed: 0.23400 secs
_MatrixMulProcessingCv1D - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCv1D - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCv1D - Pass 05 - Completed: 0.23400 secs
_MatrixMulProcessingCv1E - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCv1E - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCv1E - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCv1E - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCv1E - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCUnRv2A - Pass 01 - Completed: 0.25000 secs
_MatrixMulProcessingCUnRv2A - Pass 02 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv2A - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCUnRv2A - Pass 04 - Completed: 0.24900 secs
_MatrixMulProcessingCUnRv2A - Pass 05 - Completed: 0.23400 secs
_MatrixMulProcessingCv2B - Pass 01 - Completed: 0.25000 secs
_MatrixMulProcessingCv2B - Pass 02 - Completed: 0.23400 secs
_MatrixMulProcessingCv2B - Pass 03 - Completed: 0.24900 secs
_MatrixMulProcessingCv2B - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCv2B - Pass 05 - Completed: 0.23400 secs
_MatrixMulProcessingCv2C - Pass 01 - Completed: 0.23400 secs
_MatrixMulProcessingCv2C - Pass 02 - Completed: 0.23400 secs
_MatrixMulProcessingCv2C - Pass 03 - Completed: 0.23400 secs
_MatrixMulProcessingCv2C - Pass 04 - Completed: 0.23400 secs
_MatrixMulProcessingCv2C - Pass 05 - Completed: 0.23400 secs
0 Kudos
SergeyKostrov
Valued Contributor II
436 Views
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit ]
...
Data Set Size : 262144 elements ( 512 x 512 )
Number of Tests : 5
Number of Threads : 1
...
ALGORITHM_MULTIPLYCLASSIC - Transposed
_MatrixMulProcessingCTUnRv1A - Pass 01 - Completed: 0.09300 secs
_MatrixMulProcessingCTUnRv1A - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCTUnRv1A - Pass 03 - Completed: 0.09400 secs
_MatrixMulProcessingCTUnRv1A - Pass 04 - Completed: 0.07800 secs
_MatrixMulProcessingCTUnRv1A - Pass 05 - Completed: 0.07800 secs
_MatrixMulProcessingCTv1B - Pass 01 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1B - Pass 02 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1B - Pass 03 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1B - Pass 04 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1B - Pass 05 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1C - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1C - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1D - Pass 01 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1D - Pass 02 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1D - Pass 03 - Completed: 0.01500 secs
_MatrixMulProcessingCTv1D - Pass 04 - Completed: 0.03100 secs
_MatrixMulProcessingCTv1D - Pass 05 - Completed: 0.01600 secs
_MatrixMulProcessingCTv1E - Pass 01 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 02 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 03 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 04 - Completed: 0.00000 secs
_MatrixMulProcessingCTv1E - Pass 05 - Completed: 0.00000 secs
_MatrixMulProcessingCTUnRv2A - Pass 01 - Completed: 0.09400 secs
_MatrixMulProcessingCTUnRv2A - Pass 02 - Completed: 0.07800 secs
_MatrixMulProcessingCTUnRv2A - Pass 03 - Completed: 0.09300 secs
_MatrixMulProcessingCTUnRv2A - Pass 04 - Completed: 0.09400 secs
_MatrixMulProcessingCTUnRv2A - Pass 05 - Completed: 0.07800 secs
_MatrixMulProcessingCTv2B - Pass 01 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2B - Pass 02 - Completed: 0.01600 secs
_MatrixMulProcessingCTv2B - Pass 03 - Completed: 0.01500 secs
_MatrixMulProcessingCTv2B - Pass 04 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2B - Pass 05 - Completed: 0.01600 secs
_MatrixMulProcessingCTv2C - Pass 01 - Completed: 0.03100 secs
_MatrixMulProcessingCTv2C - Pass 02 - Completed: 0.01600 secs
_MatrixMulProcessingCTv2C - Pass 03 - Completed: 0.01500 secs
_MatrixMulProcessingCTv2C - Pass 04 - Completed: 0.03200 secs
_MatrixMulProcessingCTv2C - Pass 05 - Completed: 0.01500 secs
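A note on reading the sub-100 ms numbers: the reported times (0.015, 0.031, 0.047, 0.062, 0.078, 0.094 secs) appear to be multiples of roughly 15.6 ms, which would be consistent with a coarse tick-based timer; the thread does not say which timer the test application uses, so this is only an inference. For anyone reproducing such short runs, a minimal sketch of per-pass timing with a monotonic clock could look like the following. The helper names `secondsFor` and `timePasses` are hypothetical, and `std::chrono` requires C++11, so this would not build with older toolchains such as VS2008.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename F>
double secondsFor(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: runs `work` `passes` times and returns one
// timing per pass, mirroring the "Pass 01 .. Pass 05" layout of the logs.
template <typename F>
std::vector<double> timePasses(F&& work, std::size_t passes) {
    assert(passes > 0);
    std::vector<double> results;
    results.reserve(passes);
    for (std::size_t p = 0; p < passes; ++p) {
        results.push_back(secondsFor(work));
    }
    return results;
}
```

A call such as `timePasses([&] { /* one multiplication */ }, 5)` would return five per-pass timings; the actual resolution still depends on the platform's `steady_clock` implementation.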
0 Kudos