SergeyKostrov
Valued Contributor II
38 Views

Performance Evaluation of Matrix Identity algorithms

*** Performance Evaluation of Matrix Identity algorithms ***

[ Computer System used for performance evaluations ]

** Dell Precision Mobile M4700 **
Intel Core i7-3840QM ( 2.80 GHz )
Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/products/70846
32GB RAM
320GB HDD
NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory )
Windows 7 Professional 64-bit SP1
Size of L3 Cache = 8MB ( shared between all cores for data & instructions )
Size of L2 Cache = 1MB ( 256KB per core / shared for data & instructions )
Size of L1 Cache = 256KB ( 32KB per core for data & 32KB per core for instructions )
Display resolution: 1366 x 768
13 Replies
SergeyKostrov
Valued Contributor II
38 Views

Matrix Identity Algorithm ( 32-bit ): 1024 x 1024

[ Tests Set 1 ( 32-bit ) - Matrix Size: 1024 x 1024 ]

[ Microsoft C++ compiler ]
Matrix Size: 1024 x 1024
Processing...
Identity - Pass 01 - Completed: 5.81250 ticks
Identity - Pass 02 - Completed: 5.87500 ticks
Identity - Pass 03 - Completed: 5.87500 ticks
Identity - Pass 04 - Completed: 5.81250 ticks
Identity - Pass 05 - Completed: 5.87500 ticks
Identity - Passed

[ Borland C++ compiler ]
Matrix Size: 1024 x 1024
Processing...
Identity - Pass 01 - Completed: 4.87500 ticks
Identity - Pass 02 - Completed: 4.87500 ticks
Identity - Pass 03 - Completed: 5.87500 ticks
Identity - Pass 04 - Completed: 5.87500 ticks
Identity - Pass 05 - Completed: 5.87500 ticks
Identity - Passed

[ Intel C++ compiler ]
Matrix Size: 1024 x 1024
Processing...
Identity - Pass 01 - Completed: 1.93750 ticks
Identity - Pass 02 - Completed: 2.00000 ticks
Identity - Pass 03 - Completed: 1.93750 ticks
Identity - Pass 04 - Completed: 1.93750 ticks
Identity - Pass 05 - Completed: 2.93750 ticks
Identity - Passed

[ MinGW C++ compiler ]
Matrix Size: 1024 x 1024
Processing...
Identity - Pass 01 - Completed: 5.87500 ticks
Identity - Pass 02 - Completed: 5.87500 ticks
Identity - Pass 03 - Completed: 6.81250 ticks
Identity - Pass 04 - Completed: 5.87500 ticks
Identity - Pass 05 - Completed: 5.81250 ticks
Identity - Passed

[ Watcom C++ compiler ]
Matrix Size: 1024 x 1024
Processing...
Identity - Pass 01 - Completed: 5.81250 ticks
Identity - Pass 02 - Completed: 5.87500 ticks
Identity - Pass 03 - Completed: 4.87500 ticks
Identity - Pass 04 - Completed: 5.87500 ticks
Identity - Pass 05 - Completed: 5.87500 ticks
Identity - Passed

Note: 1 sec = 1000 ticks
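
The test harness that produced these numbers is not posted. As a rough sketch only, per-pass timings in ticks ( 1 tick = 1 ms ) could be collected as shown here. The helper name timePassesMs and the use of std::chrono::steady_clock are assumptions for illustration, not the timer actually used for the results in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `work` `passes` times and returns the elapsed
// time of each pass in milliseconds ( 1 tick = 1 ms, as in the listings ).
template < class F >
std::vector< double > timePassesMs( F work, int passes )
{
    assert( passes > 0 );
    std::vector< double > ticks;
    ticks.reserve( static_cast< std::size_t >( passes ) );
    for( int i = 0; i < passes; ++i )
    {
        const auto t0 = std::chrono::steady_clock::now();
        work();                                   // the code under test
        const auto t1 = std::chrono::steady_clock::now();
        ticks.push_back(
            std::chrono::duration< double, std::milli >( t1 - t0 ).count() );
    }
    return ticks;
}
```

Calling timePassesMs with a callable that runs the identity routine and with passes set to 5 would give one value per "Pass NN" line.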
SergeyKostrov
Valued Contributor II
38 Views

Matrix Identity Algorithm - Code Analysis

[ Microsoft C++ compiler ]
...
template < class T > _RTINLINE RTvoid _MatrixIdentityProcessingCRv2A( T * _RTRESTRICT ptS, RTssize_t iRows, RTssize_t iCols, RTint iNumOfThreads )
{
...
0024355C xor eax, eax
0024355E rep stos dword ptr es:[edi]
...
}
...

[ Borland C++ compiler ]
...
template < class T > _RTINLINE RTvoid _MatrixIdentityProcessingCRv2A( T * _RTRESTRICT ptS, RTssize_t iRows, RTssize_t iCols, RTint iNumOfThreads )
{
...
00403069 xor ecx, ecx
0040306B inc edx
0040306C mov dword ptr [eax], ecx
0040306E add eax, 4
00403071 cmp edi, edx
00403073 jg 00403069
...
}
...

[ Intel C++ compiler ]
...
template < class T > _RTINLINE RTvoid _MatrixIdentityProcessingCRv2A( T * _RTRESTRICT ptS, RTssize_t iRows, RTssize_t iCols, RTint iNumOfThreads )
{
...
00401096 pxor xmm0, xmm0
0040109A movntps xmmword ptr [ebx+ecx*4], xmm0
0040109E movntps xmmword ptr [ebx+ecx*4+10h], xmm0
004010A3 movntps xmmword ptr [ebx+ecx*4+20h], xmm0
004010A8 movntps xmmword ptr [ebx+ecx*4+30h], xmm0
004010AD movntps xmmword ptr [ebx+ecx*4+40h], xmm0
004010B2 movntps xmmword ptr [ebx+ecx*4+50h], xmm0
004010B7 movntps xmmword ptr [ebx+ecx*4+60h], xmm0
004010BC movntps xmmword ptr [ebx+ecx*4+70h], xmm0
004010C1 add ecx,20h
004010C4 cmp ecx,eax
004010C6 jb 0040109A
...
}
...

[ MinGW C++ compiler ]
...
template < class T > _RTINLINE RTvoid _MatrixIdentityProcessingCRv2A( T * _RTRESTRICT ptS, RTssize_t iRows, RTssize_t iCols, RTint iNumOfThreads )
{
...
0040A5AE call _memset( 0042FD28h )
...
}
...

[ Watcom C++ compiler ]
...
template < class T > _RTINLINE RTvoid _MatrixIdentityProcessingCRv2A( T * _RTRESTRICT ptS, RTssize_t iRows, RTssize_t iCols, RTint iNumOfThreads )
{
...
0040C9C7 cmp eax, ebp
0040C9C9 jge 0040C9ED
0040C9CB mov edx, eax
0040C9CD mov dword ptr [ebx+edx*4], 0
0040C9D4 inc eax
0040C9D5 jmp 0040C9C7
...
}
...
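
The C++ source of _MatrixIdentityProcessingCRv2A is elided in the listings, so the following is only a guess at the kind of code behind them: a two-stage identity fill in which stage 1 zeroes the whole buffer ( the loop that each compiler lowers to rep stos, a scalar loop, movntps or memset ) and stage 2 writes the diagonal. The name matrixIdentity and its signature are hypothetical, and a square row-major matrix is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical two-stage identity fill for a row-major n x n matrix.
template < class T >
void matrixIdentity( T *p, std::size_t n )
{
    assert( p != nullptr );

    // Stage 1: zero every element. This is the loop that corresponds to
    // the disassembly fragments shown for each compiler.
    const std::size_t total = n * n;
    for( std::size_t i = 0; i < total; ++i )
        p[i] = T( 0 );

    // Stage 2: write 1 on the main diagonal ( n stores only ).
    for( std::size_t i = 0; i < n; ++i )
        p[i * n + i] = T( 1 );
}
```

Stage 1 touches n * n elements while stage 2 touches only n, which is consistent with the compilers being compared on how they generate the zero fill.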
SergeyKostrov
Valued Contributor II
38 Views

Matrix Identity Algorithm ( 32-bit ): 2048 x 2048

[ Tests Set 2 ( 32-bit ) - Matrix Size: 2048 x 2048 ]

[ Microsoft C++ compiler ]
Matrix Size: 2048 x 2048
Processing...
Identity - Pass 01 - Completed: 32.25000 ticks
Identity - Pass 02 - Completed: 32.18750 ticks
Identity - Pass 03 - Completed: 32.25000 ticks
Identity - Pass 04 - Completed: 32.18750 ticks
Identity - Pass 05 - Completed: 33.25000 ticks
Identity - Passed

[ Borland C++ compiler ]
Matrix Size: 2048 x 2048
Processing...
Identity - Pass 01 - Completed: 31.25000 ticks
Identity - Pass 02 - Completed: 32.12500 ticks
Identity - Pass 03 - Completed: 33.18750 ticks
Identity - Pass 04 - Completed: 32.25000 ticks
Identity - Pass 05 - Completed: 32.25000 ticks
Identity - Passed

[ Intel C++ compiler ]
Matrix Size: 2048 x 2048
Processing...
Identity - Pass 01 - Completed: 12.68750 ticks
Identity - Pass 02 - Completed: 11.75000 ticks
Identity - Pass 03 - Completed: 12.68750 ticks
Identity - Pass 04 - Completed: 12.68750 ticks
Identity - Pass 05 - Completed: 12.68750 ticks
Identity - Passed

[ MinGW C++ compiler ]
Matrix Size: 2048 x 2048
Processing...
Identity - Pass 01 - Completed: 32.25000 ticks
Identity - Pass 02 - Completed: 33.18750 ticks
Identity - Pass 03 - Completed: 32.25000 ticks
Identity - Pass 04 - Completed: 32.18750 ticks
Identity - Pass 05 - Completed: 33.25000 ticks
Identity - Passed

[ Watcom C++ compiler ]
Matrix Size: 2048 x 2048
Processing...
Identity - Pass 01 - Completed: 31.25000 ticks
Identity - Pass 02 - Completed: 31.25000 ticks
Identity - Pass 03 - Completed: 30.31250 ticks
Identity - Pass 04 - Completed: 31.25000 ticks
Identity - Pass 05 - Completed: 31.25000 ticks
Identity - Passed

Note: 1 sec = 1000 ticks
SergeyKostrov
Valued Contributor II
38 Views

Matrix Identity Algorithm ( 32-bit ): 4096 x 4096

[ Tests Set 3 ( 32-bit ) - Matrix Size: 4096 x 4096 ]

[ Microsoft C++ compiler ]
Matrix Size: 4096 x 4096
Processing...
Identity - Pass 01 - Completed: 128.93750 ticks
Identity - Pass 02 - Completed: 127.93750 ticks
Identity - Pass 03 - Completed: 127.93750 ticks
Identity - Pass 04 - Completed: 126.93750 ticks
Identity - Pass 05 - Completed: 128.93750 ticks
Identity - Passed

[ Borland C++ compiler ]
Matrix Size: 4096 x 4096
Processing...
Identity - Pass 01 - Completed: 128.93750 ticks
Identity - Pass 02 - Completed: 126.93750 ticks
Identity - Pass 03 - Completed: 125.00000 ticks
Identity - Pass 04 - Completed: 125.00000 ticks
Identity - Pass 05 - Completed: 124.00000 ticks
Identity - Passed

[ Intel C++ compiler ]
Matrix Size: 4096 x 4096
Processing...
Identity - Pass 01 - Completed: 48.81250 ticks
Identity - Pass 02 - Completed: 47.87500 ticks
Identity - Pass 03 - Completed: 48.81250 ticks
Identity - Pass 04 - Completed: 48.81250 ticks
Identity - Pass 05 - Completed: 48.87500 ticks
Identity - Passed

[ MinGW C++ compiler ]
Matrix Size: 4096 x 4096
Processing...
Identity - Pass 01 - Completed: 127.93750 ticks
Identity - Pass 02 - Completed: 126.93750 ticks
Identity - Pass 03 - Completed: 127.93750 ticks
Identity - Pass 04 - Completed: 127.93750 ticks
Identity - Pass 05 - Completed: 126.93750 ticks
Identity - Passed

[ Watcom C++ compiler ]
Matrix Size: 4096 x 4096
Processing...
Identity - Pass 01 - Completed: 124.00000 ticks
Identity - Pass 02 - Completed: 124.00000 ticks
Identity - Pass 03 - Completed: 124.06250 ticks
Identity - Pass 04 - Completed: 124.00000 ticks
Identity - Pass 05 - Completed: 125.00000 ticks
Identity - Passed

Note: 1 sec = 1000 ticks
SergeyKostrov
Valued Contributor II
38 Views

Matrix Identity Algorithm ( 32-bit ): 8192 x 8192

[ Tests Set 4 ( 32-bit ) - Matrix Size: 8192 x 8192 ]

[ Microsoft C++ compiler ]
Matrix Size: 8192 x 8192
Processing...
Identity - Pass 01 - Completed: 362.81250 ticks
Identity - Pass 02 - Completed: 363.87500 ticks
Identity - Pass 03 - Completed: 362.81250 ticks
Identity - Pass 04 - Completed: 362.87500 ticks
Identity - Pass 05 - Completed: 362.81250 ticks
Identity - Passed

[ Borland C++ compiler ]
Matrix Size: 8192 x 8192
Processing...
Identity - Pass 01 - Completed: 361.37500 ticks
Identity - Pass 02 - Completed: 361.31250 ticks
Identity - Pass 03 - Completed: 361.31250 ticks
Identity - Pass 04 - Completed: 361.31250 ticks
Identity - Pass 05 - Completed: 360.37500 ticks
Identity - Passed

[ Intel C++ compiler ]
Matrix Size: 8192 x 8192
Processing...
Identity - Pass 01 - Completed: 134.75000 ticks
Identity - Pass 02 - Completed: 133.81250 ticks
Identity - Pass 03 - Completed: 134.75000 ticks
Identity - Pass 04 - Completed: 133.81250 ticks
Identity - Pass 05 - Completed: 133.75000 ticks
Identity - Passed

[ MinGW C++ compiler ]
Matrix Size: 8192 x 8192
Processing...
Identity - Pass 01 - Completed: 356.50000 ticks
Identity - Pass 02 - Completed: 357.37500 ticks
Identity - Pass 03 - Completed: 357.43750 ticks
Identity - Pass 04 - Completed: 356.43750 ticks
Identity - Pass 05 - Completed: 356.43750 ticks
Identity - Passed

[ Watcom C++ compiler ]
Matrix Size: 8192 x 8192
Processing...
Identity - Pass 01 - Completed: 345.68750 ticks
Identity - Pass 02 - Completed: 346.68750 ticks
Identity - Pass 03 - Completed: 346.68750 ticks
Identity - Pass 04 - Completed: 345.68750 ticks
Identity - Pass 05 - Completed: 346.68750 ticks
Identity - Passed

Note: 1 sec = 1000 ticks
SergeyKostrov
Valued Contributor II
38 Views

Matrix Identity Algorithm ( 64-bit ): 16384 x 16384

[ Tests Set 5 ( 64-bit ) - Matrix Size: 16384 x 16384 ]

[ Microsoft C++ compiler ]
Matrix Size: 16384 x 16384
Processing...
Identity - Pass 01 - Completed: 62.00000 ticks
Identity - Pass 02 - Completed: 62.00000 ticks
Identity - Pass 03 - Completed: 47.00000 ticks
Identity - Pass 04 - Completed: 63.00000 ticks
Identity - Pass 05 - Completed: 46.00000 ticks
Identity - Passed

[ Intel C++ compiler ]
Matrix Size: 16384 x 16384
Processing...
Identity - Pass 01 - Completed: 63.00000 ticks
Identity - Pass 02 - Completed: 47.00000 ticks
Identity - Pass 03 - Completed: 62.00000 ticks
Identity - Pass 04 - Completed: 47.00000 ticks
Identity - Pass 05 - Completed: 62.00000 ticks
Identity - Passed

[ MinGW C++ compiler ]
Matrix Size: 16384 x 16384
Processing...
Identity - Pass 01 - Completed: 62.00000 ticks
Identity - Pass 02 - Completed: 47.00000 ticks
Identity - Pass 03 - Completed: 63.00000 ticks
Identity - Pass 04 - Completed: 46.00000 ticks
Identity - Pass 05 - Completed: 63.00000 ticks
Identity - Passed

Note: 1 sec = 1000 ticks
SergeyKostrov
Valued Contributor II
38 Views

Matrix Identity Algorithm ( 64-bit ): 32768 x 32768

[ Tests Set 6 ( 64-bit ) - Matrix Size: 32768 x 32768 ]

[ Microsoft C++ compiler ]
Matrix Size: 32768 x 32768
Processing...
Identity - Pass 01 - Completed: 218.00000 ticks
Identity - Pass 02 - Completed: 218.00000 ticks
Identity - Pass 03 - Completed: 219.00000 ticks
Identity - Pass 04 - Completed: 218.00000 ticks
Identity - Pass 05 - Completed: 219.00000 ticks
Identity - Passed

[ Intel C++ compiler ]
Matrix Size: 32768 x 32768
Processing...
Identity - Pass 01 - Completed: 219.00000 ticks
Identity - Pass 02 - Completed: 218.00000 ticks
Identity - Pass 03 - Completed: 219.00000 ticks
Identity - Pass 04 - Completed: 218.00000 ticks
Identity - Pass 05 - Completed: 218.00000 ticks
Identity - Passed

[ MinGW C++ compiler ]
Matrix Size: 32768 x 32768
Processing...
Identity - Pass 01 - Completed: 218.00000 ticks
Identity - Pass 02 - Completed: 219.00000 ticks
Identity - Pass 03 - Completed: 218.00000 ticks
Identity - Pass 04 - Completed: 219.00000 ticks
Identity - Pass 05 - Completed: 218.00000 ticks
Identity - Passed

Note: 1 sec = 1000 ticks
SergeyKostrov
Valued Contributor II
38 Views

Matrix Identity Algorithm ( 64-bit ): 65536 x 65536

[ Tests Set 7 ( 64-bit ) - Matrix Size: 65536 x 65536 ]

[ Microsoft C++ compiler ]
Matrix Size: 65536 x 65536
Processing...
Identity - Pass 01 - Completed: 873.00000 ticks
Identity - Pass 02 - Completed: 890.00000 ticks
Identity - Pass 03 - Completed: 873.00000 ticks
Identity - Pass 04 - Completed: 874.00000 ticks
Identity - Pass 05 - Completed: 874.00000 ticks
Identity - Passed

[ Intel C++ compiler ]
Matrix Size: 65536 x 65536
Processing...
Identity - Pass 01 - Completed: 874.00000 ticks
Identity - Pass 02 - Completed: 874.00000 ticks
Identity - Pass 03 - Completed: 873.00000 ticks
Identity - Pass 04 - Completed: 874.00000 ticks
Identity - Pass 05 - Completed: 873.00000 ticks
Identity - Passed

[ MinGW C++ compiler ]
Matrix Size: 65536 x 65536
Processing...
Identity - Pass 01 - Completed: 874.00000 ticks
Identity - Pass 02 - Completed: 873.00000 ticks
Identity - Pass 03 - Completed: 874.00000 ticks
Identity - Pass 04 - Completed: 873.00000 ticks
Identity - Pass 05 - Completed: 874.00000 ticks
Identity - Passed

Note: 1 sec = 1000 ticks
SergeyKostrov
Valued Contributor II
38 Views

Matrix Identity Algorithm ( 64-bit ): 81920 x 81920

[ Tests Set 8 ( 64-bit ) - Matrix Size: 81920 x 81920 ]

[ Microsoft C++ compiler ]
Not Tested

[ Intel C++ compiler ]
Not Tested

[ MinGW C++ compiler ]
Not Tested

Note: 1 sec = 1000 ticks
SergeyKostrov
Valued Contributor II
38 Views

Matrix Identity Algorithm ( 64-bit ): 131072 x 131072

[ Tests Set 9 ( 64-bit ) - Matrix Size: 131072 x 131072 ]

[ Microsoft C++ compiler ]
Not Tested

[ Intel C++ compiler ]
Not Tested

[ MinGW C++ compiler ]
Not Tested

Note: 1 sec = 1000 ticks
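
The thread does not say why Tests Sets 8 and 9 were not run. For scale, the memory footprint of each matrix can be estimated with a sketch like the one here. The 4-byte element size is an assumption inferred from the dword stores in the 32-bit disassembly; the element type of the 64-bit tests is not stated, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for an n x n matrix of elemBytes-sized elements.
// elemBytes = 4 is an assumption ( see the dword stores in the 32-bit
// code analysis ); the real element type is not given in the thread.
std::uint64_t matrixBytes( std::uint64_t n, std::uint64_t elemBytes )
{
    assert( elemBytes > 0 );
    return n * n * elemBytes;
}

// Converts bytes to whole GiB ( 1 GiB = 2^30 bytes ), rounding down.
std::uint64_t toGiB( std::uint64_t bytes )
{
    return bytes >> 30;
}
```

Under that assumption, 65536 x 65536 needs 16 GiB, 81920 x 81920 needs 25 GiB and 131072 x 131072 needs 64 GiB, to be compared with the 32GB of RAM on the test system.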
Bernard
Black Belt
38 Views

Hi Sergey,

Nice set of tests.

Why do you test the Watcom compiler if it lacks support for the SIMD SSE architecture extensions?


SergeyKostrov
Valued Contributor II
38 Views

>>...Why do you test Watcom Compiler if it lacks support of SIMD SSE architecture extensions?

There is nothing wrong with it: I always test all major C++ compilers ( there are 6 of them ) supported on the project I've been working on.

Next, take a look at the

Matrix Identity Algorithm ( 32-bit ): 4096 x 4096
[ Tests Set 3 ( 32-bit ) - Matrix Size: 4096 x 4096 ]

test cases and you will see that the Watcom and Borland C++ compilers did a good job compared to the Microsoft and MinGW C++ compilers ( Watcom: 124 to 125 ticks per pass; Microsoft and MinGW: about 127 to 129 ticks ), but the Intel C++ compiler outperformed all of them by more than a factor of two ( about 48 to 49 ticks ).

Another thing is that these tests clearly demonstrated that very good binary code generation alone is Not enough to be competitive in modern times, and this is the case with the Watcom and Borland C++ compilers.

PS: The Turbo C++ compiler ( 16-bit ) is Not used because it plays a different role, as an Overall Source Codes Verifier.
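
As a quick check of the "more than twice" figure, the Tests Set 3 pass timings for Watcom and Intel can be averaged and divided. This is a small sketch with hypothetical names; the numbers are copied from the Tests Set 3 listing earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>

// Arithmetic mean of `count` pass timings ( in ticks ).
double meanTicks( const double *ticks, std::size_t count )
{
    assert( ticks != nullptr && count > 0 );
    double sum = 0.0;
    for( std::size_t i = 0; i < count; ++i )
        sum += ticks[i];
    return sum / static_cast< double >( count );
}

// Tests Set 3 ( 32-bit, 4096 x 4096 ) timings copied from the thread.
const double kWatcomSet3[5] = { 124.00000, 124.00000, 124.06250, 124.00000, 125.00000 };
const double kIntelSet3[5]  = {  48.81250,  47.87500,  48.81250,  48.81250,  48.87500 };

// How many times faster the Intel build is than the Watcom build.
double intelOverWatcomSet3()
{
    return meanTicks( kWatcomSet3, 5 ) / meanTicks( kIntelSet3, 5 );
}
```

The means are 124.2125 and 48.6375 ticks, so the ratio is roughly 2.55.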
SergeyKostrov
Valued Contributor II
38 Views

>>Another thing is that these tests clearly demonstrated that a very good quality of binary codes generation is Not
>>enough to be competitive in modern times and this is the case with Watcom and Borland C++ compilers...

Here are a couple of notes:

- The Borland C++ compiler is No longer supported and will never support the latest Intel ISAs ( Instruction Set Architectures );
- The Watcom C++ compiler could support them, but unfortunately I don't see any progress in that direction, and it is Not clear to me what the Open Watcom C++ compiler team is currently doing. I recently integrated version 2.0; it is Not much different from version 1.9, and SSE is still Not supported.

Take a look at the 3rd post of the thread, Matrix Identity Algorithm - Code Analysis, and you will see why the Intel C++ compiler outperformed all the other C++ compilers on the 32-bit tests. It uses Non-Temporal moves ( movntps ) for the 1st stage of the algorithm and unrolls the processing with a 1-to-32 Loop Unrolling Schema ( LPS ): eight 16-byte stores, that is, 32 dword elements, per loop iteration.

You could also see that on the 64-bit tests all C++ compilers, that is, Intel, Microsoft and MinGW, showed identical performance.
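
For readers who want to try the same idea by hand, here is a minimal sketch of a zero fill with non-temporal stores, unrolled to eight 16-byte stores per iteration as in the Intel listing. It is not the project's code: the function name zeroFillStream, the float element type and the alignment and size preconditions are assumptions. _mm_stream_ps, _mm_setzero_ps and _mm_sfence are the standard SSE intrinsics from xmmintrin.h, and a plain loop is used where SSE is not available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined( __SSE__ ) || defined( _M_X64 ) || ( defined( _M_IX86_FP ) && _M_IX86_FP >= 1 )
#include <xmmintrin.h>
#define HAVE_SSE_STREAM 1
#endif

// Zero-fills `count` floats. Assumptions: `p` is 16-byte aligned and
// `count` is a multiple of 32 ( 8 stores x 4 floats per iteration ).
void zeroFillStream( float *p, std::size_t count )
{
    assert( reinterpret_cast< std::uintptr_t >( p ) % 16 == 0 );
    assert( count % 32 == 0 );
#ifdef HAVE_SSE_STREAM
    const __m128 zero = _mm_setzero_ps();
    for( std::size_t i = 0; i < count; i += 32 )
    {
        // Non-temporal stores bypass the cache hierarchy ( movntps ).
        _mm_stream_ps( p + i,      zero );
        _mm_stream_ps( p + i +  4, zero );
        _mm_stream_ps( p + i +  8, zero );
        _mm_stream_ps( p + i + 12, zero );
        _mm_stream_ps( p + i + 16, zero );
        _mm_stream_ps( p + i + 20, zero );
        _mm_stream_ps( p + i + 24, zero );
        _mm_stream_ps( p + i + 28, zero );
    }
    _mm_sfence();   // order the streamed stores before later accesses
#else
    // Fallback without SSE: ordinary cached stores.
    for( std::size_t i = 0; i < count; ++i )
        p[i] = 0.0f;
#endif
}
```

Whether this beats rep stos or memset depends on the buffer size relative to the caches and on the compiler and runtime library, so it should be measured rather than assumed.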