Valued Contributor II
63 Views

Performance Evaluation of Matrix Transpose algorithms

*** Performance Evaluation of Matrix Transpose algorithms ***

[ Computer System used for performance evaluations ]

** Dell Precision Mobile M4700 **

Intel Core i7-3840QM ( 2.80 GHz )
Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/products/70846
32GB RAM
320GB HDD
NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory )
Windows 7 Professional 64-bit SP1
Size of L3 Cache = 8MB ( shared between all cores for data & instructions )
Size of L2 Cache = 1MB ( 256KB per core / shared for data & instructions )
Size of L1 Cache = 256KB ( 32KB per core for data & 32KB per core for instructions )
Display resolution: 1366 x 768
0 Kudos
40 Replies
Valued Contributor II
19 Views

[ Intel C++ compiler ]

Matrix Size: 8192 x 8192
Processing...
Transpose - Classic - Pass 01 - Completed: 11505.20020 ticks
Transpose - Classic - Pass 02 - Completed: 11517.73340 ticks
Transpose - Classic - Pass 03 - Completed: 11532.26670 ticks
Transpose - Classic - Pass 04 - Completed: 11523.93359 ticks
Transpose - Classic - Pass 05 - Completed: 11524.00000 ticks
Transpose - Classic - Passed

Matrix Size: 8192 x 8192
Processing...
Transpose - Diagonal - Pass 01 - Completed: 8284.33301 ticks
Transpose - Diagonal - Pass 02 - Completed: 8286.46680 ticks
Transpose - Diagonal - Pass 03 - Completed: 8283.33301 ticks
Transpose - Diagonal - Pass 04 - Completed: 8285.40039 ticks
Transpose - Diagonal - Pass 05 - Completed: 8283.33301 ticks
Transpose - Diagonal - Passed

Matrix Size: 8192 x 8192
Processing...
Transpose - Eklundh - Pass 01 - Completed: 6097.93311 ticks
Transpose - Eklundh - Pass 02 - Completed: 6098.93311 ticks
Transpose - Eklundh - Pass 03 - Completed: 6096.86670 ticks
Transpose - Eklundh - Pass 04 - Completed: 6097.93311 ticks
Transpose - Eklundh - Pass 05 - Completed: 6097.93311 ticks
Transpose - Eklundh - Passed
0 Kudos
Valued Contributor II
19 Views

[ MinGW C++ compiler ]

Matrix Size: 8192 x 8192
Processing...
Transpose - Classic - Pass 01 - Completed: 11447.86621 ticks
Transpose - Classic - Pass 02 - Completed: 11494.79980 ticks
Transpose - Classic - Pass 03 - Completed: 11488.53320 ticks
Transpose - Classic - Pass 04 - Completed: 11495.86621 ticks
Transpose - Classic - Pass 05 - Completed: 11483.33301 ticks
Transpose - Classic - Passed

Matrix Size: 8192 x 8192
Processing...
Transpose - Diagonal - Pass 01 - Completed: 8606.26680 ticks
Transpose - Diagonal - Pass 02 - Completed: 8603.13379 ticks
Transpose - Diagonal - Pass 03 - Completed: 8604.13379 ticks
Transpose - Diagonal - Pass 04 - Completed: 8603.13379 ticks
Transpose - Diagonal - Pass 05 - Completed: 8604.20020 ticks
Transpose - Diagonal - Passed

Matrix Size: 8192 x 8192
Processing...
Transpose - Eklundh - Pass 01 - Completed: 6617.66550 ticks
Transpose - Eklundh - Pass 02 - Completed: 6619.79980 ticks
Transpose - Eklundh - Pass 03 - Completed: 6618.73340 ticks
Transpose - Eklundh - Pass 04 - Completed: 6616.66450 ticks
Transpose - Eklundh - Pass 05 - Completed: 6617.73340 ticks
Transpose - Eklundh - Passed
0 Kudos
Valued Contributor II
19 Views

[ Watcom C++ compiler ]

Matrix Size: 8192 x 8192
Processing...
Transpose - Classic - Pass 01 - Completed: 10617.73333 ticks
Transpose - Classic - Pass 02 - Completed: 10604.13333 ticks
Transpose - Classic - Pass 03 - Completed: 10603.13333 ticks
Transpose - Classic - Pass 04 - Completed: 10615.60000 ticks
Transpose - Classic - Pass 05 - Completed: 10602.13333 ticks
Transpose - Classic - Passed

Matrix Size: 8192 x 8192
Processing...
Transpose - Diagonal - Pass 01 - Completed: 7301.06687 ticks
Transpose - Diagonal - Pass 02 - Completed: 7298.93333 ticks
Transpose - Diagonal - Pass 03 - Completed: 7298.93333 ticks
Transpose - Diagonal - Pass 04 - Completed: 7300.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 7297.93333 ticks
Transpose - Diagonal - Passed

Matrix Size: 8192 x 8192
Processing...
Transpose - Eklundh - Pass 01 - Completed: 8496.86657 ticks
Transpose - Eklundh - Pass 02 - Completed: 8500.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 8494.80000 ticks
Transpose - Eklundh - Pass 04 - Completed: 8502.06647 ticks
Transpose - Eklundh - Pass 05 - Completed: 8495.80000 ticks
Transpose - Eklundh - Passed

Note: 1 sec = 1000 ticks
0 Kudos
Valued Contributor II
19 Views

Matrix Transpose Algorithms ( 64-bit ): 16384 x 16384

[ Tests Set 5 ( 64-bit ) - Matrix Size: 16384 x 16384 ]

[ Microsoft C++ compiler ]

Matrix Size: 16384 x 16384
Processing...
Transpose - Classic - Pass 01 - Completed: 3214.00000 ticks
Transpose - Classic - Pass 02 - Completed: 3198.00000 ticks
Transpose - Classic - Pass 03 - Completed: 3244.00000 ticks
Transpose - Classic - Pass 04 - Completed: 3230.00000 ticks
Transpose - Classic - Pass 05 - Completed: 3291.00000 ticks
Transpose - Classic - Passed

Matrix Size: 16384 x 16384
Processing...
Transpose - Diagonal - Pass 01 - Completed: 1622.00000 ticks
Transpose - Diagonal - Pass 02 - Completed: 1607.00000 ticks
Transpose - Diagonal - Pass 03 - Completed: 1622.00000 ticks
Transpose - Diagonal - Pass 04 - Completed: 1607.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 1622.00000 ticks
Transpose - Diagonal - Passed

Matrix Size: 16384 x 16384
Processing...
Transpose - Eklundh - Pass 01 - Completed: 2246.00000 ticks
Transpose - Eklundh - Pass 02 - Completed: 2231.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 2247.00000 ticks
Transpose - Eklundh - Pass 04 - Completed: 2230.00000 ticks
Transpose - Eklundh - Pass 05 - Completed: 2247.00000 ticks
Transpose - Eklundh - Passed
0 Kudos
Valued Contributor II
19 Views

[ Intel C++ compiler ]

Matrix Size: 16384 x 16384
Processing...
Transpose - Classic - Pass 01 - Completed: 3260.00000 ticks
Transpose - Classic - Pass 02 - Completed: 3167.00000 ticks
Transpose - Classic - Pass 03 - Completed: 3292.00000 ticks
Transpose - Classic - Pass 04 - Completed: 3198.00000 ticks
Transpose - Classic - Pass 05 - Completed: 3213.00000 ticks
Transpose - Classic - Passed

Matrix Size: 16384 x 16384
Processing...
Transpose - Diagonal - Pass 01 - Completed: 1607.00000 ticks
Transpose - Diagonal - Pass 02 - Completed: 1607.00000 ticks
Transpose - Diagonal - Pass 03 - Completed: 1607.00000 ticks
Transpose - Diagonal - Pass 04 - Completed: 1622.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 1607.00000 ticks
Transpose - Diagonal - Passed

Matrix Size: 16384 x 16384
Processing...
Transpose - Eklundh - Pass 01 - Completed: 1794.00000 ticks
Transpose - Eklundh - Pass 02 - Completed: 1810.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 1794.00000 ticks
Transpose - Eklundh - Pass 04 - Completed: 1809.00000 ticks
Transpose - Eklundh - Pass 05 - Completed: 1794.00000 ticks
Transpose - Eklundh - Passed
0 Kudos
Valued Contributor II
19 Views

[ MinGW C++ compiler ]

Matrix Size: 16384 x 16384
Processing...
Transpose - Classic - Pass 01 - Completed: 3198.00000 ticks
Transpose - Classic - Pass 02 - Completed: 3166.00000 ticks
Transpose - Classic - Pass 03 - Completed: 3167.00000 ticks
Transpose - Classic - Pass 04 - Completed: 3198.00000 ticks
Transpose - Classic - Pass 05 - Completed: 3198.00000 ticks
Transpose - Classic - Passed

Matrix Size: 16384 x 16384
Processing...
Transpose - Diagonal - Pass 01 - Completed: 1607.00000 ticks
Transpose - Diagonal - Pass 02 - Completed: 1623.00000 ticks
Transpose - Diagonal - Pass 03 - Completed: 1622.00000 ticks
Transpose - Diagonal - Pass 04 - Completed: 1622.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 1607.00000 ticks
Transpose - Diagonal - Passed

Matrix Size: 16384 x 16384
Processing...
Transpose - Eklundh - Pass 01 - Completed: 2262.00000 ticks
Transpose - Eklundh - Pass 02 - Completed: 2246.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 2247.00000 ticks
Transpose - Eklundh - Pass 04 - Completed: 2246.00000 ticks
Transpose - Eklundh - Pass 05 - Completed: 2247.00000 ticks
Transpose - Eklundh - Passed

Note: 1 sec = 1000 ticks
0 Kudos
Valued Contributor II
19 Views

Matrix Transpose Algorithms ( 64-bit ): 32768 x 32768

[ Tests Set 6 ( 64-bit ) - Matrix Size: 32768 x 32768 ]

[ Microsoft C++ compiler ]

Matrix Size: 32768 x 32768
Processing...
Transpose - Classic - Pass 01 - Completed: 14696.00000 ticks
Transpose - Classic - Pass 02 - Completed: 14820.00000 ticks
Transpose - Classic - Pass 03 - Completed: 14804.00000 ticks
Transpose - Classic - Pass 04 - Completed: 16162.00000 ticks
Transpose - Classic - Pass 05 - Completed: 15085.00000 ticks
Transpose - Classic - Passed

Matrix Size: 32768 x 32768
Processing...
Transpose - Diagonal - Pass 01 - Completed: 8846.00000 ticks
Transpose - Diagonal - Pass 02 - Completed: 8829.00000 ticks
Transpose - Diagonal - Pass 03 - Completed: 8845.00000 ticks
Transpose - Diagonal - Pass 04 - Completed: 8830.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 8830.00000 ticks
Transpose - Diagonal - Passed

Matrix Size: 32768 x 32768
Processing...
Transpose - Eklundh - Pass 01 - Completed: 9423.00000 ticks
Transpose - Eklundh - Pass 02 - Completed: 9407.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 9422.00000 ticks
Transpose - Eklundh - Pass 04 - Completed: 9423.00000 ticks
Transpose - Eklundh - Pass 05 - Completed: 9407.00000 ticks
Transpose - Eklundh - Passed
0 Kudos
Valued Contributor II
19 Views

[ Intel C++ compiler ]

Matrix Size: 32768 x 32768
Processing...
Transpose - Classic - Pass 01 - Completed: 14492.00000 ticks
Transpose - Classic - Pass 02 - Completed: 14649.00000 ticks
Transpose - Classic - Pass 03 - Completed: 14695.00000 ticks
Transpose - Classic - Pass 04 - Completed: 14680.00000 ticks
Transpose - Classic - Pass 05 - Completed: 14554.00000 ticks
Transpose - Classic - Passed

Matrix Size: 32768 x 32768
Processing...
Transpose - Diagonal - Pass 01 - Completed: 9095.00000 ticks
Transpose - Diagonal - Pass 02 - Completed: 9095.00000 ticks
Transpose - Diagonal - Pass 03 - Completed: 9094.00000 ticks
Transpose - Diagonal - Pass 04 - Completed: 9080.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 9095.00000 ticks
Transpose - Diagonal - Passed

Matrix Size: 32768 x 32768
Processing...
Transpose - Eklundh - Pass 01 - Completed: 7660.00000 ticks
Transpose - Eklundh - Pass 02 - Completed: 7659.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 7644.00000 ticks
Transpose - Eklundh - Pass 04 - Completed: 7660.00000 ticks
Transpose - Eklundh - Pass 05 - Completed: 7644.00000 ticks
Transpose - Eklundh - Passed
0 Kudos
Valued Contributor II
19 Views

[ MinGW C++ compiler ]

Matrix Size: 32768 x 32768
Processing...
Transpose - Classic - Pass 01 - Completed: 14617.00000 ticks
Transpose - Classic - Pass 02 - Completed: 14633.00000 ticks
Transpose - Classic - Pass 03 - Completed: 14524.00000 ticks
Transpose - Classic - Pass 04 - Completed: 14586.00000 ticks
Transpose - Classic - Pass 05 - Completed: 16131.00000 ticks
Transpose - Classic - Passed

Matrix Size: 32768 x 32768
Processing...
Transpose - Diagonal - Pass 01 - Completed: 9064.00000 ticks
Transpose - Diagonal - Pass 02 - Completed: 9063.00000 ticks
Transpose - Diagonal - Pass 03 - Completed: 9048.00000 ticks
Transpose - Diagonal - Pass 04 - Completed: 9080.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 9048.00000 ticks
Transpose - Diagonal - Passed

Matrix Size: 32768 x 32768
Processing...
Transpose - Eklundh - Pass 01 - Completed: 9656.00000 ticks
Transpose - Eklundh - Pass 02 - Completed: 9657.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 9656.00000 ticks
Transpose - Eklundh - Pass 04 - Completed: 9657.00000 ticks
Transpose - Eklundh - Pass 05 - Completed: 9656.00000 ticks
Transpose - Eklundh - Passed

Note: 1 sec = 1000 ticks
0 Kudos
Valued Contributor II
19 Views

Matrix Transpose Algorithms ( 64-bit ): 65536 x 65536

[ Tests Set 7 ( 64-bit ) - Matrix Size: 65536 x 65536 ]

[ Microsoft C++ compiler ]

Matrix Size: 65536 x 65536
Processing...
Transpose - Classic - Pass 01 - Completed: 510.575 secs
Transpose - Classic - Pass 02 - Completed: 715.704 secs
Transpose - Classic - Pass 03 - Completed: 691.896 secs
Transpose - Classic - Pass 04 - Completed: 736.356 secs
Transpose - Classic - Pass 05 - Completed: 520.152 secs
Transpose - Classic - Passed

Matrix Size: 65536 x 65536
Processing...
Transpose - Diagonal - Pass 01 - Completed: 51.215 secs
Transpose - Diagonal - Pass 02 - Completed: 51.214 secs
Transpose - Diagonal - Pass 03 - Completed: 51.216 secs
Transpose - Diagonal - Pass 04 - Completed: 51.215 secs
Transpose - Diagonal - Pass 05 - Completed: 51.218 secs
Transpose - Diagonal - Passed

Matrix Size: 65536 x 65536
Processing...
Transpose - Eklundh - Pass 01 - Completed: 37.971 secs
Transpose - Eklundh - Pass 02 - Completed: 37.970 secs
Transpose - Eklundh - Pass 03 - Completed: 37.971 secs
Transpose - Eklundh - Pass 04 - Completed: 37.971 secs
Transpose - Eklundh - Pass 05 - Completed: 37.970 secs
Transpose - Eklundh - Passed
0 Kudos
Valued Contributor II
19 Views

[ Intel C++ compiler ]

Matrix Size: 65536 x 65536
Processing...
Transpose - Classic - Pass 01 - Completed: 471.810 secs
Transpose - Classic - Pass 02 - Completed: 667.638 secs
Transpose - Classic - Pass 03 - Completed: 945.147 secs
Transpose - Classic - Pass 04 - Completed: 675.141 secs
Transpose - Classic - Pass 05 - Completed: 974.523 secs
Transpose - Classic - Passed

Matrix Size: 65536 x 65536
Processing...
Transpose - Diagonal - Pass 01 - Completed: 51.964 secs
Transpose - Diagonal - Pass 02 - Completed: 51.980 secs
Transpose - Diagonal - Pass 03 - Completed: 51.980 secs
Transpose - Diagonal - Pass 04 - Completed: 51.979 secs
Transpose - Diagonal - Pass 05 - Completed: 51.980 secs
Transpose - Diagonal - Passed

Matrix Size: 65536 x 65536
Processing...
Transpose - Eklundh - Pass 01 - Completed: 31.419 secs
Transpose - Eklundh - Pass 02 - Completed: 31.418 secs
Transpose - Eklundh - Pass 03 - Completed: 31.403 secs
Transpose - Eklundh - Pass 04 - Completed: 31.419 secs
Transpose - Eklundh - Pass 05 - Completed: 31.403 secs
Transpose - Eklundh - Passed
0 Kudos
Valued Contributor II
19 Views

[ MinGW C++ compiler ]

Matrix Size: 65536 x 65536
Processing...
Transpose - Classic - Pass 01 - Completed: 478.147 secs
Transpose - Classic - Pass 02 - Completed: 526.629 secs
Transpose - Classic - Pass 03 - Completed: 698.661 secs
Transpose - Classic - Pass 04 - Completed: 674.391 secs
Transpose - Classic - Pass 05 - Completed: 598.162 secs
Transpose - Classic - Passed

Matrix Size: 65536 x 65536
Processing...
Transpose - Diagonal - Pass 01 - Completed: 51.621 secs
Transpose - Diagonal - Pass 02 - Completed: 51.636 secs
Transpose - Diagonal - Pass 03 - Completed: 51.620 secs
Transpose - Diagonal - Pass 04 - Completed: 51.637 secs
Transpose - Diagonal - Pass 05 - Completed: 51.620 secs
Transpose - Diagonal - Passed

Matrix Size: 65536 x 65536
Processing...
Transpose - Eklundh - Pass 01 - Completed: 38.610 secs
Transpose - Eklundh - Pass 02 - Completed: 38.595 secs
Transpose - Eklundh - Pass 03 - Completed: 38.626 secs
Transpose - Eklundh - Pass 04 - Completed: 38.641 secs
Transpose - Eklundh - Pass 05 - Completed: 38.595 secs
Transpose - Eklundh - Passed

Note: 1 min = 60 secs
0 Kudos
Valued Contributor II
19 Views

Matrix Transpose Algorithms ( 64-bit ): 81920 x 81920

[ Tests Set 8 ( 64-bit ) - Matrix Size: 81920 x 81920 ]

[ Microsoft C++ compiler ]

Matrix Size: 81920 x 81920
Processing...
Transpose - Diagonal - Pass 01 - Completed: 66.590 secs
Transpose - Diagonal - Pass 02 - Completed: 66.591 secs
Transpose - Diagonal - Pass 03 - Completed: 66.590 secs
Transpose - Diagonal - Pass 04 - Completed: 66.559 secs
Transpose - Diagonal - Pass 05 - Completed: 66.591 secs
Transpose - Diagonal - Passed

[ Intel C++ compiler ]

Matrix Size: 81920 x 81920
Processing...
Transpose - Diagonal - Pass 01 - Completed: 67.346 secs
Transpose - Diagonal - Pass 02 - Completed: 67.345 secs
Transpose - Diagonal - Pass 03 - Completed: 67.346 secs
Transpose - Diagonal - Pass 04 - Completed: 67.346 secs
Transpose - Diagonal - Pass 05 - Completed: 67.345 secs
Transpose - Diagonal - Passed

[ MinGW C++ compiler ]

Matrix Size: 81920 x 81920
Processing...
Transpose - Diagonal - Pass 01 - Completed: 66.790 secs
Transpose - Diagonal - Pass 02 - Completed: 66.706 secs
Transpose - Diagonal - Pass 03 - Completed: 66.890 secs
Transpose - Diagonal - Pass 04 - Completed: 66.707 secs
Transpose - Diagonal - Pass 05 - Completed: 66.591 secs
Transpose - Diagonal - Passed

Note: 1 min = 60 secs
0 Kudos
Valued Contributor II
19 Views

Matrix Transpose Algorithms ( 64-bit ): 131072 x 131072

[ Tests Set 9 ( 64-bit ) - Matrix Size: 131072 x 131072 ]

[ Microsoft C++ compiler ]
Not Tested

[ Intel C++ compiler ]
Not Tested

[ MinGW C++ compiler ]

Matrix Size: 131072 x 131072
Processing...
Transpose - Diagonal - Pass 01 - Completed: 13504.476 secs
Transpose - Diagonal - Pass 02 - Completed: 7946.800 secs
Transpose - Diagonal - Pass 03 - Completed: 9254.744 secs
Transpose - Diagonal - Pass 04 - Completed: 9980.881 secs
Transpose - Diagonal - Pass 05 - Completed: 10140.392 secs
Transpose - Diagonal - Passed

Note: 1 min = 60 secs
0 Kudos
Valued Contributor II
19 Views

If anybody is interested in a performance analysis, then the data mining of the results above needs to be done manually.
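As a starting point for that kind of data mining, here is a minimal sketch that averages the per-pass timings for each algorithm. It assumes the log has been saved with one pass per line in the `Transpose - <Algorithm> - Pass NN - Completed: <value> <unit>` shape shown in the posts above; `PassStats` and `summarize_log` are hypothetical names, not part of the test harness used in this thread.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <sstream>
#include <string>

// Running totals for one algorithm (hypothetical helper type).
struct PassStats {
    int    passes = 0;
    double total  = 0.0;
    double mean() const { assert(passes > 0); return total / passes; }
};

// Scans the log line by line and accumulates the reported value of every
// "Transpose - <Algorithm> - Pass NN - Completed: <value> ..." line,
// keyed by algorithm name. Headers, "Passed" markers and notes do not
// match the pattern and are skipped. Units are not converted, so feed it
// results that share one unit (all ticks or all secs).
std::map<std::string, PassStats> summarize_log(const std::string& log) {
    std::map<std::string, PassStats> stats;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        char   name[64] = {0};
        int    pass     = 0;
        double value    = 0.0;
        if (std::sscanf(line.c_str(),
                        " Transpose - %63s - Pass %d - Completed: %lf",
                        name, &pass, &value) == 3) {
            stats[name].passes += 1;
            stats[name].total  += value;
        }
    }
    return stats;
}
```

Fed the five Microsoft C++ Diagonal passes for 16384 x 16384 (1622, 1607, 1622, 1607, 1622 ticks), `summarize_log(log).at("Diagonal").mean()` should come out as 1616 ticks. Keeping results from different compilers or matrix sizes in separate inputs avoids mixing them under one algorithm name.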
0 Kudos
Employee
19 Views

Hello Sergey,

I looked a bit into your postings (since you nicely shared a report of your IDF'16 impressions). I found various performance evaluations and this one in particular. I have some questions:

  • Where to find the actual source code of the transpose algorithm(s), which you've evaluated?
  • I guess you've been running your mobile workstations on plug rather than battery (fixed freq.)?
  • I guess "ticks" are Nanoseconds?

Hans

0 Kudos
Valued Contributor II
19 Views

Hi Hans,

Here are answers to the 2nd and 3rd questions.

>> I guess you've been running your mobile workstations on plug rather than battery ( fixed freq. )?

Yes, 99% of the time the Dell Precision Mobile workstation M4700 is plugged in because some processing could take hours to complete.

>> I guess "ticks" are Nanoseconds?

The Win32 API function GetTickCount was used in these tests and 1 sec = 1000 ticks ( or milliseconds ). An example of a test-case looks like:

    ...
    CrtPrintf( RTU("\tALGORITHM_TRANSPOSE\n") );

    // _MatrixTransposeProcessingCRv1A
    // ALGORITHM_TRANSPOSE._MatrixTransposeProcessingCRv1A
    for( uiNT = 0; uiNT < uiNumberOfTests; uiNT += 1 )
    {
        uiTicksStart = SysGetTickCount();
        _MatrixTransposeProcessingCRv1A( m_tdsFa->m_ptData2D, m_tdsFb->m_ptData2D, m_tdsFa->m_iRows, m_tdsFa->m_iCols, iNumOfThreads );
        uiTicksEnd = SysGetTickCount();
        CrtPrintf( RTU("\t\t_MatrixTransposeProcessingCRv1A - Pass %02ld - Completed: %11.5f secs\n"), ( RTint )( uiNT + 1 ), ( RTfloat )( uiTicksEnd - uiTicksStart ) / ( RTfloat )1000.0f );
    }
    CrtPrintf( RTU("\n") );
    ...

For a test case with a matrix smaller than 32Kx32K measurements are in milliseconds, or in ticks. For a test case with a matrix greater than 64Kx64K measurements are in seconds.

In nanoseconds I do measurements for very small and critical sections of codes using rdtsc instruction.
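For readers who want to reproduce this kind of per-pass timing without the framework wrappers, here is a minimal portable sketch. It uses `std::chrono::steady_clock` instead of GetTickCount, and `transpose_naive` and `time_passes` are hypothetical names: the naive out-of-place transpose is only a stand-in for the routine under test, not the Classic, Diagonal or Eklundh code measured in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Naive out-of-place transpose of a rows x cols row-major matrix.
// Stand-in workload only (assumption: the real routines are not shown here).
void transpose_naive(const std::vector<float>& src, std::vector<float>& dst,
                     std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols && dst.size() == rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            dst[c * rows + r] = src[r * cols + c];
}

// Calls `work` `passes` times and returns the wall-clock time of each
// pass in milliseconds, i.e. in the same unit as the "ticks" above.
template <typename Work>
std::vector<double> time_passes(Work work, int passes) {
    assert(passes > 0);
    std::vector<double> ms;
    ms.reserve(static_cast<std::size_t>(passes));
    for (int i = 0; i < passes; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto end = std::chrono::steady_clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }
    return ms;
}
```

A call such as `time_passes([&]{ transpose_naive(a, b, n, n); }, 5)` gives five per-pass times to print in the log format used above. One reason to prefer a monotonic high-resolution clock for shorter runs: GetTickCount is limited to the resolution of the system timer, typically 10 to 16 ms, which may be why many of the 64-bit results differ by steps of about 15 ticks.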
0 Kudos
Valued Contributor II
19 Views

>>In nanoseconds I do measurements for very small and critical sections of codes using rdtsc instruction. Actually in clock cycles and it is very easy to convert a value to nanoseconds.
0 Kudos
Valued Contributor II
19 Views

[ Extended Tracing and Timing Functionality - 1 ]

For example, this is an output from another test:

...
...Completed in 150205956096.000 cc...
...

The processing completed in 150205956096.000 clock cycles on a Processing Unit with a 2829200000 Hz frequency ( ~2.83 GHz ). After conversion, the times in seconds, milliseconds, microseconds and nanoseconds are as follows:

...Completed in 53.091 secs
or
...Completed in 53091.318 ms
or
...Completed in 53091317.721 mu
or
...Completed in 53091317720.911 ns

I prefer measurements in seconds, milliseconds ( or ticks ) and clock cycles, and don't use measurements in microseconds and nanoseconds.

Note:
cc - clock cycles
ms - milliseconds
mu - microseconds
ns - nanoseconds
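The conversion is a single division by the frequency followed by scaling. A minimal sketch, assuming the cycle counter advances at the constant frequency quoted above; `Elapsed` and `cycles_to_time` are hypothetical names:

```cpp
#include <cassert>

// Elapsed time expressed in the four units printed by the trace output.
struct Elapsed {
    double secs;
    double ms;
    double us;  // microseconds ( "mu" in the trace output )
    double ns;
};

// Converts a clock-cycle count to wall-clock time.
// Assumption: the counter ticks at a constant `hz` for the whole measurement.
Elapsed cycles_to_time(double cycles, double hz) {
    assert(hz > 0.0);
    const double secs = cycles / hz;
    return { secs, secs * 1e3, secs * 1e6, secs * 1e9 };
}
```

With the numbers from this post, `cycles_to_time(150205956096.0, 2829200000.0)` gives roughly 53.091 secs and 53091.318 ms, matching the trace output.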
0 Kudos
Valued Contributor II
19 Views

[ Extended Tracing and Timing Functionality - 2 ( Real Processing example ) ]

...
> CStrassenSet Algorithms <
Strassen HBC
Matrix Size : 32768 x 32768
Matrix Size Threshold : 16384 x 16384
Matrix Partitions : 8
Degree of Recursion : 1
Result Sets Reflection: Disabled
Calculating...
TStrassenHBCSet::TStrassenHBCSet( T ** )
sizeof( TStrassenHBCSet ) = 192
sizeof( TStrassenHBCResultSet ) = 2176
TStrassenHBCResultSet->m_Index= 0
ResultSet Index: 0
Base Matrix Size: 32768 x 32768
Initialized: 0
A[0] Matrix Size: 16384 x 16384
B[0] Matrix Size: 16384 x 16384
A[1] Matrix Size: 16384 x 16384
B[1] Matrix Size: 16384 x 16384
M[0] Matrix Size: 16384 x 16384
M[1] Matrix Size: 16384 x 16384
M[2] Matrix Size: 16384 x 16384
M[3] Matrix Size: 16384 x 16384
M[4] Matrix Size: 16384 x 16384
M[5] Matrix Size: 16384 x 16384
M[6] Matrix Size: 16384 x 16384
ResultSet Index: 1
Base Matrix Size: 16384 x 16384
Initialized: 1
DEBUG: m_bRsInitialized = 1
Matrix Size : 32768 x 32768
Partitioned Matrix Size : 16384 x 16384
Size of Row / Column : 65536 bytes
LBOT Block Size : 16 x 16
LBOT Block Size Divider : 1
Processor Unit Frequency: 2829200000 Hz
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {6}
Mul - Completed in 150205956096.000 cc
Mul - Completed in 53.091 secs
Mul - Completed in 53091.318 ms
Mul - Completed in 53091317.721 mu
Mul - Completed in 53091317720.911 ns
Processor Frequency 2829200000 Hz
Mul - Completed in 149724971008.000 cc
Mul - Completed in 52.921 secs
Mul - Completed in 52921.310 ms
Mul - Completed in 52921310.267 mu
Mul - Completed in 52921310267.213 ns
Processor Frequency 2829200000 Hz
Mul - Completed in 149928263680.000 cc
Mul - Completed in 52.993 secs
Mul - Completed in 52993.165 ms
Mul - Completed in 52993165.446 mu
Mul - Completed in 52993165446.062 ns
Processor Frequency 2829200000 Hz
Mul - Completed in 149985640448.000 cc
Mul - Completed in 53.013 secs
Mul - Completed in 53013.446 ms
Mul - Completed in 53013445.655 mu
Mul - Completed in 53013445655.309 ns
Processor Frequency 2829200000 Hz
Mul - Completed in 149933621248.000 cc
Mul - Completed in 52.995 secs
Mul - Completed in 52995.059 ms
Mul - Completed in 52995059.115 mu
Mul - Completed in 52995059114.944 ns
Processor Frequency 2829200000 Hz
Mul - Completed in 150398959616.000 cc
Mul - Completed in 53.160 secs
Mul - Completed in 53159.536 ms
Mul - Completed in 53159536.129 mu
Mul - Completed in 53159536128.941 ns
Processor Frequency 2829200000 Hz
Mul - Completed in 150136520704.000 cc
Mul - Completed in 53.067 secs
Mul - Completed in 53066.775 ms
Mul - Completed in 53066775.309 mu
Mul - Completed in 53066775308.921 ns
TStrassenHBCSet::~TStrassenHBCSet
Strassen HBC - Pass 01 - Completed: 381.45300 secs
...
0 Kudos