Performance Evaluation of Matrix Transpose algorithms

SergeyKostrov
Valued Contributor II
*** Performance Evaluation of Matrix Transpose algorithms ***

[ Computer System used for performance evaluations ]

** Dell Precision Mobile M4700 **
Intel Core i7-3840QM ( 2.80 GHz )
Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/products/70846
32GB RAM
320GB HDD
NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory )
Windows 7 Professional 64-bit SP1
Size of L3 Cache = 8MB ( shared between all cores for data & instructions )
Size of L2 Cache = 1MB ( 256KB per core / shared for data & instructions )
Size of L1 Cache = 256KB ( 32KB per core for data & 32KB per core for instructions )
Display resolution: 1366 x 768

SergeyKostrov
[ MinGW C++ compiler ]

Matrix Size: 8192 x 8192
Processing...
Transpose - Classic - Pass 01 - Completed: 11447.86621 ticks
Transpose - Classic - Pass 02 - Completed: 11494.79980 ticks
Transpose - Classic - Pass 03 - Completed: 11488.53320 ticks
Transpose - Classic - Pass 04 - Completed: 11495.86621 ticks
Transpose - Classic - Pass 05 - Completed: 11483.33301 ticks
Transpose - Classic - Passed

Matrix Size: 8192 x 8192
Processing...
Transpose - Diagonal - Pass 01 - Completed: 8606.26680 ticks
Transpose - Diagonal - Pass 02 - Completed: 8603.13379 ticks
Transpose - Diagonal - Pass 03 - Completed: 8604.13379 ticks
Transpose - Diagonal - Pass 04 - Completed: 8603.13379 ticks
Transpose - Diagonal - Pass 05 - Completed: 8604.20020 ticks
Transpose - Diagonal - Passed

Matrix Size: 8192 x 8192
Processing...
Transpose - Eklundh - Pass 01 - Completed: 6617.66550 ticks
Transpose - Eklundh - Pass 02 - Completed: 6619.79980 ticks
Transpose - Eklundh - Pass 03 - Completed: 6618.73340 ticks
Transpose - Eklundh - Pass 04 - Completed: 6616.66450 ticks
Transpose - Eklundh - Pass 05 - Completed: 6617.73340 ticks
Transpose - Eklundh - Passed
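
Editorial note: the thread does not include the source of the three algorithms named in these results, so the following is only a minimal single-threaded C++ sketch of what such routines commonly look like. It assumes a square, row-major matrix held in a std::vector<float>. The function names, the element type, and the reading of "Classic" as an out-of-place copy and "Diagonal" as an in-place swap across the main diagonal are all assumptions, not the author's code. The Eklundh variant is the textbook bit-exchange scheme and requires the dimension to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper type: a square n x n matrix stored row-major.
using Matrix = std::vector<float>;

// "Classic" (assumed meaning): out-of-place transpose, b[j][i] = a[i][j].
void transposeClassic(const Matrix& a, Matrix& b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            b[j * n + i] = a[i * n + j];
}

// "Diagonal" (assumed meaning): in-place transpose that swaps each element
// above the main diagonal with its mirror element below it.
void transposeDiagonal(Matrix& a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            std::swap(a[i * n + j], a[j * n + i]);
}

// Eklundh-style in-place transpose; n must be a power of two.
// Stage s exchanges bit s between the row index and the column index:
// every element with that bit clear in its row and set in its column is
// swapped with the element s rows down and s columns to the left.
void transposeEklundh(Matrix& a, std::size_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    for (std::size_t s = 1; s < n; s <<= 1) {
        for (std::size_t i = 0; i < n; ++i) {
            if (i & s) continue;  // only rows with bit s clear start a swap
            for (std::size_t j = 0; j < n; ++j)
                if (j & s) std::swap(a[i * n + j], a[(i + s) * n + (j - s)]);
        }
    }
}
```

All three routines should produce the same result on a power-of-two matrix; they differ in memory access pattern, which is what the timings above compare.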

SergeyKostrov
[ Watcom C++ compiler ]

Matrix Size: 8192 x 8192
Processing...
Transpose - Classic - Pass 01 - Completed: 10617.73333 ticks
Transpose - Classic - Pass 02 - Completed: 10604.13333 ticks
Transpose - Classic - Pass 03 - Completed: 10603.13333 ticks
Transpose - Classic - Pass 04 - Completed: 10615.60000 ticks
Transpose - Classic - Pass 05 - Completed: 10602.13333 ticks
Transpose - Classic - Passed

Matrix Size: 8192 x 8192
Processing...
Transpose - Diagonal - Pass 01 - Completed: 7301.06687 ticks
Transpose - Diagonal - Pass 02 - Completed: 7298.93333 ticks
Transpose - Diagonal - Pass 03 - Completed: 7298.93333 ticks
Transpose - Diagonal - Pass 04 - Completed: 7300.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 7297.93333 ticks
Transpose - Diagonal - Passed

Matrix Size: 8192 x 8192
Processing...
Transpose - Eklundh - Pass 01 - Completed: 8496.86657 ticks
Transpose - Eklundh - Pass 02 - Completed: 8500.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 8494.80000 ticks
Transpose - Eklundh - Pass 04 - Completed: 8502.06647 ticks
Transpose - Eklundh - Pass 05 - Completed: 8495.80000 ticks
Transpose - Eklundh - Passed

Note: 1 sec = 1000 ticks

SergeyKostrov
Matrix Transpose Algorithms ( 64-bit ): 16384 x 16384

[ Tests Set 5 ( 64-bit ) - Matrix Size: 16384 x 16384 ]

[ Microsoft C++ compiler ]

Matrix Size: 16384 x 16384
Processing...
Transpose - Classic - Pass 01 - Completed: 3214.00000 ticks
Transpose - Classic - Pass 02 - Completed: 3198.00000 ticks
Transpose - Classic - Pass 03 - Completed: 3244.00000 ticks
Transpose - Classic - Pass 04 - Completed: 3230.00000 ticks
Transpose - Classic - Pass 05 - Completed: 3291.00000 ticks
Transpose - Classic - Passed

Matrix Size: 16384 x 16384
Processing...
Transpose - Diagonal - Pass 01 - Completed: 1622.00000 ticks
Transpose - Diagonal - Pass 02 - Completed: 1607.00000 ticks
Transpose - Diagonal - Pass 03 - Completed: 1622.00000 ticks
Transpose - Diagonal - Pass 04 - Completed: 1607.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 1622.00000 ticks
Transpose - Diagonal - Passed

Matrix Size: 16384 x 16384
Processing...
Transpose - Eklundh - Pass 01 - Completed: 2246.00000 ticks
Transpose - Eklundh - Pass 02 - Completed: 2231.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 2247.00000 ticks
Transpose - Eklundh - Pass 04 - Completed: 2230.00000 ticks
Transpose - Eklundh - Pass 05 - Completed: 2247.00000 ticks
Transpose - Eklundh - Passed

SergeyKostrov
[ Intel C++ compiler ]

Matrix Size: 16384 x 16384
Processing...
Transpose - Classic - Pass 01 - Completed: 3260.00000 ticks
Transpose - Classic - Pass 02 - Completed: 3167.00000 ticks
Transpose - Classic - Pass 03 - Completed: 3292.00000 ticks
Transpose - Classic - Pass 04 - Completed: 3198.00000 ticks
Transpose - Classic - Pass 05 - Completed: 3213.00000 ticks
Transpose - Classic - Passed

Matrix Size: 16384 x 16384
Processing...
Transpose - Diagonal - Pass 01 - Completed: 1607.00000 ticks
Transpose - Diagonal - Pass 02 - Completed: 1607.00000 ticks
Transpose - Diagonal - Pass 03 - Completed: 1607.00000 ticks
Transpose - Diagonal - Pass 04 - Completed: 1622.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 1607.00000 ticks
Transpose - Diagonal - Passed

Matrix Size: 16384 x 16384
Processing...
Transpose - Eklundh - Pass 01 - Completed: 1794.00000 ticks
Transpose - Eklundh - Pass 02 - Completed: 1810.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 1794.00000 ticks
Transpose - Eklundh - Pass 04 - Completed: 1809.00000 ticks
Transpose - Eklundh - Pass 05 - Completed: 1794.00000 ticks
Transpose - Eklundh - Passed

SergeyKostrov
[ MinGW C++ compiler ]

Matrix Size: 16384 x 16384
Processing...
Transpose - Classic - Pass 01 - Completed: 3198.00000 ticks
Transpose - Classic - Pass 02 - Completed: 3166.00000 ticks
Transpose - Classic - Pass 03 - Completed: 3167.00000 ticks
Transpose - Classic - Pass 04 - Completed: 3198.00000 ticks
Transpose - Classic - Pass 05 - Completed: 3198.00000 ticks
Transpose - Classic - Passed

Matrix Size: 16384 x 16384
Processing...
Transpose - Diagonal - Pass 01 - Completed: 1607.00000 ticks
Transpose - Diagonal - Pass 02 - Completed: 1623.00000 ticks
Transpose - Diagonal - Pass 03 - Completed: 1622.00000 ticks
Transpose - Diagonal - Pass 04 - Completed: 1622.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 1607.00000 ticks
Transpose - Diagonal - Passed

Matrix Size: 16384 x 16384
Processing...
Transpose - Eklundh - Pass 01 - Completed: 2262.00000 ticks
Transpose - Eklundh - Pass 02 - Completed: 2246.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 2247.00000 ticks
Transpose - Eklundh - Pass 04 - Completed: 2246.00000 ticks
Transpose - Eklundh - Pass 05 - Completed: 2247.00000 ticks
Transpose - Eklundh - Passed

Note: 1 sec = 1000 ticks

SergeyKostrov
Matrix Transpose Algorithms ( 64-bit ): 32768 x 32768

[ Tests Set 6 ( 64-bit ) - Matrix Size: 32768 x 32768 ]

[ Microsoft C++ compiler ]

Matrix Size: 32768 x 32768
Processing...
Transpose - Classic - Pass 01 - Completed: 14696.00000 ticks
Transpose - Classic - Pass 02 - Completed: 14820.00000 ticks
Transpose - Classic - Pass 03 - Completed: 14804.00000 ticks
Transpose - Classic - Pass 04 - Completed: 16162.00000 ticks
Transpose - Classic - Pass 05 - Completed: 15085.00000 ticks
Transpose - Classic - Passed

Matrix Size: 32768 x 32768
Processing...
Transpose - Diagonal - Pass 01 - Completed: 8846.00000 ticks
Transpose - Diagonal - Pass 02 - Completed: 8829.00000 ticks
Transpose - Diagonal - Pass 03 - Completed: 8845.00000 ticks
Transpose - Diagonal - Pass 04 - Completed: 8830.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 8830.00000 ticks
Transpose - Diagonal - Passed

Matrix Size: 32768 x 32768
Processing...
Transpose - Eklundh - Pass 01 - Completed: 9423.00000 ticks
Transpose - Eklundh - Pass 02 - Completed: 9407.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 9422.00000 ticks
Transpose - Eklundh - Pass 04 - Completed: 9423.00000 ticks
Transpose - Eklundh - Pass 05 - Completed: 9407.00000 ticks
Transpose - Eklundh - Passed

SergeyKostrov
[ Intel C++ compiler ]

Matrix Size: 32768 x 32768
Processing...
Transpose - Classic - Pass 01 - Completed: 14492.00000 ticks
Transpose - Classic - Pass 02 - Completed: 14649.00000 ticks
Transpose - Classic - Pass 03 - Completed: 14695.00000 ticks
Transpose - Classic - Pass 04 - Completed: 14680.00000 ticks
Transpose - Classic - Pass 05 - Completed: 14554.00000 ticks
Transpose - Classic - Passed

Matrix Size: 32768 x 32768
Processing...
Transpose - Diagonal - Pass 01 - Completed: 9095.00000 ticks
Transpose - Diagonal - Pass 02 - Completed: 9095.00000 ticks
Transpose - Diagonal - Pass 03 - Completed: 9094.00000 ticks
Transpose - Diagonal - Pass 04 - Completed: 9080.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 9095.00000 ticks
Transpose - Diagonal - Passed

Matrix Size: 32768 x 32768
Processing...
Transpose - Eklundh - Pass 01 - Completed: 7660.00000 ticks
Transpose - Eklundh - Pass 02 - Completed: 7659.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 7644.00000 ticks
Transpose - Eklundh - Pass 04 - Completed: 7660.00000 ticks
Transpose - Eklundh - Pass 05 - Completed: 7644.00000 ticks
Transpose - Eklundh - Passed

SergeyKostrov
[ MinGW C++ compiler ]

Matrix Size: 32768 x 32768
Processing...
Transpose - Classic - Pass 01 - Completed: 14617.00000 ticks
Transpose - Classic - Pass 02 - Completed: 14633.00000 ticks
Transpose - Classic - Pass 03 - Completed: 14524.00000 ticks
Transpose - Classic - Pass 04 - Completed: 14586.00000 ticks
Transpose - Classic - Pass 05 - Completed: 16131.00000 ticks
Transpose - Classic - Passed

Matrix Size: 32768 x 32768
Processing...
Transpose - Diagonal - Pass 01 - Completed: 9064.00000 ticks
Transpose - Diagonal - Pass 02 - Completed: 9063.00000 ticks
Transpose - Diagonal - Pass 03 - Completed: 9048.00000 ticks
Transpose - Diagonal - Pass 04 - Completed: 9080.00000 ticks
Transpose - Diagonal - Pass 05 - Completed: 9048.00000 ticks
Transpose - Diagonal - Passed

Matrix Size: 32768 x 32768
Processing...
Transpose - Eklundh - Pass 01 - Completed: 9656.00000 ticks
Transpose - Eklundh - Pass 02 - Completed: 9657.00000 ticks
Transpose - Eklundh - Pass 03 - Completed: 9656.00000 ticks
Transpose - Eklundh - Pass 04 - Completed: 9657.00000 ticks
Transpose - Eklundh - Pass 05 - Completed: 9656.00000 ticks
Transpose - Eklundh - Passed

Note: 1 sec = 1000 ticks

SergeyKostrov
Matrix Transpose Algorithms ( 64-bit ): 65536 x 65536

[ Tests Set 7 ( 64-bit ) - Matrix Size: 65536 x 65536 ]

[ Microsoft C++ compiler ]

Matrix Size: 65536 x 65536
Processing...
Transpose - Classic - Pass 01 - Completed: 510.575 secs
Transpose - Classic - Pass 02 - Completed: 715.704 secs
Transpose - Classic - Pass 03 - Completed: 691.896 secs
Transpose - Classic - Pass 04 - Completed: 736.356 secs
Transpose - Classic - Pass 05 - Completed: 520.152 secs
Transpose - Classic - Passed

Matrix Size: 65536 x 65536
Processing...
Transpose - Diagonal - Pass 01 - Completed: 51.215 secs
Transpose - Diagonal - Pass 02 - Completed: 51.214 secs
Transpose - Diagonal - Pass 03 - Completed: 51.216 secs
Transpose - Diagonal - Pass 04 - Completed: 51.215 secs
Transpose - Diagonal - Pass 05 - Completed: 51.218 secs
Transpose - Diagonal - Passed

Matrix Size: 65536 x 65536
Processing...
Transpose - Eklundh - Pass 01 - Completed: 37.971 secs
Transpose - Eklundh - Pass 02 - Completed: 37.970 secs
Transpose - Eklundh - Pass 03 - Completed: 37.971 secs
Transpose - Eklundh - Pass 04 - Completed: 37.971 secs
Transpose - Eklundh - Pass 05 - Completed: 37.970 secs
Transpose - Eklundh - Passed

SergeyKostrov
[ Intel C++ compiler ]

Matrix Size: 65536 x 65536
Processing...
Transpose - Classic - Pass 01 - Completed: 471.810 secs
Transpose - Classic - Pass 02 - Completed: 667.638 secs
Transpose - Classic - Pass 03 - Completed: 945.147 secs
Transpose - Classic - Pass 04 - Completed: 675.141 secs
Transpose - Classic - Pass 05 - Completed: 974.523 secs
Transpose - Classic - Passed

Matrix Size: 65536 x 65536
Processing...
Transpose - Diagonal - Pass 01 - Completed: 51.964 secs
Transpose - Diagonal - Pass 02 - Completed: 51.980 secs
Transpose - Diagonal - Pass 03 - Completed: 51.980 secs
Transpose - Diagonal - Pass 04 - Completed: 51.979 secs
Transpose - Diagonal - Pass 05 - Completed: 51.980 secs
Transpose - Diagonal - Passed

Matrix Size: 65536 x 65536
Processing...
Transpose - Eklundh - Pass 01 - Completed: 31.419 secs
Transpose - Eklundh - Pass 02 - Completed: 31.418 secs
Transpose - Eklundh - Pass 03 - Completed: 31.403 secs
Transpose - Eklundh - Pass 04 - Completed: 31.419 secs
Transpose - Eklundh - Pass 05 - Completed: 31.403 secs
Transpose - Eklundh - Passed

SergeyKostrov
[ MinGW C++ compiler ]

Matrix Size: 65536 x 65536
Processing...
Transpose - Classic - Pass 01 - Completed: 478.147 secs
Transpose - Classic - Pass 02 - Completed: 526.629 secs
Transpose - Classic - Pass 03 - Completed: 698.661 secs
Transpose - Classic - Pass 04 - Completed: 674.391 secs
Transpose - Classic - Pass 05 - Completed: 598.162 secs
Transpose - Classic - Passed

Matrix Size: 65536 x 65536
Processing...
Transpose - Diagonal - Pass 01 - Completed: 51.621 secs
Transpose - Diagonal - Pass 02 - Completed: 51.636 secs
Transpose - Diagonal - Pass 03 - Completed: 51.620 secs
Transpose - Diagonal - Pass 04 - Completed: 51.637 secs
Transpose - Diagonal - Pass 05 - Completed: 51.620 secs
Transpose - Diagonal - Passed

Matrix Size: 65536 x 65536
Processing...
Transpose - Eklundh - Pass 01 - Completed: 38.610 secs
Transpose - Eklundh - Pass 02 - Completed: 38.595 secs
Transpose - Eklundh - Pass 03 - Completed: 38.626 secs
Transpose - Eklundh - Pass 04 - Completed: 38.641 secs
Transpose - Eklundh - Pass 05 - Completed: 38.595 secs
Transpose - Eklundh - Passed

Note: 1 min = 60 secs

SergeyKostrov
Matrix Transpose Algorithms ( 64-bit ): 81920 x 81920

[ Tests Set 8 ( 64-bit ) - Matrix Size: 81920 x 81920 ]

[ Microsoft C++ compiler ]

Matrix Size: 81920 x 81920
Processing...
Transpose - Diagonal - Pass 01 - Completed: 66.590 secs
Transpose - Diagonal - Pass 02 - Completed: 66.591 secs
Transpose - Diagonal - Pass 03 - Completed: 66.590 secs
Transpose - Diagonal - Pass 04 - Completed: 66.559 secs
Transpose - Diagonal - Pass 05 - Completed: 66.591 secs
Transpose - Diagonal - Passed

[ Intel C++ compiler ]

Matrix Size: 81920 x 81920
Processing...
Transpose - Diagonal - Pass 01 - Completed: 67.346 secs
Transpose - Diagonal - Pass 02 - Completed: 67.345 secs
Transpose - Diagonal - Pass 03 - Completed: 67.346 secs
Transpose - Diagonal - Pass 04 - Completed: 67.346 secs
Transpose - Diagonal - Pass 05 - Completed: 67.345 secs
Transpose - Diagonal - Passed

[ MinGW C++ compiler ]

Matrix Size: 81920 x 81920
Processing...
Transpose - Diagonal - Pass 01 - Completed: 66.790 secs
Transpose - Diagonal - Pass 02 - Completed: 66.706 secs
Transpose - Diagonal - Pass 03 - Completed: 66.890 secs
Transpose - Diagonal - Pass 04 - Completed: 66.707 secs
Transpose - Diagonal - Pass 05 - Completed: 66.591 secs
Transpose - Diagonal - Passed

Note: 1 min = 60 secs

SergeyKostrov
Matrix Transpose Algorithms ( 64-bit ): 131072 x 131072

[ Tests Set 9 ( 64-bit ) - Matrix Size: 131072 x 131072 ]

[ Microsoft C++ compiler ]
Not Tested

[ Intel C++ compiler ]
Not Tested

[ MinGW C++ compiler ]

Matrix Size: 131072 x 131072
Processing...
Transpose - Diagonal - Pass 01 - Completed: 13504.476 secs
Transpose - Diagonal - Pass 02 - Completed: 7946.800 secs
Transpose - Diagonal - Pass 03 - Completed: 9254.744 secs
Transpose - Diagonal - Pass 04 - Completed: 9980.881 secs
Transpose - Diagonal - Pass 05 - Completed: 10140.392 secs
Transpose - Diagonal - Passed

Note: 1 min = 60 secs

SergeyKostrov
If somebody is interested in a performance analysis, then the data mining needs to be done manually.
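
Editorial note: one way to start that manual analysis is to average the five passes of each algorithm and compare the means. The sketch below does this for the 65536 x 65536 Microsoft C++ compiler results posted above. The helper names (`meanOf`, `speedup`) and the constant names are hypothetical; only the numbers come from the thread.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Arithmetic mean of the per-pass timings.
double meanOf(const std::vector<double>& v) {
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// How many times faster `candidate` is than `baseline`, by mean time.
double speedup(const std::vector<double>& baseline, const std::vector<double>& candidate) {
    return meanOf(baseline) / meanOf(candidate);
}

// Timings in seconds, copied from the 65536 x 65536 Microsoft C++ compiler post.
const std::vector<double> kClassicSecs  = {510.575, 715.704, 691.896, 736.356, 520.152};
const std::vector<double> kDiagonalSecs = {51.215, 51.214, 51.216, 51.215, 51.218};
const std::vector<double> kEklundhSecs  = {37.971, 37.970, 37.971, 37.971, 37.970};
```

With these inputs, `speedup(kClassicSecs, kDiagonalSecs)` comes out at roughly 12.4 and `speedup(kDiagonalSecs, kEklundhSecs)` at roughly 1.35. The Classic passes vary widely from run to run at this size, so a mean is only a rough summary there.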

Hans_P_Intel
Employee

Hello Sergey,

I looked a bit into your postings (since you nicely shared a report of your IDF'16 impressions). I found various performance evaluations and this one in particular. I have some questions:

  • Where can I find the actual source code of the transpose algorithm(s) that you've evaluated?
  • I guess you've been running your mobile workstations on plug rather than battery (fixed freq.)?
  • I guess "ticks" are Nanoseconds?

Hans

SergeyKostrov
Hi Hans,

Here are answers to your 2nd and 3rd questions.

>> I guess you've been running your mobile workstations on plug rather than battery ( fixed freq. )?

Yes, 99% of the time the Dell Precision Mobile M4700 workstation is plugged in because some processing can take hours to complete.

>> I guess "ticks" are Nanoseconds?

The Win32 API function GetTickCount was used in these tests, and 1 sec = 1000 ticks ( that is, 1 tick = 1 millisecond ).

An example of a test case looks like:

    ...
    CrtPrintf( RTU("\tALGORITHM_TRANSPOSE\n") );
    // _MatrixTransposeProcessingCRv1A
    // ALGORITHM_TRANSPOSE._MatrixTransposeProcessingCRv1A
    for( uiNT = 0; uiNT < uiNumberOfTests; uiNT += 1 )
    {
        // Time one pass in ticks ( milliseconds )
        uiTicksStart = SysGetTickCount();
        _MatrixTransposeProcessingCRv1A( m_tdsFa->m_ptData2D, m_tdsFb->m_ptData2D,
                                         m_tdsFa->m_iRows, m_tdsFa->m_iCols, iNumOfThreads );
        uiTicksEnd = SysGetTickCount();
        // Report the elapsed time converted from ticks to seconds
        CrtPrintf( RTU("\t\t_MatrixTransposeProcessingCRv1A - Pass %02ld - Completed: %11.5f secs\n"),
                   ( RTint )( uiNT + 1 ),
                   ( RTfloat )( uiTicksEnd - uiTicksStart ) / ( RTfloat )1000.0f );
    }
    CrtPrintf( RTU("\n") );
    ...

For test cases with matrices up to 32Kx32K, measurements are in milliseconds ( ticks ). For test cases with matrices of 64Kx64K and larger, measurements are in seconds.

In nanoseconds I do measurements for very small and critical sections of codes using rdtsc instruction.
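
Editorial note: the test case above relies on project-specific wrappers ( SysGetTickCount, CrtPrintf ) that are not shown in the thread. As a portable illustration of the same start/stop pattern, here is a minimal sketch that swaps in std::chrono::steady_clock for GetTickCount. The function name `timePassesMs` is hypothetical, and this is not the author's harness.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` once per pass and returns the wall-clock time of each pass
// in milliseconds (the same unit as one GetTickCount tick).
template <typename Work>
std::vector<double> timePassesMs(Work&& work, std::size_t passes) {
    using Clock = std::chrono::steady_clock;
    std::vector<double> elapsedMs;
    elapsedMs.reserve(passes);
    for (std::size_t p = 0; p < passes; ++p) {
        const Clock::time_point start = Clock::now();
        work();
        const Clock::time_point end = Clock::now();
        elapsedMs.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }
    return elapsedMs;
}
```

A call such as `timePassesMs([&] { /* transpose here */ }, 5)` would mirror the five-pass layout of the results in this thread; dividing each value by 1000.0 gives seconds, as in the original printout.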

SergeyKostrov
>>In nanoseconds I do measurements for very small and critical sections of codes using rdtsc instruction.

Actually in clock cycles and it is very easy to convert a value to nanoseconds.

SergeyKostrov
[ Extended Tracing and Timing Functionality - 1 ]

For example, this is an output from another test:

...
...Completed in 150205956096.000 cc...
...

The processing completed in 150205956096.000 clock cycles on a Processing Unit with a 2829200000 Hz frequency ( ~2.83 GHz ). After conversion, the times in seconds, milliseconds, microseconds and nanoseconds are as follows:

...Completed in 53.091 secs
or
...Completed in 53091.318 ms
or
...Completed in 53091317.721 mu
or
...Completed in 53091317720.911 ns

I prefer measurements in seconds, milliseconds ( or ticks ) and clock cycles, and don't use measurements in microseconds and nanoseconds.

Note:
cc - clock cycles
ms - milliseconds
mu - microseconds
ns - nanoseconds
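
Editorial note: the conversion described above is a single division by the clock frequency, followed by unit scaling. The sketch below reproduces it with the figures from this post. The struct and function names are hypothetical, and it assumes the counter runs at a constant, known frequency for the whole measurement.

```cpp
#include <cassert>
#include <cmath>

// Elapsed time expressed in the four units used in the post.
struct ElapsedTime {
    double secs;
    double ms;  // milliseconds
    double mu;  // microseconds ("mu" in the post's notation)
    double ns;  // nanoseconds
};

// Converts a clock-cycle count to time, given the counter frequency in Hz.
ElapsedTime cyclesToTime(double cycles, double frequencyHz) {
    assert(frequencyHz > 0.0);
    const double secs = cycles / frequencyHz;
    return ElapsedTime{secs, secs * 1e3, secs * 1e6, secs * 1e9};
}
```

For example, `cyclesToTime(150205956096.0, 2829200000.0)` yields about 53.091 secs, matching the values listed above.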

SergeyKostrov
[ Extended Tracing and Timing Functionality - 2 ( Real Processing example ) ]

...
> CStrassenSet Algorithms <
Strassen HBC
Matrix Size : 32768 x 32768
Matrix Size Threshold : 16384 x 16384
Matrix Partitions : 8
Degree of Recursion : 1
Result Sets Reflection: Disabled
Calculating...
TStrassenHBCSet::TStrassenHBCSet( T ** )
sizeof( TStrassenHBCSet ) = 192
sizeof( TStrassenHBCResultSet ) = 2176
TStrassenHBCResultSet->m_Index= 0
ResultSet Index: 0
Base Matrix Size: 32768 x 32768
Initialized: 0
A[0] Matrix Size: 16384 x 16384
B[0] Matrix Size: 16384 x 16384
A[1] Matrix Size: 16384 x 16384
B[1] Matrix Size: 16384 x 16384
M[0] Matrix Size: 16384 x 16384
M[1] Matrix Size: 16384 x 16384
M[2] Matrix Size: 16384 x 16384
M[3] Matrix Size: 16384 x 16384
M[4] Matrix Size: 16384 x 16384
M[5] Matrix Size: 16384 x 16384
M[6] Matrix Size: 16384 x 16384
ResultSet Index: 1
Base Matrix Size: 16384 x 16384
Initialized: 1
DEBUG: m_bRsInitialized = 1
Matrix Size : 32768 x 32768
Partitioned Matrix Size : 16384 x 16384
Size of Row / Column : 65536 bytes
LBOT Block Size : 16 x 16
LBOT Block Size Divider : 1
Processor Unit Frequency: 2829200000 Hz
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {6}
Mul - Completed in 150205956096.000 cc
Mul - Completed in 53.091 secs
Mul - Completed in 53091.318 ms
Mul - Completed in 53091317.721 mu
Mul - Completed in 53091317720.911 ns

Processor Frequency 2829200000 Hz
Mul - Completed in 149724971008.000 cc
Mul - Completed in 52.921 secs
Mul - Completed in 52921.310 ms
Mul - Completed in 52921310.267 mu
Mul - Completed in 52921310267.213 ns

Processor Frequency 2829200000 Hz
Mul - Completed in 149928263680.000 cc
Mul - Completed in 52.993 secs
Mul - Completed in 52993.165 ms
Mul - Completed in 52993165.446 mu
Mul - Completed in 52993165446.062 ns

Processor Frequency 2829200000 Hz
Mul - Completed in 149985640448.000 cc
Mul - Completed in 53.013 secs
Mul - Completed in 53013.446 ms
Mul - Completed in 53013445.655 mu
Mul - Completed in 53013445655.309 ns

Processor Frequency 2829200000 Hz
Mul - Completed in 149933621248.000 cc
Mul - Completed in 52.995 secs
Mul - Completed in 52995.059 ms
Mul - Completed in 52995059.115 mu
Mul - Completed in 52995059114.944 ns

Processor Frequency 2829200000 Hz
Mul - Completed in 150398959616.000 cc
Mul - Completed in 53.160 secs
Mul - Completed in 53159.536 ms
Mul - Completed in 53159536.129 mu
Mul - Completed in 53159536128.941 ns

Processor Frequency 2829200000 Hz
Mul - Completed in 150136520704.000 cc
Mul - Completed in 53.067 secs
Mul - Completed in 53066.775 ms
Mul - Completed in 53066775.309 mu
Mul - Completed in 53066775308.921 ns
TStrassenHBCSet::~TStrassenHBCSet
Strassen HBC - Pass 01 - Completed: 381.45300 secs
...

Hans_P_Intel

Sergey Kostrov wrote:

>>In nanoseconds I do measurements for very small and critical sections of codes using rdtsc instruction.

Actually in clock cycles and it is very easy to convert a value to nanoseconds.

Just a note for other readers: I think there is no way to measure CPU cycles on today's IA other than relying on the PMU. In fact, this is not the fault of Intel Turbo Boost or similar features, but rather one possible definition carried forward from the days when CPU cycles and clock cycles were the same. In particular, the RDTSC instruction does not measure CPU cycles but rather measures clock cycles (as you mentioned!).
