
Performance Evaluation of Matrix Transpose algorithms

*** Performance Evaluation of Matrix Transpose algorithms ***

[ Computer System used for performance evaluations ]

** Dell Precision Mobile M4700 **

Intel Core i7-3840QM ( 2.80 GHz )
Ivy Bridge / 4 cores / 8 logical CPUs / 32GB RAM
320GB HDD
NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory )
Windows 7 Professional 64-bit SP1
Size of L3 Cache = 8MB ( shared between all cores for data & instructions )
Size of L2 Cache = 1MB ( 256KB per core / shared for data & instructions )
Size of L1 Cache = 256KB ( 32KB per core for data & 32KB per core for instructions )
Display resolution: 1366 x 768
40 Replies

Sergey Kostrov wrote:

>>In nanoseconds I do measurements for very small and critical sections of codes using rdtsc instruction.

Actually, the measurement is in clock cycles, and it is very easy to convert that value to nanoseconds.
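For readers who want to see the conversion spelled out, here is a minimal sketch. The function name `tsc_ticks_to_ns` and the `tsc_hz` parameter are made up for illustration; the sketch assumes the counter ticks at a fixed, known rate, which you would have to calibrate or look up for your own machine.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a difference between two RDTSC readings into nanoseconds.
 * tsc_hz is the rate at which the counter increments, in ticks per second.
 * It is an assumed input here, not something this code discovers. */
double tsc_ticks_to_ns(uint64_t ticks, double tsc_hz)
{
    assert(tsc_hz > 0.0);               /* a zero or negative rate is meaningless */
    return (double)ticks * 1e9 / tsc_hz;
}
```

With the 2.80 GHz figure from the system description taken as the tick rate (an assumption), a delta of 2800 ticks comes out at about 1000 ns.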

Just a note for other readers: I think there is no way to measure CPU cycles on today's IA other than relying on the PMU. This is not the fault of Intel Turbo Boost or similar features; rather, it is a definition carried forward from the days when CPU cycles and clock cycles were the same. In particular, the RDTSC instruction does not measure CPU cycles; it measures clock cycles (as you mentioned!).
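To make the difference concrete, here is a rough sketch of the arithmetic. It assumes the counter ticks at a constant `tsc_hz` while the core runs at some other constant `core_hz` for the whole interval. Both names and the constant-frequency simplification are assumptions for illustration only; a real core changes frequency over time, which is exactly why a PMU counter is needed to get actual CPU cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Estimate how many core cycles elapsed during an interval measured in
 * TSC ticks, under the simplifying assumption that the core frequency
 * stayed constant at core_hz for the entire interval. */
double estimated_core_cycles(uint64_t tsc_ticks, double tsc_hz, double core_hz)
{
    assert(tsc_hz > 0.0);               /* tick rate must be positive */
    return (double)tsc_ticks * core_hz / tsc_hz;
}
```

For example, with a tick rate of 2.8 GHz and a hypothetical core frequency of 3.8 GHz, 2800 ticks correspond to roughly 3800 core cycles, so reading an RDTSC delta as "CPU cycles" would undercount by about a quarter in that case.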
