How is processor peak performance calculated?

nawrocki · ‎04-29-2006

This is a simple question, but in fact I haven't found any information about it that satisfies me. Jack Dongarra writes that this is the maximum performance, defined by the processor manufacturer, which cannot be exceeded.

So, how is peak performance calculated, e.g. 2 x clock for the P4? Is it just the number of execution units (FMUL and FADD) that can process data simultaneously in one cycle, or should other factors also be taken into account? The MKL or GOTO implementations can achieve ~85% efficiency, while Fujitsu BLAS reaches even 93%. I was wondering if >100% is possible. Just kidding ;-)

Many thanks in advance! Best wishes,
Maciej Nawrocki

Intel_C_Intel · ‎05-01-2006

Peak performance depends on the processor, the function, and where the data resides. For double-precision matrix multiplication (DGEMM), peak performance is expressed as the maximum number of FP operations per second. For Pentium 4 processors using SSE instructions, the peak performance is 2 times the clock rate, as a double-precision multiply or add can be done each clock. On the Itanium processor, the peak rate is 4 times the clock rate, since up to two FMA operations can be done on each clock, with each FMA counting as a multiply plus an add.

Of course, on the Pentium 4 processor the single-precision rate is twice the double-precision rate.
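The arithmetic above can be written out as a small sketch. The flops-per-clock factors (2 for Pentium 4 double precision, 4 for single precision, 4 for Itanium) come from this post; the clock frequencies and the measured DGEMM rate are hypothetical values chosen only for illustration, and the function names are made up here.

```python
def peak_gflops(clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical peak = clock rate x FP operations completed per clock."""
    return clock_ghz * flops_per_clock


def efficiency(measured_gflops: float, peak: float) -> float:
    """Fraction of the theoretical peak that a benchmark actually reached."""
    return measured_gflops / peak


# Hypothetical clock rates, used only as examples.
p4_double = peak_gflops(3.0, 2)   # Pentium 4, double precision
p4_single = peak_gflops(3.0, 4)   # Pentium 4, single precision (twice the DP rate)
itanium = peak_gflops(1.5, 4)     # Itanium: two FMAs per clock = 4 flops

# Hypothetical measured DGEMM rate on the 3.0 GHz Pentium 4.
dgemm_eff = efficiency(5.1, p4_double)

print(f"P4 DP peak:  {p4_double:.1f} GFLOPS")
print(f"P4 SP peak:  {p4_single:.1f} GFLOPS")
print(f"Itanium:     {itanium:.1f} GFLOPS")
print(f"DGEMM efficiency: {dgemm_eff:.0%}")
```

Since the measured rate is divided by a theoretical upper bound, an efficiency above 100% would indicate that either the flop count or the peak figure is wrong.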

Operations such as FFTs are more problematic in that there is no balance between multiplies and adds. Generally the number of operations is taken as 5*N*log2(N), but that is just a normalized count which does not necessarily represent the actual number of FP operations performed.
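As a rough sketch of how that normalized count is typically turned into a reported rate: the function names below are hypothetical, the base-2 logarithm is the usual convention for this formula, and the timing value is invented for the example.

```python
import math


def fft_nominal_flops(n: int) -> float:
    """Conventional operation count for a length-n complex FFT: 5 * n * log2(n).

    This is a normalization, not the number of FP operations a given
    implementation actually executes.
    """
    return 5.0 * n * math.log2(n)


def fft_nominal_gflops(n: int, seconds: float) -> float:
    """Reported rate = nominal operation count / elapsed time."""
    return fft_nominal_flops(n) / seconds / 1e9


n = 1024
elapsed = 20e-6  # hypothetical time for one transform, in seconds
print(fft_nominal_flops(n))
print(fft_nominal_gflops(n, elapsed))
```

Because the count is nominal, an implementation that needs fewer real operations than 5*N*log2(N) reports a higher rate for the same elapsed time, which makes comparison against a hardware peak less direct than for DGEMM.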

Vector operations such as dot product may have a peak performance similar to that for DGEMM, but unless the data is in cache, the limit will be set by the memory bandwidth rather than by the FP capabilities of the processor.
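A back-of-the-envelope sketch of that bandwidth limit for a double-precision dot product follows. It assumes both vectors stream from main memory with no cache reuse; the 6.0 GFLOPS peak and 6.4 GB/s bandwidth are hypothetical figures, and the function names are made up for this example.

```python
BYTES_PER_DOUBLE = 8


def dot_flops_per_byte() -> float:
    """Each element pair costs one multiply and one add (2 flops)
    and requires loading two doubles (16 bytes)."""
    return 2.0 / (2 * BYTES_PER_DOUBLE)


def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float,
                      flops_per_byte: float) -> float:
    """The rate is capped by whichever is lower: the FP peak, or the
    rate at which memory can deliver operands."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical machine: 6.0 GFLOPS FP peak, 6.4 GB/s memory bandwidth.
out_of_cache = attainable_gflops(6.0, 6.4, dot_flops_per_byte())
print(f"dot product from memory: at most {out_of_cache:.2f} GFLOPS")
```

Under these assumed numbers the out-of-cache dot product is limited to well under a fifth of the FP peak, whereas DGEMM reuses each loaded operand many times and can stay close to the FP limit.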