Intel® oneAPI Math Kernel Library

How is processor peak performance calculated?

nawrocki

This is a simple question, but I haven't found any information about it that satisfies me. Jack Dongarra writes that it is the maximum performance, as defined by the processor manufacturer, which cannot be exceeded.

So how is peak performance calculated, e.g. P4 --> 2 x clock? Is it just the number of execution units (FMUL and FADD) that can process data simultaneously in one cycle, or should other factors also be taken into account? MKL or GOTO implementations can achieve ~85% efficiency, while Fujitsu BLAS reaches even 93%. I was wondering if >100% is possible. Just kidding ;)

Many thanks in advance! Best wishes,
Maciej Nawrocki

Intel_C_Intel
Employee

Peak performance depends on the processor, the function, and where the data resides. For double-precision matrix multiplication (DGEMM), peak performance is the maximum number of FP operations per second. For Pentium 4 processors using SSE instructions, the peak is 2 times the clock, since the processor can complete two double-precision operations per clock (a packed SSE multiply or add works on two values at once). On the Itanium processor, the peak rate is 4 times the clock, since up to two FMA operations can be done on each clock and each FMA counts as a multiply and an add.

Of course, on the Pentium 4 processor the single-precision rate is twice the double-precision rate.
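
As a rough illustration of the arithmetic above, peak rate is simply clock frequency times FP operations per clock. The sketch below is only that; the clock speeds are assumed values chosen for the example, and the helper names are hypothetical.

```python
def peak_flops(clock_hz, flops_per_cycle):
    """Theoretical peak rate: clock frequency times FP operations per clock."""
    return clock_hz * flops_per_cycle


def efficiency(achieved_flops, peak):
    """Fraction of the theoretical peak that a routine actually reaches."""
    return achieved_flops / peak


# Clock speeds are assumptions for illustration, not figures from this thread.
P4_CLOCK_HZ = 3.0e9
ITANIUM_CLOCK_HZ = 1.5e9

p4_double = peak_flops(P4_CLOCK_HZ, 2)            # 2 x clock (double precision, SSE)
p4_single = peak_flops(P4_CLOCK_HZ, 4)            # single precision is twice the double rate
itanium_double = peak_flops(ITANIUM_CLOCK_HZ, 4)  # two FMAs per clock, 2 operations each

print(f"Pentium 4 double: {p4_double / 1e9:.1f} GFLOPS")
print(f"Pentium 4 single: {p4_single / 1e9:.1f} GFLOPS")
print(f"Itanium double:   {itanium_double / 1e9:.1f} GFLOPS")

# A DGEMM measured at a hypothetical 5.1 GFLOPS on this assumed Pentium 4:
print(f"Efficiency: {efficiency(5.1e9, p4_double):.0%}")
```

Because the peak assumes every clock issues the maximum number of operations, a measured efficiency can approach but not exceed 100%.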

Operations such as FFTs are more problematic in that there is no balance between multiplies and adds. Generally the number of operations is taken as 5*N*log2(N), but that is just a normalized number which does not necessarily represent the actual number of FP operations.
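
To make the normalized FFT count concrete, here is a small sketch that computes 5*N*log2(N) and the nominal rate derived from it. The transform length and elapsed time are assumed values, and the function names are hypothetical.

```python
import math


def fft_nominal_flops(n):
    """Conventional normalized operation count for a length-n FFT."""
    return 5.0 * n * math.log2(n)


def fft_nominal_rate(n, seconds):
    """Nominal FLOPS: normalized count divided by measured time.

    This is a yardstick for comparing implementations, not a count of the
    FP instructions actually executed.
    """
    return fft_nominal_flops(n) / seconds


n = 1024            # assumed transform length
elapsed = 10e-6     # assumed measured time in seconds

print(f"Nominal operations: {fft_nominal_flops(n):.0f}")
print(f"Nominal rate: {fft_nominal_rate(n, elapsed) / 1e9:.2f} GFLOPS")
```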

Vector operations such as dot product may have a peak performance similar to that of DGEMM, but unless the data is in cache, the limit will be set by memory bandwidth rather than by the FP capabilities of the processor.
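
The bandwidth limit can be estimated with a simple bound: the attainable rate is the smaller of the FP peak and the memory bandwidth multiplied by the operations performed per byte moved. The sketch below assumes a double-precision dot product that streams both vectors from memory; the peak and bandwidth figures are made-up values for illustration, and the helper names are hypothetical.

```python
def dot_flops(n):
    """A length-n dot product is counted as n multiplies plus n adds."""
    return 2 * n


def dot_bytes(n, bytes_per_element=8):
    """Both input vectors are read once; 8 bytes per double-precision value."""
    return 2 * n * bytes_per_element


def attainable_flops(peak, bandwidth_bytes_per_s, flops, bytes_moved):
    """Smaller of the FP peak and the rate that memory traffic allows."""
    intensity = flops / bytes_moved  # FP operations per byte moved
    return min(peak, bandwidth_bytes_per_s * intensity)


n = 1_000_000
peak = 6.0e9        # assumed FP peak in FLOPS
bandwidth = 6.4e9   # assumed memory bandwidth in bytes per second

bound = attainable_flops(peak, bandwidth, dot_flops(n), dot_bytes(n))
print(f"Out-of-cache dot product bound: {bound / 1e9:.2f} GFLOPS "
      f"({bound / peak:.0%} of FP peak)")
```

With data in cache the effective bandwidth is much higher, so the bound moves back toward the FP peak.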
