Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Measuring theoretical flops for icelake processors

psing51

Hi,
I have a few servers, each equipped with dual Ice Lake 8358 processors.
I would like to know whether the following is the correct method to calculate theoretical peak double-precision FLOPS (Rpeak):

=  cores/socket * sockets * frequency * operations/cycle * elements/operation

= 32 * 2 * 2.6 GHz * 2 * (512-bit register / 64-bit DP)

= 32 * 2 * 2.6 * 2 * 8

= 2662.4 GFLOPS

Also, will anything differ apart from the frequency and cores/socket values if I calculate FLOPS for the Ice Lake 6338 or the Cascade Lake 6230?

McCalpinJohn

The formula above is missing a factor of two. Each core has two AVX512 FMA units, and each "lane" of each AVX512 unit performs two FP64 operations per cycle (multiply + add).

  • 2 FMA units/core * 8 lanes/FMA unit * 2 FP ops/lane/cycle = 32 FP Ops/core/cycle

Then

  • 2 sockets * 32 cores/socket = 64 cores

Finally

  • 64 cores * 32 FP ops/core/cycle = 2048 FP ops/cycle
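The per-cycle arithmetic above can be written as a small Python sketch. The function name and parameters are made up for illustration. The defaults (two FMA units per core, 512-bit vectors) are the values stated in this reply for the 8358, so check them before reusing the function for another CPU model.

```python
def fp64_ops_per_cycle(sockets, cores_per_socket,
                       fma_units_per_core=2, vector_bits=512):
    """Peak FP64 operations per cycle for the whole node (hypothetical helper)."""
    lanes = vector_bits // 64                    # 64-bit doubles per vector: 8 for AVX512
    ops_per_core = fma_units_per_core * lanes * 2  # an FMA counts as 2 ops (multiply + add)
    return sockets * cores_per_socket * ops_per_core


# Dual-socket Xeon Platinum 8358, 32 cores per socket
print(fp64_ops_per_cycle(sockets=2, cores_per_socket=32))  # 2048
```

Multiplying the result by a frequency in GHz gives GFLOPS.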

The frequency on the Xeon Platinum 8358 will vary with the configuration (since this chip supports a mode that can split the cores into a "high priority" pool of 12 cores and a "low priority" pool of 20 cores). In the more traditional configuration, the base AVX512 frequency is 1.9 GHz and the maximum all-core AVX512 Turbo frequency is 2.9 GHz. The actual average frequency when running compute-intensive workloads (i.e., close to 2 AVX512 FMA instructions per cycle) will be somewhere in the range of 1.9 to 2.9 GHz, but will vary depending on the current leakage characteristics of the specific piece of silicon as well as the effectiveness of the cooling system. We saw a 13% range in average frequency when running the HPL benchmark on an ensemble of 3472 Xeon Platinum 8160 processors (Skylake Xeon).

The "nominal" 2.6 GHz is within the range of 1.9 to 2.9 GHz, so it is a reasonable estimator for the maximum throughput, but it is not an upper bound.   Using 2.9 GHz will give a strong upper bound because the chip will not allow all cores to run AVX512 512-bit FP code at more than 2.9 GHz.   Using 1.9 GHz will give a lower bound -- if the frequency does not stay at 1.9 GHz or higher, there is a problem with the system (maybe processor, maybe cooling, maybe power supply, etc.) that you probably want to look into.  (We recently saw a set of Xeon Platinum 8380 processors running at under the base AVX512 frequency -- the power supplies were overheating and throttling the processors.  A simple adjustment to the fan speeds fixed the problem.)
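To put numbers on those bounds, here is a minimal Python sketch that multiplies the 2048 FP ops/cycle computed earlier by each of the three frequencies mentioned above. The constant, function, and label names are made up for illustration. The frequencies are the ones quoted in this reply for the 8358 in its traditional configuration.

```python
OPS_PER_CYCLE = 2048  # 64 cores * 32 FP64 ops/core/cycle, from the calculation above


def peak_gflops(freq_ghz, ops_per_cycle=OPS_PER_CYCLE):
    """GHz * ops/cycle = 1e9 ops/s * ops/cycle / 1e9 = GFLOPS."""
    return freq_ghz * ops_per_cycle


# Frequencies quoted above; the labels are illustrative only
frequencies_ghz = [
    ("lower bound (base AVX512)", 1.9),
    ("nominal estimate", 2.6),
    ("upper bound (max all-core AVX512 Turbo)", 2.9),
]

for label, ghz in frequencies_ghz:
    print(f"{label}: {peak_gflops(ghz):.1f} GFLOPS")
```

With these inputs, the two-socket node lands at roughly 3.9 TFLOPS (lower bound), 5.3 TFLOPS (nominal), and 5.9 TFLOPS (upper bound).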

 

ferrao

Hi John, do you mind sharing the AVX-512 frequency table for the Xeon Platinum 8358? I could not find the document. It's curious, because for 2nd Gen Xeon Scalable it is relatively easy to find.

Thank you.
