
Measuring theoretical flops for icelake

psing51

Hi,
I have a few servers, each equipped with dual Ice Lake 8358 processors.
I would like to know whether the following is the correct method to calculate theoretical peak double-precision FLOPS (Rpeak):

= cores/socket * sockets * frequency (GHz) * ops/cycle * elements/op * vector FMA units per core

= 32 * 2 * 2.6 * 2 * (512-bit register / 64-bit DP) * 2

= 32 * 2 * 2.6 * 2 * 8 * 2

= 2662.4 * 2
= 5324.8 GFLOPS


Also, will there be any difference, apart from frequency and cores/socket, if I try to calculate FLOPS for the 6338 CPU model?

McCalpinJohn

The biggest problem with computing "peak" performance for recent processors is knowing what value to use for the frequency.

 

The nominal frequency on the Xeon Platinum 8358 is 2.6 GHz.  When running AVX512 code (required to get 32 FLOPS/cycle/core), the base frequency is 1.9 GHz and the maximum all-core Turbo frequency is 2.9 GHz.  The actual frequency seen when running a "peak FLOPS" sort of benchmark will depend on the leakage current of the particular chip and the effectiveness of the cooling system.

 

32 FLOPS/cycle/core * 32 cores * 1.9 GHz * 2 sockets = 3891.2 GFLOPS

32 FLOPS/cycle/core * 32 cores * 2.9 GHz * 2 sockets = 5939.2 GFLOPS
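For anyone who wants to redo this arithmetic for other frequencies, here is a minimal Python sketch. The function names are my own, and it assumes the 32 FLOPS/cycle/core figure breaks down as 2 AVX512 FMA units * 8 double-precision lanes * 2 operations (multiply and add) per FMA.

```python
def flops_per_cycle_per_core(fma_units=2, vector_bits=512, element_bits=64):
    """DP FLOPS per cycle per core, assuming every FMA unit issues each cycle."""
    lanes = vector_bits // element_bits   # 8 doubles per 512-bit register
    return fma_units * lanes * 2          # an FMA counts as 2 floating-point ops


def peak_gflops(cores_per_socket, sockets, ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS for a given sustained core frequency in GHz."""
    return cores_per_socket * sockets * ghz * flops_per_cycle


fpc = flops_per_cycle_per_core()

# Frequencies quoted in this thread for the Xeon Platinum 8358
for label, ghz in [("AVX512 base", 1.9),
                   ("nominal", 2.6),
                   ("max all-core turbo", 2.9)]:
    print(f"{label:>20}: {peak_gflops(32, 2, ghz, fpc):.1f} GFLOPS")
```

The only input that is really uncertain is the frequency, which is why it is kept as an explicit argument rather than baked in.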

 

Based on experience with SKX and CLX processors, I expect you would see average frequencies for the HPL benchmark in the range of 2.2 GHz to 2.6 GHz across an ensemble of systems with this configuration.  HPL performance will be in the neighborhood of 90% of peak based on the actual average frequency during the run.  
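As a rough illustration of what that implies, the sketch below turns the expected frequency range into an HPL estimate. The 0.90 efficiency factor is just the "neighborhood of 90%" figure above treated as an exact number, and the function name is made up for this example, so take the results as ballpark values rather than predictions.

```python
def estimated_hpl_gflops(avg_ghz, cores_per_socket=32, sockets=2,
                         flops_per_cycle=32, efficiency=0.90):
    """Estimate HPL GFLOPS as a fraction of peak at the observed average frequency."""
    peak = cores_per_socket * sockets * avg_ghz * flops_per_cycle
    return efficiency * peak


low = estimated_hpl_gflops(2.2)    # lower end of the expected frequency range
high = estimated_hpl_gflops(2.6)   # upper end of the expected frequency range
print(f"Estimated HPL: {low:.0f} to {high:.0f} GFLOPS")
```

To use this with real data, replace the frequency with the average core frequency actually measured during the HPL run.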

 

The Xeon Gold 6338 will use the same procedure.  With SKX and CLX you had to check to see whether the processor had 1 or 2 AVX512 FMA units.  It looks like for ICX all of the processors have 2, so 32 FLOPS/cycle/core should work for all models.
