Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

How to measure flops on v4

GHui
Novice

I cannot find the FP* events for v4 in 64-ia-32-architectures-software-developer-manual-325462.pdf. Is there a manual that documents them?

McCalpinJohn
Honored Contributor III

The events are documented at https://download.01.org/perfmon/BDW/Broadwell_core_V16.json -- look for "FP_ARITH" and you will find the various sub-events of the new 0xC7 core performance counter event.
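To pull the FP_ARITH entries out of that file, something like the following Python sketch should work. It assumes the file is a JSON array of event objects with "EventName", "EventCode" and "UMask" keys (some newer perfmon files wrap the array in an object under "Events", which the sketch also accepts); check the key names against the file you actually download.

```python
import json


def find_events(events, prefix="FP_ARITH"):
    """Return (EventName, EventCode, UMask) tuples for events whose name starts with prefix.

    Assumption: `events` is the parsed perfmon JSON, either a list of event
    dicts or a dict holding that list under the "Events" key.
    """
    if isinstance(events, dict):
        events = events.get("Events", [])
    return [
        (e.get("EventName"), e.get("EventCode"), e.get("UMask"))
        for e in events
        if e.get("EventName", "").startswith(prefix)
    ]


def load_events(path):
    """Read a perfmon JSON file from disk (path is whatever you saved it as)."""
    with open(path) as f:
        return json.load(f)
```

For example, `find_events(load_events("Broadwell_core_V16.json"))` lists each FP_ARITH sub-event with its event code and umask.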

GHui
Novice

I collected the following events while running xhpl as a test:

FP_ARITH_INST_RETIRED.SCALAR_DOUBLE
FP_ARITH_INST_RETIRED.SCALAR_SINGLE
FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE
FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE
FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE
FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE
FP_ARITH_INST_RETIRED.SCALAR
FP_ARITH_INST_RETIRED.PACKED
FP_ARITH_INST_RETIRED.SINGLE
FP_ARITH_INST_RETIRED.DOUBLE

Their differences over one second are "0 0 0 0 0 31684 1340700232 0.0 0.0 0.0".

I am confused about how to interpret these events: some are zero and others are not.

Also, do any of these events include (overlap with) the others?

GHui
Novice

I have run the mkl/benchmarks/linpack/runme_xeon64 program.

The runme_xeon64 program printed the following output:

    Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
    1000   1000   4      0.052      12.9315  8.866796e-13 3.023805e-02   pass
    1000   1000   4      0.008      82.7219  8.866796e-13 3.023805e-02   pass
    1000   1000   4      0.007      93.5988  8.866796e-13 3.023805e-02   pass
    1000   1000   4      0.007      92.9639  8.866796e-13 3.023805e-02   pass
    2000   2000   4      0.033      164.2892 3.864797e-12 3.361900e-02   pass
    2000   2000   4      0.027      200.3969 3.864797e-12 3.361900e-02   pass
    5000   5008   4      0.167      499.0555 2.383066e-11 3.322993e-02   pass
    5000   5008   4      0.190      438.7789 2.155309e-11 3.005404e-02   pass
    10000  10000  4      0.974      685.0007 8.261911e-11 2.913233e-02   pass
    10000  10000  4      0.906      736.1333 8.531753e-11 3.008383e-02   pass
    15000  15000  4      2.516      894.5636 2.272723e-10 3.579576e-02   pass
    15000  15000  4      2.760      815.4055 2.019905e-10 3.181385e-02   pass
    18000  18008  4      4.663      834.0049 3.264814e-10 3.575372e-02   pass
    18000  18008  4      4.587      847.6924 3.264814e-10 3.575372e-02   pass
    20000  20016  4      5.986      891.1581 3.565633e-10 3.156367e-02   pass
    20000  20016  4      6.009      887.7311 3.565633e-10 3.156367e-02   pass
    22000  22008  4      7.569      938.0349 4.454127e-10 3.262473e-02   pass
    22000  22008  4      7.541      941.4906 4.454127e-10 3.262473e-02   pass
    25000  25000  4      10.524     989.9109 5.087659e-10 2.893169e-02   pass
    25000  25000  4      10.488     993.3168 5.087659e-10 2.893169e-02   pass
    26000  26000  4      11.710     1000.7430 5.944061e-10 3.125565e-02   pass
    26000  26000  4      11.758     996.6501 5.944061e-10 3.125565e-02   pass
    27000  27000  4      13.020     1007.9769 6.490156e-10 3.164930e-02   pass
    30000  30000  1      17.293     1040.9949 8.272351e-10 3.260969e-02   pass

But from the events I collected, I only get 274.324 GFlops.

McCalpinJohn
Honored Contributor III

How are you collecting these counts?

These events count instructions, not operations, so the first six need to be scaled by the corresponding vector width if you want an operation count. The documentation pointed to above clearly explains how many operations each increment corresponds to, and points out that for Multiply-Add operations the counter is incremented twice, so that operations are counted in the expected way (Multiply-Add = 2 operations).

The scaling should be:

  • FP_ARITH_INST_RETIRED.SCALAR_DOUBLE         1
  • FP_ARITH_INST_RETIRED.SCALAR_SINGLE         1
  • FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE    2
  • FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE    4
  • FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE    4
  • FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE    8
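A minimal Python sketch of this scaling, assuming you already have counter deltas for the six width-specific sub-events in a dict keyed by event name. It deliberately refuses the combined events, since those mix instruction widths.

```python
# FP operations represented by one increment of each width-specific
# FP_ARITH_INST_RETIRED sub-event (the scaling factors listed above).
# FMA instructions already increment the counter twice, so no extra factor is applied.
FLOPS_PER_COUNT = {
    "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE": 1,
    "FP_ARITH_INST_RETIRED.SCALAR_SINGLE": 1,
    "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE": 2,
    "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE": 4,
    "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE": 4,
    "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE": 8,
}


def fp_operations(counts):
    """Convert a dict of counter deltas (event name -> delta) to an FP operation count.

    Only the six width-specific events are accepted; anything else
    (e.g. FP_ARITH_INST_RETIRED.PACKED) raises ValueError.
    """
    total = 0
    for name, delta in counts.items():
        if name not in FLOPS_PER_COUNT:
            raise ValueError(f"{name} cannot be converted to an operation count")
        total += FLOPS_PER_COUNT[name] * delta
    return total
```

For example, 1000 increments of 256B_PACKED_DOUBLE plus 10 of SCALAR_DOUBLE give 4 * 1000 + 10 = 4010 FP operations.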

From inspection of the Umask values, the next two events are the sums of the single-precision and double-precision instructions for each case. For the PACKED case it is not possible to get an operation count, since the single-precision packed instructions correspond to a different number of operations than the double-precision packed instructions.

  • FP_ARITH_INST_RETIRED.SCALAR
  • FP_ARITH_INST_RETIRED.PACKED

From inspection of the Umask values, the next two events are the sums of the scalar, 128-bit packed, and 256-bit packed instructions for each precision. It is not possible to get an operation count from either of these counters, since they combine instructions of different widths.

  • FP_ARITH_INST_RETIRED.SINGLE
  • FP_ARITH_INST_RETIRED.DOUBLE
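A small Python illustration of why a combined count cannot be turned into an operation count. It assumes, following the Umask inspection above, that PACKED is the sum of the four packed sub-events; the two instruction mixes are made up for the example. Both mixes produce the same PACKED count but very different operation counts.

```python
# Operations per counted instruction for each sub-event (from the scaling table above).
WIDTH = {
    "SCALAR_DOUBLE": 1, "128B_PACKED_DOUBLE": 2, "256B_PACKED_DOUBLE": 4,
    "SCALAR_SINGLE": 1, "128B_PACKED_SINGLE": 4, "256B_PACKED_SINGLE": 8,
}

# Assumed membership of the combined PACKED event.
PACKED_MEMBERS = ["128B_PACKED_DOUBLE", "128B_PACKED_SINGLE",
                  "256B_PACKED_DOUBLE", "256B_PACKED_SINGLE"]


def combined_count(mix, members):
    """What a combined event would report: the sum of its member sub-event counts."""
    return sum(mix.get(m, 0) for m in members)


def operations(mix):
    """True FP operation count for an instruction mix."""
    return sum(WIDTH[k] * v for k, v in mix.items())


# Two hypothetical instruction mixes.
mix_a = {"128B_PACKED_DOUBLE": 100}   # 100 instructions x 2 ops
mix_b = {"256B_PACKED_SINGLE": 100}   # 100 instructions x 8 ops
```

Both mixes make the combined PACKED counter read 100, yet they represent 200 and 800 operations respectively, so no single scale factor can recover the operation count.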

For the xHPL code running on a Xeon E5 v4, almost all of the counts should be in the FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE category. These should be multiplied by 4 to get the FP operation count.
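For xHPL, then, a FLOP rate can be estimated from two reads of the 256B_PACKED_DOUBLE counter. This Python sketch assumes 48-bit general-purpose counters (the actual width is reported by CPUID leaf 0xA) and tolerates at most one wrap-around between reads. The counters are per logical processor, so a whole-run rate needs the deltas summed over every core the benchmark runs on.

```python
def gflops_256b_double(count_start, count_end, seconds, counter_bits=48):
    """Estimate GFLOP/s from two reads of FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE.

    Each increment is one 256-bit packed double instruction = 4 FP operations
    (FMA is already counted twice by the hardware). The delta is taken modulo
    2**counter_bits so that a single counter wrap between the reads is handled.
    counter_bits=48 is an assumption; check the width on your system.
    """
    delta = (count_end - count_start) % (1 << counter_bits)
    return 4 * delta / seconds / 1e9
```

For example, a delta of 2.5e9 increments over one second on one core corresponds to 10 GFLOP/s for that core.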

GHui
Novice

I wrote event 0x6310C7 to event-select MSR 0x18A and read the counter from PMC MSR 0xC5, but I get zero counts.
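As a sanity check on the encoding, this Python sketch decodes an IA32_PERFEVTSELx value using the architectural field layout from the Intel SDM, and maps an event-select MSR to its counter MSR. For 0x6310C7 it gives event 0xC7, umask 0x10, with USR, OS, AnyThread and EN set; 0x18A is IA32_PERFEVTSEL4, which pairs with IA32_PMC4 at 0xC5. Things worth checking beyond the encoding (not a diagnosis): on these cores, counters 4 through 7 are available only when Hyper-Threading is disabled, and each counter must also be enabled in IA32_PERF_GLOBAL_CTRL (MSR 0x38F).

```python
def decode_perfevtsel(value):
    """Split an IA32_PERFEVTSELx value into its architectural fields."""
    return {
        "event_select": value & 0xFF,          # bits 7:0
        "umask":        (value >> 8) & 0xFF,   # bits 15:8
        "usr":          (value >> 16) & 1,     # count in user mode (ring 3)
        "os":           (value >> 17) & 1,     # count in kernel mode (ring 0)
        "edge":         (value >> 18) & 1,     # edge detect
        "pc":           (value >> 19) & 1,     # pin control
        "int":          (value >> 20) & 1,     # interrupt on overflow
        "any_thread":   (value >> 21) & 1,     # count for both threads of the core
        "enable":       (value >> 22) & 1,     # counter enable
        "inv":          (value >> 23) & 1,     # invert counter mask
        "cmask":        (value >> 24) & 0xFF,  # bits 31:24, counter mask
    }


def counter_index(evtsel_msr):
    """Return n for IA32_PERFEVTSELn and the MSR address of the matching IA32_PMCn."""
    n = evtsel_msr - 0x186   # IA32_PERFEVTSEL0 is MSR 0x186
    return n, 0xC1 + n       # IA32_PMC0 is MSR 0xC1
```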

GHui
Novice

Can these FP_ARITH events also be used to count flops on v3?

McCalpinJohn
Honored Contributor III

These events do not exist on Xeon E5 v3. 

The 0xC7 event is not documented on Xeon E5 v3, but a quick test shows that it is counting something, and it appears to be counting the 0xC7 SIMD events defined for the Nehalem/Westmere platform. These include both arithmetic and non-arithmetic SIMD instructions, so they are not useful for counting FP operations.

GHui
Novice

How can I count FP operations on v3? 

Which events can I use to count FP operations?

McCalpinJohn
Honored Contributor III

There are no counters for floating-point operations on Xeon E5 v3.

The 0x10 and 0x11 events that counted floating-point operations on Xeon E5 v1 and v2 suffered from a serious implementation bug that could lead to large overcounts (I have measured overcounts of up to 10x), so these events were disabled on Xeon E5 v3. Unfortunately, the replacement 0xC7 events were not included until Xeon E5 v4, leaving Xeon E5 v3 with nothing.
