
Maximum value of the MEM_UOPS_RETIRED: ALL_LOADS event per second on Haswell Xeon E5-2697 v3

gadel_zakirov

The "Lomonosov 2" supercomputer at Moscow State University uses Intel Haswell Xeon E5-2697 v3 processors. I am trying to find both the theoretical maximum and the practically achievable maximum of the MEM_UOPS_RETIRED: ALL_LOADS event count per second, for a single core and for all 14 cores.

I can measure the practically achievable value with a simple synthetic test. However, it is not clear what the theoretical maximum per second is, either for 1 core or for 14 cores.

Can you please tell me this theoretical maximum value, or how I can calculate it myself?

I would really appreciate any answer. This will help us evaluate the performance of applications running on our supercomputer.

McCalpinJohn

The Haswell core has two load ports, so it is able to retire a maximum of 2 load uops per cycle.

The maximum single-core Turbo frequency is 3.60 GHz, so a single core can retire 7.2 billion load uops per second.

The maximum all-core Turbo frequency for "non-AVX" code is 3.10 GHz, so the 14 cores could reach a maximum of 2 x 3.10 x 14 = 86.8 billion load uops per second in aggregate. Running 256-bit AVX arithmetic code limits the maximum all-core Turbo frequency to 2.90 GHz, or 81.2 billion load uops per second.
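The arithmetic behind these three numbers can be sketched in a few lines of Python. The function name and its parameters are made up for illustration; the 2 loads per cycle and the Turbo frequencies are the values quoted above.

```python
# Peak load-uop retirement rate = loads/cycle x cycles/second x cores.
LOADS_PER_CYCLE = 2  # two load ports per Haswell core (from the text above)


def peak_load_uops_per_second(freq_ghz, cores=1,
                              loads_per_cycle=LOADS_PER_CYCLE):
    """Upper bound on retired load uops per second (hypothetical helper)."""
    return loads_per_cycle * freq_ghz * 1e9 * cores


# Frequencies quoted above for the Xeon E5-2697 v3.
single_core = peak_load_uops_per_second(3.60)          # max 1-core Turbo
all_non_avx = peak_load_uops_per_second(3.10, 14)      # all-core, non-AVX
all_avx256 = peak_load_uops_per_second(2.90, 14)       # all-core, 256-bit AVX

for label, rate in [("1 core", single_core),
                    ("14 cores, non-AVX", all_non_avx),
                    ("14 cores, AVX-256", all_avx256)]:
    print(f"{label}: {rate / 1e9:.1f} billion load uops/s")
```

These are upper bounds at the maximum Turbo frequencies; a sustained run at a lower frequency scales the bound down proportionally.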

The actual frequency of operation will depend on the number of active cores, the types of instructions being used, and the effectiveness of the cooling system, so it is common to report the uop retirement rate in uops per cycle, rather than uops per second, and also report the average number of active cores and their average frequencies to provide the rest of the context....
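As a rough sketch of that kind of report: given a load-uop count, an unhalted core-cycle count summed over the active cores, the elapsed time, and the number of active cores, the per-cycle rate and average frequency follow directly. The function name, its arguments, and the sample counts below are placeholders for illustration, not measurements, and the sketch assumes every active core was unhalted for the whole interval.

```python
PEAK_LOADS_PER_CYCLE = 2.0  # two load ports per Haswell core


def summarize_load_rate(load_uops, core_cycles, elapsed_s, active_cores):
    """Normalize raw counts to per-cycle terms (hypothetical helper).

    load_uops    -- MEM_UOPS_RETIRED: ALL_LOADS summed over the active cores
    core_cycles  -- unhalted core cycles summed over the same cores
    elapsed_s    -- wall-clock length of the measurement interval, in seconds
    active_cores -- number of cores that ran the workload
    """
    uops_per_cycle = load_uops / core_cycles
    # Average frequency per core, assuming each active core was busy
    # (unhalted) for the entire interval.
    avg_freq_ghz = core_cycles / active_cores / elapsed_s / 1e9
    return {
        "load_uops_per_cycle": uops_per_cycle,
        "fraction_of_peak": uops_per_cycle / PEAK_LOADS_PER_CYCLE,
        "avg_freq_ghz": avg_freq_ghz,
        "active_cores": active_cores,
    }


# Placeholder counts, chosen only to make the arithmetic easy to follow.
report = summarize_load_rate(load_uops=1.0e10, core_cycles=1.0e10,
                             elapsed_s=1.0, active_cores=4)
print(report)
```

Reporting the result this way keeps the comparison against the 2 loads per cycle limit independent of whatever frequency the cores happened to run at.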
