Software Tuning, Performance Optimization & Platform Monitoring
Discussion around monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform monitoring

## Maximum value of the MEM_UOPS_RETIRED.ALL_LOADS event per second on Haswell Xeon E5-2697 v3


The Moscow State University supercomputer "Lomonosov 2" uses Intel Haswell Xeon E5-2697 v3 processors. I am trying to find both the theoretical maximum and the practically achievable maximum rate per second of the MEM_UOPS_RETIRED.ALL_LOADS event counter, using 1 core and using all 14 cores.

I can measure the practically achievable value with a simple synthetic test. However, it is not clear what the theoretical maximum value per second is for 1 core and for 14 cores.

Could you please tell me this theoretical maximum, or explain how I can calculate it myself?

I would really appreciate any answer. This will help us evaluate the performance of applications running on our supercomputer.

1 Solution

The Haswell core has two load ports, so it is able to retire a maximum of 2 load uops per cycle.

The maximum single-core Turbo frequency is 3.60 GHz, so a single core can retire 7.2 billion load uops per second.

The maximum all-core Turbo frequency for "non-AVX" code is 3.10 GHz, so the 14 cores could reach a maximum of 14 × 2 × 3.10 = 86.8 billion load uops per second in aggregate. Running 256-bit AVX arithmetic code limits the maximum all-core Turbo frequency to 2.90 GHz, or 14 × 2 × 2.90 = 81.2 billion load uops per second.
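The arithmetic above can be reproduced with a short Python sketch. The function name `peak_load_uops_per_second` and the constant `LOAD_UOPS_PER_CYCLE` are illustrative names, not part of any Intel tool; the frequencies and the limit of 2 load uops per cycle are the figures quoted in this answer.

```python
# Peak load-uop retirement rate = cores x load uops per cycle x frequency.
# All inputs are the figures quoted in the answer; names are illustrative.

LOAD_UOPS_PER_CYCLE = 2  # two load ports per Haswell core


def peak_load_uops_per_second(active_cores: int, freq_ghz: float,
                              uops_per_cycle: int = LOAD_UOPS_PER_CYCLE) -> float:
    """Return the upper bound on retired load uops per second."""
    return active_cores * uops_per_cycle * freq_ghz * 1e9


if __name__ == "__main__":
    cases = [
        ("1 core, max single-core Turbo", 1, 3.60),
        ("14 cores, non-AVX all-core Turbo", 14, 3.10),
        ("14 cores, 256-bit AVX all-core Turbo", 14, 2.90),
    ]
    for label, cores, ghz in cases:
        rate = peak_load_uops_per_second(cores, ghz)
        print(f"{label}: {rate / 1e9:.1f} billion load uops/s")
```

These are upper bounds only: they assume every core sustains the quoted Turbo frequency for the whole measurement interval.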

The actual frequency of operation will depend on the number of active cores, the types of instructions being used, and the effectiveness of the cooling system, so it is common to report the uop retirement rate in uops per cycle, rather than uops per second, and also report the average number of active cores and their average frequencies to provide the rest of the context....
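As a rough illustration of that reporting style, the sketch below converts raw counter readings into load uops per cycle and a fraction of the 2-per-cycle peak. The function `summarize_load_rate` and its inputs are hypothetical: it assumes you have already collected a load-uop count and an unhalted core-cycle count for the same core and interval, and it estimates average frequency as cycles divided by elapsed time, which is only valid if the core was not halted during the interval.

```python
# Normalize a measured load-uop count to uops per cycle.
# Inputs are assumed to come from counters read over the same interval
# on the same core; the sample values used below are made up.

PEAK_LOAD_UOPS_PER_CYCLE = 2.0  # two load ports per Haswell core


def summarize_load_rate(load_uops: int, core_cycles: int,
                        elapsed_seconds: float) -> dict:
    """Return uops per cycle, fraction of peak, and estimated average GHz."""
    uops_per_cycle = load_uops / core_cycles
    return {
        "uops_per_cycle": uops_per_cycle,
        "fraction_of_peak": uops_per_cycle / PEAK_LOAD_UOPS_PER_CYCLE,
        # Assumes the core was active (unhalted) for the whole interval.
        "avg_ghz": core_cycles / elapsed_seconds / 1e9,
    }


if __name__ == "__main__":
    # Hypothetical readings for a one-second interval on one core.
    stats = summarize_load_rate(load_uops=4_500_000_000,
                                core_cycles=3_000_000_000,
                                elapsed_seconds=1.0)
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

Reporting the per-cycle figure alongside the active core count and average frequency keeps results comparable across runs whose Turbo frequencies differ.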
